Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Approaching Data Governance Strategically. It is the latest installment in a monthly series called Data Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the bottom middle for that feature. And to continue the conversation and networking after the webinar, just go to community.dataversity.net. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days, containing links to the slides. And yes, we are recording and will likewise send a link to the recording of this session (I can speak today), as well as any additional information requested throughout the webinar. Now, let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. He has written dozens of articles and 11 books. The most recent is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as a top data management expert. Some of the most important and largest organizations in the world have sought out his expertise.
Peter has spent multi-year immersions with groups as diverse as the US Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started. Peter, hello and welcome. Welcome, Shannon. Thank you so much for the kind introduction that you always do for us and for getting us hosted and everything else. So welcome, everybody, today. It's a beautiful day on the East Coast, and so we'll see how it goes. Of course, you're not thinking about that. You're thinking, I've got to do some data governance and I've got to find out what it is. Well, let's talk about it. The first part of this from a strategic perspective is that you need to understand that there's generally low understanding of data-related issues out there in the world. And that's a challenge for you, because now you're going to tell people to manage something when they really have trouble figuring out exactly what it is. Data by itself is uniquely valuable and, unfortunately, largely composed of rot. We'll come back to the definition of rot in just a little bit. And so what I'll leave you with is sort of the case for better decisions around data. Three particular strategies. First, keep data governance practically focused. The discipline is quite immature at this point, and going by anybody's book is generally not a good starting place. There is one exception to that rule. So I'm advocating a more targeted approach to data governance that has worked much better, I think, than the over-planned scenarios most organizations try to adopt. Strategy number two is that data governance must equal HR at the program level. I say that in the sense that data governance must be implemented as part of a larger data management program that you are implementing in your organization. It is central to how data management works.
And it must be decoupled in particular from IT strategy, which is focused at a different level of granularity. That gives us the data strategy that we work with. And our data governance strategy should be focused directly on the organizational strategy, doing something useful, because if we're not doing something useful, that becomes a problem as well. Finally, our third strategy: gradually add ingredients; don't try to start off with a full plate right away. Again, the analogy here is if you're driving, you know, let somebody else do the gear shifting for you while you keep driving straight down the road, or try it with an automatic transmission first and then add stick shifting later. So we're going to talk about ingredients that we can add, which include frameworks, a good place to start, and stewards; we'll talk about how they work and, again, the best way to introduce them to your organization. We'll look at some checklists and approaches, and really key to this is just talking about some worst practices. We'll try to finish up with some governance in action, a little bit of storytelling, and how important storytelling is to governance as a profession. And then, as always, we'll finish with takeaways and references and a Q&A session that we look to get to about 56 minutes from now. So let's start out with this: data is a confused subject. IT thinks it's a business problem. Their attitude is generally, if I can connect to the server, then my job is done, because they feel that interpretation of data is not necessarily their forte. The business, however, thinks IT is managing the data. After all, there's somebody with the title of chief information officer; what else would that individual be doing? As a result, data has fallen into a giant chasm between business and IT. And it's up to data governance to come back and work with both groups to try to repair that particular set of broken issues that we have right at the moment.
Now data, we're trying to get people to think about them from an asset perspective, especially from a governance perspective. There's no point in governing them if they are not assets. They are assets; therefore, they need governance. Government is the proper, sorry, governance. I'll say it right the third time. There we go. Picked up your thing today, Shannon. Anyway, data is our most powerful, underutilized, poorly managed asset. It is the only asset that you have that doesn't deplete. It doesn't degrade over time, and it is a durable asset, meaning that you're going to invest money into it and hope to get additional resources out of it in the long run. Most people, however, think of data as the new oil. If you Google that phrase, you'll see 5 million hits out there. It's just crazy. It's the wrong way to think about data, because oil is a production function. You use it one time, you're done with it. That's not the way data works. Here is a better way to think about data when somebody shows you this phrase and says, look, data's the new oil. It's kind of nice. How about the new soil? Soil implies two things that are different from oil. First, soil requires some preparation, where you actually prepare the plot of land that you're going to plant in; you don't just walk around the yard, throw seeds randomly, and hope that good things happen. Secondly, you don't plant things on Monday and expect to eat them on Friday. It takes time to develop. On the other hand, back to the storytelling, if you call it bacon and people at least pay attention to it, that may be progress in most of the organizations that I've worked with over the past 35 years. So data deserves its own strategy. It deserves attention on par with similar organizational assets, and it requires professional administration to make up for past neglect. Again, these confounding characteristics are problematic.
Data is complex and detailed, and because it's complex and detailed, outsiders generally don't have the patience to sit around and talk to you about it. If you understand it, you have to find other data people to talk to, because most of the people who are hearing about this are unqualified from an architecture and engineering perspective. As a result, data is inconsistently taught. The business impact is not well understood, and it has to be learned at the work group level by each work group that is out there in your organization. There are better ways to approach this. Here is an analogy that I love, which a colleague from Finland showed me many, many years ago: this is a guy who is doing a great job of playing the piano by throwing pink balls at it. Well, he got very good at it, but gosh, if the goal was actually just to play the piano, there are better ways of approaching it. And of course, he doesn't know this. So it's a problem for us organizationally. The next very confusing issue about data is that we have a lot of wheat versus chaff. Now, first of all, most would agree that better organized data increases in value. If data is poorly organized, it takes people who get paid money longer to find things. Therefore, poor data management practices are costing organizations a lot of time, money, and effort. And in our measurements, 80% of organizational data out there is rot. I hope I don't scare you data governance professionals, because you're managing a bunch of rot, but it is actually kind of important. We'll see how that insight plays into it a little bit later on. By the way, the only argument I have ever gotten in 35 years about 80% of data being rot is that that number is too small. I have had organizations say that up to 95% of their organizational data is rot. What is rot, you keep asking? It is data that is redundant, data that is obsolete, data that is trivial. And therefore, we shouldn't spend any resources looking at it at all.
Actually, my wife corrects me slightly on this one. It's redundant, incomplete, obsolete, or trivial. I would add incomplete, but then I know I'd be calling your data a riot. In today's political climate, that may get us a visit from people. Never mind. We're not going there. All right. So let's look at some numbers around how much data there is. One study shows 60 to 70% of enterprise data is never analyzed; 54% is unidentified; 30% is rot, which left them with 14% business-critical data. So here's the first takeaway from all of this. If you're trying to manage everything, it's too much. Let's find out which parts are the good bits and manage them, and get rid of the parts that are just in the way through a program dedicated to reducing, recycling, and reusing data. That's not the subject of this conversation. However, it is important to recognize that that's the way it works. We of course have no surprising news here: goodness gracious, half of corporations have actually made data errors, made an inaccurate business decision based on good or bad data. And that is because, unfortunately, business decision makers are not data knowledgeable and technical decision makers are not data knowledgeable. Therefore, the two of them combine to make bad data decisions. Those bad data decisions lead to poor treatment of organizational data assets, as well as poor quality data. Each of those things leads to poor organizational outcomes. And if we don't fix it, the process continues to spiral downward. So we have to find ways of getting our data better and also our decision-making processes around data better. Data governance is important because the lack of it costs organizations millions each year: in productivity, in redundant and siloed efforts, in poorly thought-out hardware and software purchases, in delayed decision making, and in reactive instead of proactive initiatives.
Again, 20 to 40% of all IT spending can be reduced through better data governance. And that is the angle we should be focusing on in this political and employment climate, given the environmental complexities that we're dealing with. Sorry. I'm going to go back to productivity for just a quick second here, the first item on the list. That is the subject of the next book that's coming out, which I think I'm about two weeks from finishing the first draft on. Todd and I have been working very hard on this, and we believe that we can save knowledge workers an hour a day. Now, an hour a day for knowledge workers may not sound like a lot, but it is a lot. And again, I think we'll see some buzz around that as well. We've got some pretty good numbers on this. So those are our confounding data characteristics. Let's look at keeping the organization focused. The idea here is that the discipline is relatively immature; there is not a best set of practices out there. We have some things that look like they're working well and things that don't. But, by gosh, we have not had time to do academic studies. So by the book is not a great place to start, and we'll talk about a more targeted approach to data governance in here. So I'd like to start out with a question. How old is your profession? I'm married to an accountant, so I have to be careful about the way I ask that question. But we do have accounting records, which I'm showing you in the bottom left-hand corner of this page, that are in cuneiform. They were written 2000 years, excuse me, 6000 years BC. And that means that the accounting profession has had 8000 years to formalize its practices around a set of pieces that we call generally accepted accounting principles. And these are wonderful pieces that allow organizations to look and say we are following the right type of practices and procedures.
Our profession in the data world may be traced back to Ada Lovelace in the 1840s or so. And it's okay to recognize that we are currently immature. It's not okay for us to stay immature. We've got to do better than that. So your organization is certainly familiar with the concept of corporate governance. And again, a couple of definitions here; I won't bother to read them to you. Just understand that even this area is in the papers and rapidly evolving. Just last August 24th, CEOs from major corporations said it's not all about the bottom line anymore. And that's kind of an interesting approach. We will have to wait and see whether that is going to stick or not. But it's still, nevertheless, a good conversation to have. So if we've got corporate governance, we also have IT governance. And IT governance is about aligning IT strategy with business strategy. After all, they should be aligned. If they're not, it's going to be harder to achieve objectives. IT governance is about focusing on measurable results and answering particular questions, such as: Are they strategically aligned? What type of value delivery system is appropriate? Do they have the right type of resource management? Are they managing risks appropriately? And do they have the right performance measures in place? As long as we're doing definitions, let me give you one more book. This is my colleague and friend John Ladley's book on data governance, second edition. Notice on this: get the second edition. It's not necessarily a specific cookbook, but it is definitely the best piece that I use for governance around all of these, because if you look on the internet, there are about seven generalized definitions that have come up over time. I'm not going to go through these with you. It's just not worthwhile, because it's not a concept that's easy to explain to management. The question is, what is data governance?
And the best way to explain it to people is managing data with guidance. If we explain that we're managing data with guidance through our data governance efforts, people get that. It's a very simple message. Also, if you put it in this fashion, it gives a good question back to people: would you want your sole non-depletable, non-degrading, durable strategic asset managed without guidance? Of course, the answer is no. On the other hand, we're an experienced group on this call, so I'm going to add one other word here. Even though we talk about managing data with guidance to outsiders, what data governance is also about is managing data decisions with guidance, and that's really important to understand. Let's see how that works out. First of all, we start out with our DMBOK wheel here, and you'll notice that data governance is central to everything that's happening in there. There is a framework that we put together in DAMA that we talk about here. It's an input-output diagram. It's not a terribly articulate framework, but it is nevertheless a good one that describes all of this. Again, I'm not going to go through these with you. You get these slides, so you will be able to take a look at these later on and see how it works. What I want to do is show you how it works in practice, and this is not something that we put in the DMBOK. It's probably something that we should have. For example, we call it an iteration of a data governance exercise and an implementation of a data strategy. So we may have a goal to do something around some combination of three, and this is really the key, three areas in the DMBOK here, because most organizations have had a chance to do these things exactly one time with a structured approach.
The second time we go through that, we may do it slightly differently, keeping the data warehousing, which means it's the second time we've gone through that exercise, but the first time we've gone through metadata management, and again, the second time going through data governance. Our third iteration of this may again be a warehousing and a BI play, but then we go over here to reference and master data management. Now notice we've done data governance three times and warehousing three times. We ought to be getting good at these things, but we've done metadata and reference and master data one time each. The key for these is to focus in on a specific project that allows the organization to go further: some aspect of data governance that is a clear support for an organizational strategy. Also find some data that is used by the business, of course, and look for an opportunity to practice your data skills. As I just showed you, we practiced data governance three times through that last exercise. That is the very definition of a lighthouse type project, one that allows you to find that sweet spot and exercise each of these things. Let's talk for a minute now about the difference between data governance and data management. Governance is guidance, policy, edict. For example, say I make a policy in data governance that says all information not marked public should be considered confidential. That means when people walk into my office under that data governance guideline, I will be turning papers over face down, because they are not marked public and I have a person in the office who doesn't appreciate that. Similarly with data management here as well. The idea of data management is that we're doing the things that the policy aspect is governing. So we have to make sure that we work on these. The key to this is getting organizations to understand the transition that they need to go through.
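The "not marked public means confidential" policy from that example can be written down as a default rule. What follows is only a minimal sketch under stated assumptions: the names `Document`, `effective_classification`, and `may_leave_face_up` are hypothetical illustrations, not part of any framework mentioned in the talk. The governance piece is the rule itself; the data management piece is the action that obeys it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels; the talk only names "public" and "confidential".
PUBLIC = "public"
CONFIDENTIAL = "confidential"


@dataclass
class Document:
    name: str
    marking: Optional[str] = None  # None means nobody has marked the document


def effective_classification(doc: Document) -> str:
    """Governance rule: anything not explicitly marked public is confidential."""
    if doc.marking is not None and doc.marking.strip().lower() == PUBLIC:
        return PUBLIC
    return CONFIDENTIAL


def may_leave_face_up(doc: Document) -> bool:
    """Data management action that the rule governs (illustrative only)."""
    return effective_classification(doc) == PUBLIC


if __name__ == "__main__":
    desk = [Document("brochure", "Public"), Document("budget draft"), Document("memo", "internal")]
    for doc in desk:
        action = "leave face up" if may_leave_face_up(doc) else "turn face down"
        print(f"{doc.name}: {effective_classification(doc)} -> {action}")
```

The design choice worth noticing is the default: an unmarked or oddly marked document falls into the restrictive class, so the policy still gives an answer when nobody has labeled the data.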
At the moment, the vast majority of organizations out there are trying to do data as a corollary or an ancillary to IT projects. That is generally not going to work well. And the reason for that is because programs and projects operate on a different basis. Programs are ongoing. They have no end date. Data similarly is ongoing. It generally does not have an end, whereas a project of course does have an end. Programs are tied to a financial calendar, which means they have ongoing sources of resources and a responsibility to show how those resources were accounted for. Program management is governance intensive, and that's an appropriate level at which to do this type of governance work as well. They've got a greater scope of financial management, in that they're not strictly focused on expenditures but also on management of the expenditures. And change management is a very key aspect of this. Now, I was just literally down the hall here at Virginia Commonwealth University talking to a colleague of mine, because HR departments have not been around in a centralized fashion, as they currently are, for longer than about 80 years. And so we'd like to say in the data world, and particularly in the data governance world, that your data governance program, and therefore your data program, must last at least as long as your HR program. Just imagine somebody saying to the organization, okay, well, I think we're done with HR now; we don't need it anymore. We've had plenty of HR. We've got everybody hired. The lawyers are all happy at the moment. Go ahead, HR, we don't need you anymore. Not going to happen. And we have to create that same tie in the minds of the organization, its managers, and executive leadership: your HR program and your data program are going to have the same lifespan and need the same kinds of support. Another aspect of this is whether organizations are focused on IT, as the vast majority of them are, on what we call application-centric project development. There's a good reason for this.
Organizations start out with a strategy, and they create some IT projects to implement that strategy, and data and information are sort of the tail end of the dog. Nobody really has an idea of what's going on with that. Dave McComb has a wonderful book on this called Software Wasteland. I would encourage you to read that book, because it makes a couple of points here. Data is always going to be formed around the applications, not around the organization-wide requirements. The process architecture is narrowly conformed to the applications. There's practically no chance for data reuse given this type of approach. So even though it seems rational, strategy, IT projects, data and information, it is nevertheless wrong. Before we get to the right way, let's just take a look and see why that might be the case. The organizational strategy starts off at the top. Again, whatever that organizational strategy is, IT, of course, should be supportive of the organizational strategy, but in the past we've also said the data strategy should be subordinate to IT. I want you to picture Morgan Freeman saying this is wrong. I can't do his voice, but it is absolutely incorrect. The correct way to look at this instead is to say that organizational strategy is to be supported by both an IT and a data strategy. And given that kind of approach, I would also say that the data strategy is dominant over the IT strategy, because the IT strategy is focused at a lower level of granularity than the data strategy. And that is very critical for everybody to understand. If you tell most organizations this is what they need to do, their heads explode, because it doesn't work that way currently in them. Why doesn't it work that way? Because we in the academic community have taught them incorrectly. That's a problem.
We have to teach people that they have learned bad habits from us in the past and that we need to change those bad habits to some good habits to support what we call data-centric development. In support of strategy, this is the way we should do it, not the way we are doing it. In support of strategy, the organization develops specific shared data-based (not database) goals and objectives. Those are the information and data requirements that are needed to support that strategy. These data strategic goals allow us to understand organizational IT projects from an organization-wide, enterprise-wide perspective. So the order changes from strategy, IT projects, data and information to strategy, data and information, IT projects. Even though it looks like a simple change here on PowerPoint, it is not a simple change. It is a very difficult change for organizations to understand. It is moving people's cheese. It is messing with their rice bowls. I don't care how you want to describe it. There are some problems in this area, and in order to do it, you need to take a good dose of culture medicine. I'll share a couple of examples as we get to the end of this particular webinar. But let's just look at what happens if we do this correctly. And by the way, there's a book out there on that, as Shannon mentioned, on data strategy. The data assets are developed from an organization-wide perspective, based not on what software package X can provide us but on what the organization's data needs are, and they complement organizational process flows. I can't tell you how much money we have saved organizations by coming in and untangling some spaghetti flows that they've had around that are really problematic. And finally, with this type of architecture from a data-centric development perspective, you have the opportunity to maximize data and information reuse. If we don't maximize reuse, it doesn't get reused.
And people do it over and over again. One of the other fun tricks that I've done for organizations over the years is to go in and show them how much money they can save by not buying their own data back from other people. Yes. Organizations go out and buy a copy of their own data back from other people. Sometimes there's a good reason for that, but generally it's an unknown, unwanted consequence. So let's look and see how data governance in context works with data strategy. In most organizations, again, the data strategy is what the data assets do to support the organizational strategy. And data governance is about seeing how well that data strategy is working, because, after all, what could be more important than whether the data is supportive of the organizational strategy? I've mentioned that twice. Now we need to show it on the diagram here again. From our perspective, the strategy question is what should we do with data to help the organization achieve its objectives. From a data governance perspective, then, there's going to be an interface to IT and whatever projects are going on over there, as well as loops into operations, with a couple of feedback loops here. This is not a chart that I would show most individuals. It's just too complicated, but that's sort of the context that we're working in. Here's the chart I would show people. Data strategy, again, is what the data assets do to support strategy, and to be truly effective, it must be expressed in business goals, because if we don't express the data strategy in business goals, the data governance perspective is much more difficult in terms of figuring out what they're trying to do. Similarly, the conversations in data governance must be metadata grounded. If they are not grounded in metadata, a disconnect too easily occurs within the organization.
So again, our data strategy expresses what data governance should be guiding towards in helping data to achieve specific tangible business goals, and those are expressed both in terms of progress reports and results in terms of metadata. Let's extend this just a step further now and pull some data stewards out of it as well. So again, we've got the business goals and the metadata in there. The data stewards also need that same metadata, and that's why the language of data governance has to be metadata, because otherwise, again, the risk of misuse or misunderstanding is absolutely crazy. The focus of the stewards is how we are doing from an implementation perspective around data governance initiatives. And let me take it to another component here as well. We've got a lot of things that we're learning about data governance, but in particular, there's sort of a, I don't want to say passive versus aggressive data governance. It doesn't really work that way, but there are less active and more active aspects of it. And we're going to spend a few minutes on this slide talking about how that works. First of all, as you start in with data governance, you've got some idea that there's some feedback that you need to get to learn more about what's going on in the data world, whatever your data leadership components are, even if you are the data leadership in and of itself; there are many one-person shops out there that are doing it. By the way, let me stop right here and give you a little bit of advice around this as well. If you have the opportunity to be given a tenth of 10 people's time or a single person, take the single person, because the synergy you get from not having to switch tasks, compared with dividing the work up into tenths of people's time, is crazy. So I will always take a dedicated individual over somebody who's supposed to do this off the side of their desk.
Now data leadership starts on the governance side, and governance says, hey, you know, we're going to make some policies and that's going to make some changes in data over time. And usually management's response to that is, so what? And the easy answer for many industries is to say, and we will be compliant. In which case the executive says, great. And the next question is, when? Now, the way I'm showing the upper half of this diagram is kind of like being in a boat at the bottom of Niagara Falls and having a discussion on how to clean up the quality of the water. Yes, if we implement new policies in the organization, data will improve over time. But of course, you can see I have a snail there, and it will take time for that water to make its way downstream, mix with the other pieces of water, and somehow produce some good results. It's generally not acceptable. And the smaller the organization is, the less acceptable it is to be focused so heavily in this mode. So another aspect of data governance is looking at specific opportunities to be proactive in data improvement. The way I like to think of it is that your data leadership and data governance teams are going to be sort of like a fire department. They do not sit around at the firehouse and twiddle their thumbs while they're waiting for fires. In between fighting fires, they are out doing all of the policy work that is done in the top half of this diagram here. And there is occasionally a fire. And then they go out on the bottom half of this diagram, with the data improvement track that I have, where you can see I put some wheels on the snail, and hopefully the data improves as a result of focus in these areas. Both of these activities will give you more feedback, and that will help you to fine-tune your efforts even further around all of this, eventually adding other components.
We've already talked about stewards, data community participants, the different generators and creators. And where we've had trouble as an industry is that if you look at the box that I've labeled "data things happen," you'll see the top half of it again as it continues: data improves over time. The bottom half of it: data improves as a result of intensive focus. We're going to go in and fix something. And you'll notice that I put the approximately-equal-to sign there, because as an industry we need to do better at connecting data things happening with organizational things happening. And the more obvious, the more hit-them-in-the-face examples that we can have where data things happen resulting in good things that happen to the organization, the better off we'll be and the easier it will be to grow good programs that help the organization to more correctly utilize the data that they need to have. So again, the idea here is that data improves over time, gradually towards the top and more rapidly at the bottom. And depending on where you are in your organizational journey, you may need to do more or less proactive work around the process. Hopefully you get everything cleaned up and it's all set and ready to go, and you do not need to do quite as much proactive stuff. My guess is most of you aren't facing that situation. So again, data is not a project. It is a durable asset with a useful life of more than one year. Very rarely does data stop and start. Therefore, looking at reasonable project deliverables in 90-day or two-week sprint increments is unreasonable. Data evolution is measured in years. That means your organization is going to have to make a commitment to it. But that's what a data program that lasts as long as an HR program is talking about. Only then will you have the ability to evolve the data sets that you have.
One of the other fun things about living as long as I have, and I'm only 61 so hopefully I'll go a little further on this, is that I go back and visit some of these organizations 10, 20, 30 years after I first visited them and see friends and colleagues that are there, and they're managing the same data. The data sets that they manage are significantly more stable over time than are the processing and the approaches to technology that they use in order to deliver this. This data program governance should be creating ready-made data architectural components that are a prerequisite to some of the very good techniques that we use to develop software, because if you're in the middle of an agile sprint and all of a sudden you discover that the data requirements are not correctly understood, you have no choice but to pull the rip cord and stop the bus at the next stop. And don't worry, there's plenty of other stuff to go ahead and focus your software efforts on. But if you do it on this one without those data requirements down, it is going to be a wasted effort. And the reason for that is that IT and agile production in general are really excellent and better than excellent. They are the best way we have found to deliver higher quality software faster. There is not any question about it. However, the way one builds IT systems and system components and the way one evolves data over time are vastly different. Data evolution must be separated from, made external to, and precede system development activities, and believe it or not, in your governance activities you're going to spend a lot of time telling people why data management and IT must be separated and sequenced appropriately. And the reason for that is because they don't match up. So this more targeted approach to data governance then allows organizations to really start to focus and really get the idea that data governance is central to data management.
Now we're going to talk about some ingredients that we can start to add to this overall process and see what types of results that we can get. The first one is a little bit I call the data governance sandwich. Now it turns out that data literacy is of uneven quality across our organizations. Again I say that hopefully the literacy book will be out later this summer. You guys can see a little bit more of the facts and figures that are there. Also of course our data supply is of uneven quality because we haven't had data governance programs in there keeping our data nice and shiny and sparkly and clean. Finally our use of standard data within the organization is also of uneven measures. Each of these don't make for a very appealing sandwich in this case. But if I through the magic of PowerPoint start to move some of the rough edges off and learn how to use these components better I may see how the three of these things can be combined to make a well engineered machine. Now this concept that I'm representing with this very simple diagram here has to occur millions of times a day sometimes hours. I had one customer that had a query that ran 10 billion times a day shaving just a fraction of a percent of the time of that query that ran billions of times a day off of that query added up to an awful lot of information and value savings very, very quickly. These concepts that we're talking about here from a governance perspective cannot happen without engineering and architecture understanding in the group. And it is critical to get some of these to the folks that you're working with not just on your teams but also out there in the business because the business does appreciate you need to do X and Y before you can get to Z. Now I came up with a story as I was vacationing in India on the tea farm that I'm showing you here. And they had a sign at the cash register that said quality engineering and architecture work products do not happen accidentally. 
And I thought to myself, wow, I've traveled halfway around the world to get some really wonderful Deming wisdom at a tea farm when I'm on vacation in India. Again, you know, I like to insert words. I'm going to go change it just a touch and say that quality data engineering and architecture work products cannot happen accidentally. In fact, they are what we call foundational to our discipline. So this is my house in Montpelier, Virginia a couple of years ago. I'm what's called a horse husband, for those of you that don't know me. That means I'm married to a wonderful woman who has a t-shirt that says I love my husband and, in small letters, almost as much as my horse. So as we were going through the process of building the barn, we had to stop at this particular point and document something. What we were documenting, of course, was the Hanover, Virginia foundation inspection. The bank gave us only enough money to build the foundation and then they made us stop so that we could have a foundation inspection. The foundation inspection is an objective test in which a qualified engineer comes onto the property and does some things like kicking and checking and poking and prodding to see if the foundation is sufficient to build a barn on top of. Now, why does the bank do this? And why is it a good idea instantiated in building codes? The answer is because if I build a very poor quality barn on a good foundation, it's okay, but if I build a good quality barn on a poor foundation, those of you who know me know that I will spend money on vacations and bills for the horses before I'll spend money paying off the mortgage. So in the bank's mind, that is just a good decision on how to loan money out. However, there is no IT equivalent. It is critical for you in data governance to start putting the brakes on some of these projects with poor foundations that do not allow the organization to succeed. Let's talk a little bit about frameworks in here.
Frameworks are a series of ideas guiding analyses. They will help you organize the project data, make decisions, and assess progress. For example, you might say don't put up the walls until the foundation inspection is passed. Well, you know why now. And then put the roof on as soon as possible so you can get in there and get out of the weather, and make it all dependent on continued funding. I mentioned already the DMBOK framework here that you can take a look at. Here is one from Gwen Thomas's group, the Data Governance Institute. A lot of people have looked at that. The idea of looking at these frameworks is not that one is right for you, but that you use one as a starting place to see what you think you will need in the organization. Our good friend Bob Seiner has a wonderful piece that he uses here. IBM's data council had one here. This is another one from IBM. Here's one from SAS Institute, Baseline Consulting. Again, I'm not going to walk through these diagrams. You do that and take a look and see what works for you and what doesn't work for you, because the idea of coming up with this framework is critical to helping people understand what's going on. Now, I like to actually explain this to people a little bit more simply as a starting point, although the frameworks are useful, and many people see in those frameworks something that reflects their organization and they are very successful at using them. I like to, of course, always include IT as part of our component in here, because IT is foundational to what we do in the data governance community. Now, as with most consultants, this is a four-quadrant chart. On the left side of this quadrant the domain expertise is less, and on the right-hand side of this quadrant the domain expertise is greater. Similarly, on the left-hand side of the chart the roles are more formally defined, and the roles are less formally defined on the right-hand side of the chart.
Sorry, that's the left-hand side of the chart and the right-hand side of the chart, not the upper half. On the upper half of the chart you're going to find individuals who encounter governed data less directly, and on the bottom half of the chart you're going to find people who encounter governed data more directly. Again, these are continuums in both cases. Finally, the time dedicated by individuals to data governance activities is greater in the bottom half and less in the top half. So let's see what the four quadrants are. Obviously, we start off with leadership, then add data stewards or data trustees, then participants, the subject matter experts, people who you need to make sure are part of this group, and then everybody else. Most organizations will draw a line around the left-hand side of this and say our data governance group is comprised of leadership and data stewards and work with it from there. Seems like a good place to start. Seems easy to explain to people, and we can also talk a little bit about roles in this as well. Leadership's role is to make sure that the program is funded at the appropriate levels and that management understands what those levels of funding will provide in terms of value to the organization. They are always listening to everybody for data and feedback. They make decisions that are then implemented by the stewards. So the leadership gets to say off with their heads and the stewards go chop them off. Don't worry, Dave, stewardship is not about killing people. Sorry. Stewards will require action, and sometimes the action comes from the subject matter experts, and hopefully it filters on out through the rest of the community where they get changes, et cetera, et cetera. Again, feedback will come into these areas. Ideas for improvement will come in and hopefully some guidance will go back to the organization. As I showed you with the other diagram, I would not show this entire diagram to everybody.
I would show this version of the diagram because it's a simpler version of the diagram. Most importantly, it tells you whether you're in or out. How do I tell whether I'm a data steward or not? I'll have you pass the data stewards training, et cetera, et cetera. Let's take a look at the steward from that perspective as well. Again, a wonderful book when you're ready for it. I certainly will recommend it, and it's by a colleague. This is David Plotkin's book here, and he's got different data steward types: business stewards, technical stewards, project stewards. I love the book and the definitions are all correct. I think this is not a good starting place for most organizations. It is better to deal with this at a higher level of abstraction. I can't tell you how many times I've been on calls with people where they say, well, I'm an operational data steward and that means this, and this project data steward keeps trying to tell me how these things are. Again, it's just not helpful. We've got to grow into these roles. By the way, if you've got these types of data stewards, you're also going to need an auditor in the process and a manager of the data stewards that are there. Again, I'm poking a little bit of fun at David Plotkin's book. It's a wonderful book. He is a wonderful instructor around these areas, but I wouldn't start here. This is just entirely too much detail. Again, our process is to gradually add. So let's just take the concept of a steward. What is a steward? Well, Webster says one who actively directs. Okay. So a data steward then would be one who actively directs the use of organizational data assets in support of specific mission objectives. And in this process of getting started with data governance, it will be critical to understand that we go through a one-time startup process. Could be a little bit painful.
But we're going to go through the whole process, because most people look at this and they see all these lines and arrows and ask, where do I go and how do I get this to work? And where's the whole picture? I don't really get the whole thing out of this thing. And it can be confusing, even when you've got really good information, good stuff that you could take a look at. So let's put the rest of this diagram up here, when I was having a little bit of fun with you guys trying to make sense of what's coming up, and say, how does this actually work? Well, on the left-hand side, the startup occurs once. And that's why I'm saying make it as simple as possible and gradually add your ingredients to it. You need to assess your context, you need to define a data governance roadmap, and you need to secure some sort of mandate. I've already expressed to you my preference to assign full-time people to it instead of part-time people. Because with that group, you need to have a plan, evaluate your results, and apply change management to that revised plan. What goes on the right-hand side of this occurs bunches of times. The left side only occurs once. Now again, in keeping with our theme of simple, if you go through the DMBOK, the DMBOK has great advice. Look at goals and principles, primary deliverables, roles and responsibilities, scorecards that are there, checklists, all sorts of different things that are there. I just went through six slides. I'm going to show you all six of them. Don't start with that much detail. It will overwhelm you. The biggest business I have right at the moment is helping people get their data governance programs restarted, because people have been working on them for a while. They've gotten off track. They have not learned how to evolve.
The key is to understand that while evolution by itself is not goal oriented, our data governance efforts are, and by repeatedly focusing on things we are describing a learning process that the organization is going through, learning much about the specifics of what it needs to understand to become better at managing data with guidance, which it currently does not do well at this point in time. So again, the left-hand side repeats none. It's just done once. The right-hand side repeats bunches and bunches of times. And hopefully you got that from those other pictures too: you're doing a data governance strategy loop, you're doing a data, excuse me, a warehouse data loop, etc., etc., etc. So we've talked a little bit about how to do it and kind of how not to do it. I'm going to put something on in the background here. Actually, hang on, I did one thing funny. I've got my Patti Smith going. There we go. I'm back in here. Sorry guys, not that I don't like Patti Smith. So you can hear that's a karaoke version of Hotel California. This is a bit that's out there on the internet. As the warm smell of bagels went up through the air, up ahead in the distance (I'm not singing, right?), up ahead in the distance, I saw a shimmering light. My head grew heavy. My sight grew dim. I had to stop for the night. As I stood in the doorway, I heard the meeting bell. And I was thinking to myself, this could be heaven. Oh wait, this can't be heaven, but this could be hell. These are the bad stories you hear about data governance. Stakeholders sit around and argue about things. And somebody finally says, are you guys talking about this and this? We fixed that problem last year. What's going on here is that they're not focusing enough on the metadata aspects of it. So let me turn the music down; you get the picture for that one as well. And the key is that you get buy-in of yes, this is important, but not an actual commitment from the business.
And the way I do it with organizations is that it's actually kind of a 12-step approach, where organizations have to sort of step up and say, you know, we haven't been doing what we wanted to do in the past and we pledge to do better in the future. So there's a little process in the new book for looking at that. Buy-in but not committing is absolutely a terrible process. Ready, fire, aim is the next worst process to look at. That's getting too overanxious to put it out there with 70,000 stewards. There are organizations who have done this. There are great presentations out there from the lessons that these organizations have learned and shared, and quite frankly, they were very courageous for coming forward and saying that they're learning organizations and here is how to learn from that. So premature, if you will, is not the good way to do it. Trying to cure all your data? Well, let's just think about this for a minute. What have you learned from this webinar? 80% of the data is ROT. If you can figure out which is the good stuff, you only have to clean one fifth of the data instead of all of the data that's out there. That's trying to solve world hunger or boil the ocean. Then there's Goldilocks: if a big one is no good and a little one's no good, then a medium-size one must be correct. Well, in the data governance world, that's probably not correct. And more importantly, as we get closer and closer to a world that is governed not just by GDPR and CCPA, but other things, PIPA is another one, by the way, in the federal government area, these are not going to work. We're going to have to have some very, very good things. Keeping an eye on just meeting schedules: so much of data governance can be helped with automated workflows and practices and things, but so much early on is hard. And a lot of it requires shared understanding. That's got to be our goal.
Because if you fail to implement some of these things, it's kind of like this: in the fall of this year, October of this year, the United States is supposed to switch over to something called Real ID, which means everybody's got to get a proper ID and show that they have had passage. This is, by the way, a response to 9/11, which occurred in 2001, and it is now 2020. It was going to be implemented now, and it has been pushed off because of the coronavirus to next year. Excuse me. So it took us 20 years to implement that particular response to this. Well, it's reasonable for us to think these things are going to take some time, but it's not appropriate for us to not implement something. You've got to have, as we've mentioned before, a change management initiative working in this area. If you don't understand what that is, you aren't qualified to make sure it's ready. Find somebody who is; there are lots of people around who do this for a living. Number eight, assuming that technology alone is the answer. Too many times I see organizations that come to me and tell me that their data governance is going to be managed by software package X. And I don't mean a data governance package. I just mean the software package. Again, technology alone is never the answer in data on these things. You've got to build a sustainable and ongoing process. I've mentioned already the process of becoming a learning organization. Very, very critical for this. And finally, the last worst practice in data governance is ignoring shadow data systems. They also need to be governed. If they are not governed, it becomes horrendously problematic. Let's dive into a couple of quick examples in here. The first one is from a wonderful organization that we worked with that just kept saying to us, getting data around here is like that Catherine Zeta-Jones scene where she's having to go through all of those lasers.
And yes, it is critical when the organization heard their needs and said, yeah, you're right, we're making it really difficult for you guys to do your jobs. And it's going to be an issue. So they use that language in that governance effort, and people were able to relate to it much more directly than if they had used some abstract concepts. Data governance is actually specific to your organization. So it's very critical that you be the ones and not somebody else guiding your organization into its effort. Quick story here, when I was young back in the old days, I was sent to Detroit. I was working for the US Federal Government Department of Defense and was told to go learn from Detroit's example, the private industry examples that we should adopt in the government. And one of the things I brought back with me was an advertisement and one of the papers that had an advertisement for 11 CIOs for Chrysler's Transmission Division. And there's a couple of things wrong with that. We won't go there. But good, good, interesting evolution. I ended up reading a book by David Halberstam and looking at some of the manufacturing techniques, though, and it was kind of interesting. In Detroit, for example, if they were adding things to the engine, well, the way it was considered successful was if it came along and you able to put the machine, the bolt, the part on it, let's say you're putting an oil pump onto an engine. But if you could put it on there without slowing down the assembly line, it was considered successful. And that led to different types of bolts, because they'd put on whatever bolt they felt they needed to. And they'd use different wrenches in the process. And that meant you need to, if you were going to repair these engines, have three different bolt inventories and three different wrench sets, not only at the manufacturing site, but also at the fixed up site as well. Toyota, on the other hand, added an extra step in that process. 
And after they built the first prototype, they came back and said, don't use one bolt for all three assemblies or all 30 assemblies or whatever the number is, but see how many you can standardize on. And by doing that, they were able to not go to one, one, one, but in this case two, from many to fewer operations in all of that. Again, those type of stories without the absolutes, we're not going to clean up all the data. We're not going to make everything standard, but we're going to look for places that we can benefit from standards being introduced to the organization, so that we will fix data earlier in the process. I mentioned before I was working for the Department of Defense, one of the things that was a lot of fun and great to work with, was the actual implementation of governance within the Army. Now, the first implementation of data governance within the Army, the folks that were alerted to this process, kind of pointed out an interesting piece. Governance and compliance is central to things that happen in the Army. Again, you can see this particular official diagram shows strategy, policy, architecture, investment, acquisitions, oversight, operations are all governed and compliance is mandatory around these areas. And when it was pointed out to them that data was not being governed, they went, oh, we've got to fix that right away. Now, that's a wonderful opportunity for somebody to take advantage of and say let's use the Army's really good discipline around governance and governance practices to improve the way the Army's governance of its data helps to improve its mission. Let me show you one specific example of that as well. This was one of those Lighthouse projects that I was describing earlier on. When the Army or anybody buys a tank, it turns out it comes out with about three million pieces of data. 
Now, that's wonderful, but if you don't know how many of those pieces of data control the obsolescence of that tank, you might end up with some challenges. And in this particular group that we ended up working for, we found five billion dollars of equipment that was out of sync in the inventory, which translated into large amounts of dollar savings from a data governance initiative. It was done in conjunction with a data quality initiative, but this organization did a great job of pulling together the business case and then implementing the actual savings that came out of it. Yes, even in the Defense Department, a billion dollars does get noticed. Another quick data governance story here. This is back during the last oopsie. We had a pandemic this time. The last one was a financial crisis, and Barclays bank was getting ready to buy Lehman Brothers' data assets. Actually, assets, and the data was part of the assets. They had agreed upon a long list of contracts. They were put on an Excel spreadsheet, and 179 of them Barclays declined to purchase from Lehman Brothers. They got to the agreement. They handed the first-year associate the Excel spreadsheet that contained the 179 contracts that they didn't want to buy. The first-year associate was fixing this up to present it as a work product the next day in front of the court. The work was done probably after 11:30, judging by the metadata that you could look at on it.
The sale closed on September 22nd, but during that overnight error the 179 contracts that were marked as hidden in Excel became unhidden when the junior associate reformatted the document globally. And then the sale closed, meaning that even though they had agreed not to buy those 179 contracts, they ended up becoming part of the sale, and they had to go back to the judge and say, uh, judge, I know this is what we said we were going to do, and this is what we accidentally did, can we get out of this? It was a mess. Most importantly, Barclays as an institution has incredibly good data governance around their spreadsheets because of precisely this error. One final quick one here, getting close to the end on all this. There's a bank in Japan called Mizuho Securities, a wonderful organization that has done some very good work, but they got caught early on with an oopsie. The oopsie was that there was a trader at Mizuho Securities who wanted to sell one share of a company called J-Com for 600,000 yen, so it's about 3,000 pounds at the time. Unfortunately, the trader hadn't had coffee that morning and sold, instead of one share for 600,000 yen, 600,000 shares for one yen. Oops. I can't do the math, but it's about 350 million dollars. Most importantly, though, the in-house system did not have limit checking. There was no way of saying, wow, we shouldn't be selling something for a yen, it's too little money. Anyway, in Japan a yen is pretty small, right? So it makes sense. The Tokyo Stock Exchange, where the order went, did not have limit checking either, and neither system had order cancellations, which were critical for that.
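As a rough illustration of the kind of limit checking that was missing in this story, here is a minimal Python sketch. The `Order` type, the `check_limits` function, the 30 percent price band, and the figure of 15,000 shares outstanding are all hypothetical, chosen only to make the example concrete; none of this describes how the Mizuho or Tokyo Stock Exchange systems actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    symbol: str
    shares: int      # number of shares to sell
    price_yen: int   # asking price per share, in yen


def check_limits(order, reference_price_yen, shares_outstanding, max_deviation=0.30):
    """Return a list of reasons to reject the order; an empty list means it passes.

    All thresholds here are illustrative assumptions, not real exchange rules.
    """
    problems = []
    if order.shares <= 0 or order.price_yen <= 0:
        problems.append("shares and price must be positive")
    # You cannot sell more shares than the company has issued.
    if order.shares > shares_outstanding:
        problems.append("order size exceeds shares outstanding")
    # The price must sit inside a band around a recent reference price.
    low = reference_price_yen * (1 - max_deviation)
    high = reference_price_yen * (1 + max_deviation)
    if not (low <= order.price_yen <= high):
        problems.append("price outside allowed band around reference price")
    return problems


# The order the trader meant to enter, and the one that was actually entered.
intended = Order("JCOM", shares=1, price_yen=600_000)
mistyped = Order("JCOM", shares=600_000, price_yen=1)

for order in (intended, mistyped):
    issues = check_limits(order, reference_price_yen=600_000, shares_outstanding=15_000)
    print(order, "->", "accepted" if not issues else issues)
```

Under these assumptions the intended order passes and the mistyped one is rejected on both size and price, which is the point of the story: a simple check on either system would have caught it.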
Again, each of these things has been an example of sort of some storytelling processes around that, and it is critical that you have these stories, because people remember these kinds of stories. If you don't have the ability for all your teams to be telling these stories, it will be very difficult for them to understand the confusing, non-law-of-physics-obeying asset that you have: your data. So a couple of quick takeaways and we'll get to the top of the hour for some questions and answers. Looking forward to this, guys. Again, the need for data governance is increasing, not just because data is increasing but because we have very little practice in it. The places you can go for education are limited; they are the places you've come to today. Dataversity is the best source for that information. It is a new discipline and it has to conform to constraints. There is no one best way. There is a best way for your organization, and that can only be found by yourself. Data governance must be driven by a strategy that complements the organizational strategy. Without that strategic focus, data governance efforts become adrift. They don't have focus, they're not crisp. And our three strategies: keep it practically focused, because you need to have very simple things and you need to practice it and get good at it. It's like playing chopsticks on the piano. You're going to drive somebody crazy, but you're going to get really good at it. Implement it as a program and not as a project. There should be no question of how many people are you going to have next year or what are you going to be able to do with less next year. Your data governance initiative has to be implemented as a program to be successful. And finally, gradually add your ingredients. Don't start with them all at once. It's much too confusing. Learn the value of these stories, because if you learn the value of these stories you will be able to help your organization in a much more effective fashion than if you simply try to implement this by edict.
Last little bit on this, but just to understand that data is currently understood by IT and the business as this little sort of bat sign that hangs out somewhere in there. Yeah, it's there, we've got to pay attention to it, but I'm not really sure whose responsibility it is to work with it. After a bit, they get to know it's a little bit different: data is actually a monstrous thing that we have not been addressing properly in the past and that we need to put more effort into if we are going to do it. However, there's too much of it, and part of governance is reducing significantly the amount of data that we are simply trying to manage. But that's of course not the real state of data. The real state of data is that it's everywhere, it's continuing to engulf us, and it is at this point that you guys need to get involved and help your organizations on it. I've included a couple of references on here as well, such as John Ladley's book on this, and it is now time for your questions and answers. Back over to you, Shannon. Peter, thank you so much for another fantastic webinar, and thanks to all of our attendees who are so engaged in everything we do. I have been really enjoying the chat going on there as well. And just to answer the most commonly asked questions, just a reminder, I will send a follow-up email to all registrants by end of day Thursday with links to the slides and links to the recording and anything else requested throughout. So, diving in here, and if you do have questions, feel free to submit them, like I said, in the Q&A section at the bottom right. Peter, data doesn't degrade, does it? Is it not true that the value of data decreases over time to the point where it might not be that useful for analytics, might be discoverable by legal entities, and is actually a liability? So, a fantastic question, and thank you for bringing it up. It does require a little bit of clarification. When I make a claim here that data is non-depletable, non-degrading, and durable in
nature, this predates a little bit but also very much mirrors Doug Laney's approach in his book Infonomics, which I'd also recommend to all of you to read. When I say it doesn't degrade: my email address is one I've had since, actually, the late 1970s, so more than 35 years for an email address. That does not degrade over time. However, with a collection of email addresses, if I have a hundred of them that are on a mailing list and I'm trying to use it for marketing purposes, we do have a statistic there that says one quarter of those will be useless at the end of 90 days. So that may be what the questioner is trying to sort of tease out there. The actual email address, if it stays good, is going to be there and it doesn't degrade over time. You can use my email address over and over and over again, and the worst that may happen is I may block you or classify your email as junk mail, but it certainly does not degrade in that same sense. So it is a unique characteristic of data, and it is a very interesting one, which means we have to do things a little bit differently than everybody else. Thank you for a great question. And Peter, how can we change the bad habit of making data part of the IT strategy, moreover with the microservice and cloud hype, where domain-driven design is sold and driven by IT applications? So these are really good approaches to many things, and the question is where they should be utilized. Again, if you're telling me that out of an IT project... I'm going to actually go back to the slide before this one, which is showing you the wrong way to do it. An IT project mindset is looking at specific ways of acquiring technology and saying the price of flash drive memory will be down to this level tomorrow, we can upgrade phones or change desktops, make them go away from spinning disks and go to flash memory. As we're adding things to this, trying to do a cohesive data program out of that... We've been trying for years and years to get IT to do projects
well. That's the whole reason PMI has been as successful as it has, because it is a wonderful way of adding project management discipline to organizational behavior. However, data cannot exist as a project, so that is why this is wrong. And what we're again trying to do here is to say that from a data perspective, yes, there's absolutely coordination with the IT department, but your data assets are much more broadly implemented and much more stable as an organizational structure to build on than the IT strategy is. Shannon, did I get to the answer to the question? I'm not sure I hit it exactly there. I believe so. Certainly the questioner can follow up and add additional questions if there are any on that. Yeah, so let me move on to the next question for now. From a data architect perspective, since data is everywhere, when is it crucial to define roles and responsibilities in the data governance journey? I think that the key is up front, and again, in vague terms. So let me go back and pick on Dave Plotkin's really good book. I hope you guys don't think I'm actually picking on Dave Plotkin. He's a wonderful guy and it's a great book on this. But let's just say that you've decided that you're going to start up from the ground a data stewardship program and that you're going to implement all of these roles initially. Can you imagine the first meeting, where people sit around and somebody says, well, I'm a business data steward, and somebody else says, well, I'm an operational data steward, what's the difference between the two of these? And there are differences. There are substantive differences that are well defined in this book, but that's too much detail up front. Again, just right up front, we're going to have four groups of people, and I'll get you that slide, right, and those four groups of people are going to do these kinds of things, and we'll do a little bit more and a little bit less and we'll get better at it. You know, the first song we're going to learn is not going to be the song
that we're going to play in concerts. But nevertheless, a simple diagram like this ought to be very digestible early on. So, the question: when do you do this? Yes, somebody needs to be the leader. By the way, I tell groups like this that if you have questions as to who's leading in your organization, take a piece of paper or make yourself a sign and put on it, "I am in charge of data governance for this entire organization." Somebody will come along and go, "Hey, who told you you could be in charge of the data governance for this organization?" In which case you can hand them the sign and say, "Here's your sign. If not me, then who is in fact in charge here?" So leadership is absolutely crucial to this. Make sure somebody's in charge. Put yourself in charge, keep it there until somebody tells you you're not, and when they say you're not in charge, say, "That's fine, then who is?" So all you really need to start with is leaders and a couple of stewards. I got another question that's related, Shannon, so I'll keep going on this for just a second. Somebody had said, "OK, look, do I need to have stewards defined for each of my subject areas to start with?" It'd be lovely if you did, but if you don't, can you get started with one subject area and practice in that subject area for a year before you go out and try to add other areas to it? Absolutely, and I have seen organizations be successful. Across the statistics, two-thirds of all of the data management initiatives, including all of the data governance initiatives that are out there, are focused only on a department level; two-thirds of them are not focused on the enterprise level. So it's never too soon to create these roles, but early in the project it's often too soon to introduce too much complexity. Again, hopefully you'll buy Dave's book, though. It's definitely a good one. Thank you, Shannon.

Well, thanks to the questioner. Can you give an example of a Goldilocks syndrome as one of the worst practices?

Great question.
When organizations are called to make decisions about things that they don't know much about, they tend to look at solution methods as opposed to actual solutions. So, for example, if you have a disagreement, oftentimes splitting the baby seems like the right answer. Of course, you all know that splitting the baby is not a healthy thing for the baby, and so consequently not the right answer under the circumstances. Goldilocks syndrome is kind of like that: I've got a little too much, a little too little, how does it all work out? My favorite example of the Goldilocks approach is an undergraduate question that we used to ask on our exams. What we would say is, what is the role of an analyst when they're faced with a situation where the sales VP comes in and says, "I want a really big warehouse over here"? So yes, the salesperson is going to want to have all that stuff in the warehouse, so the salespeople can say, "It's in the warehouse; it'll be in your hands as soon as you give me the cash," right? If you ask the finance VP that same question, how big a warehouse do we need in this organization, the finance VP is going to ask for a warehouse of infinitely small size, because the ideal situation for a warehouse is not to have any inventory that's sitting around being carried, and consequently will be done. Now, I tell the story as an illustration of Goldilocks because when you ask students what their role as an analyst is in solving that equation, they come in and go, "Well, I'll do some things," and I ask them, "How much do you actually know about warehousing?" And they kind of go, "Well, not much, but, you know, maybe just a medium-size one will be perfect. I don't know." Again, the proper answer for this is that the analyst's role is to make sure the problem is solved accurately and correctly by qualified individuals. So if nobody in the room is talking about what warehousing expertise is, they're not going to be able to solve the problem correctly. And in this area it is the
same kind of thing here. We don't want to go in and look specifically around for problems that are sort of abstract in nature. We want to find it as quickly as possible, because data is, as we say, amorphous (not really a great word in there), but it doesn't obey the laws of physics, and it's hard to see, feel, and touch, and so that makes it a little bit challenging given all of those activities. Anyway, I think I've exhausted that one. Shannon, did I get that one done as well? Hate to keep coming back to you.

That's okay. No, it's good, it's great. The questions keep coming in, and they're not clarifying questions, so this is good. Moving on here, the question: can you give an example of, oh, we already did that one. It just came in again, so maybe this is a clarifying one. I know, right? No, I think it was just entered twice. So, actually going back to the first question about data degradation, the question was more about patient data, where you can get a lawsuit 20 years after a procedure. So I don't know if you want to add any more to that.

It's worth asking, and again, thank you for bringing that back up, because it is an important aspect. Yes, data also is becoming a liability for organizations, and so many organizations are caught on the horns of the dilemma, if you will. In many studies we have proven that keeping all of your email is generally a good thing for employees, because they can find stuff in it; they remember having written it; they're linked to the motor memory syndrome on that. But on the other hand, corporate document retention policies may call for them to remove all of their email that is older than three years, or something along those lines. So yes, there are absolutely some very big tensions in there. I think Shannon actually does sponsor some webinars around that topic. It's certainly not one that we cover here, but it is absolutely a topic for data governance, and it's worth pointing out that in a legal
profession, data governance is actually kind of easier, because they understand these concepts a lot as well. So make use of the organizational culture that does exist in these places so that you can try to leverage all of that.

What are your thoughts on managing unstructured data, with all the data sprawl (endpoints, servers, etc.) and cloud? How important is classification in automating the process of data movement to lower-tiered storage and ultimate disposition?

So, this is another area that governance should play a role in. First of all, let's just talk about data moving to the cloud in the first place. I have three rules for data in the cloud that are different from data outside of the cloud. Data inside the cloud must be of higher quality than data outside the cloud. Think about it for just a minute: would you want the opposite to be true? Of course you wouldn't. But it's not going to accidentally be there, as we saw from our demo code earlier, so you have to take some steps to engineer that the data will be of higher quality in the cloud than outside the cloud. That's the first rule. The second rule is that data in the cloud must, by definition, also be more shareable than data that is outside the cloud. If you think about it, that is an architectural concept that you can look at and evaluate and even measure, and those measurements are critical to telling you what the use of data is throughout your organization. The third aspect of data in the cloud, which should be, again, of higher quality and more shareable, is by a natural association that your data in the cloud should be less in volume than your data outside the cloud, if you're doing a rehosting of your existing data set. The reason for that is quite obvious: if four-fifths of your data is rot, you should screen that data out, filter it out, keep it from going into the cloud, and keep from paying cloud vendors to manage the same piece of data over and over and over again in these areas.
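
The third rule, screening redundant data out before it is rehosted, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anything shown in the webinar: the function names `fingerprint` and `screen_for_cloud` and the sample customer records are hypothetical, and the sketch only catches exact duplicate records by content hash. Near-duplicates, obsolete records, and trivial records would need additional rules.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable content hash for a record (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def screen_for_cloud(records):
    """Keep the first copy of each distinct record; count the duplicates dropped."""
    seen = set()
    kept = []
    dropped = 0
    for record in records:
        key = fingerprint(record)
        if key in seen:
            dropped += 1  # an identical record is already staged for upload
            continue
        seen.add(key)
        kept.append(record)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical on-premises records, including repeated copies.
    on_prem = [
        {"customer_id": 1, "name": "Ada"},
        {"name": "Ada", "customer_id": 1},  # same content, different key order
        {"customer_id": 2, "name": "Grace"},
        {"customer_id": 1, "name": "Ada"},
    ]
    kept, dropped = screen_for_cloud(on_prem)
    print(f"kept {len(kept)} of {len(on_prem)} records, dropped {dropped} duplicates")
```

Comparing the count of records kept against the count scanned gives a simple, measurable check that the volume going into the cloud is smaller than the volume outside it.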
So let's talk about data in the cloud, and let's do it in a very careful, engineered fashion, because that will allow us to maximize our reuse, and with high-quality data in there. So given all of that, now let's talk about what you would put in the cloud if you are simply hoovering up, as we say, a bunch of old data that is unstructured data and you need to have it. If you're working in the legal profession (I've already mentioned them once), electronic document discovery is a phenomenally interesting business, just to start out from a data management perspective. I have a friend in Colombia who makes her living just figuring out other people's data problems in the data discovery area, because it's complicated enough there. "Unstructured data," which comes with that, is really a wrong term to use. I hate to be pedantic about this, but it is important that we actually talk about it for what it is. Unstructured data usually comes down to things that are not rectangular or tabular in nature. So if data were truly unstructured, you could not structure it. That is the definition of unstructured: something that cannot be structured. Anything that can be made more structured starts out at least semi-structured. Let's just give an example of a document, a document perhaps that follows an international standard, published, open, like Microsoft Word. Microsoft Word documents are inherently XML-based, and there is a structure that you can use to go parse that. That is not an unstructured document; it is a semi-structured document. And if you want to add information from that document into a data collection, it can be made more structured. But that doesn't sound as good as when you're saying, "I'm taking unstructured data and turning it into structured data." Anytime somebody says that to me, I hand them a glass of water and say, "Please turn it into wine." I would consider that to be a more useful process. That said, that is not saying pooh-pooh on the unstructured data. What we're saying is we have a lot of
it, but we also have lots of utilities that we can use to find out whether we already have a copy of that unstructured data in our cloud data set. So let's apply those same three rules: data in the cloud should be of higher quality than data outside the cloud, data in the cloud should be more shareable, and data in the cloud should be in less volume by significant amounts. And let's apply that to the same, again, it's called unstructured; I like to call it non-tabular data. Now, if you're not familiar with the concept of structure in a Word document, it's partly because they put a feature there that can be used but isn't forcing people to use it. So if you use the embedded headings, the embedded footnotes, the embedded pieces like that, these are all structures that the XML document understands, and you can read that XML document and take that semi-structured data and make it a little bit less semi-structured, a little bit more structured. That's the approach we should take to our unstructured data. Volumes and volumes of things, like outside of a laboratory context, volumes and volumes of semi-structured data that organizations decide that they want to keep, are worth filtering and keeping track of. In some ways it's the same as your metadata repository. There's an unending list of things that you could put in your metadata repository. What things should you put in your repository? The answer is only things that add value. Same here: if you're finding value in some types of semi-structured data, then that is absolutely worthwhile to look at from a governance perspective. I went on there for about six minutes. I apologize for that, but I know somebody's really hankering about the unstructured stuff, and it's a good conversation to have. We're happy to carry it on, as Shannon said, in the community pages afterwards.

So, Peter, what is the best way to manage rot?

I jumped over the slide that says reuse, recycle, and reduce, but that's really what
it comes down to. This is a people management problem. When you look at the amount of data there, when you do a data inventory, which is inevitably one of the first things that people tell CDOs and data leaders that they should do, they will find that they have lots of it. So the question is, what can we do to reduce that? How can we find data that's out there? Well, you're not going to find it by sitting around and waiting for it to come to you and announce itself. You have to find it through active measures. So, the bottom half of that implementation chart that I showed around data governance: take some of the cycles that you have in data governance and do an exploration. The average organization maintains customer data across 13 disparate collections of data. If your organization simply measures the number of places that you have and it's fewer than 13, you're in the top half of the class. And then the question becomes, what could I do that would in fact add value to the process of coming down with it? Can I take data that already exists and learn how to use it again and again and again? With that additional part, are there other purposes? Look at 3M manufacturing. When I create other data, other metadata, as the result of a manufacturing process, a data factory process, is there a way that we can reuse that remainder, so that, just as they find out that all products get 100% use of one form or another, it's very likely that your data can be much higher in its utilization rate? Although don't try for 100%; that's like a boil-the-ocean solution. Again, great question. Thank you for that.

There's a request for a workshop solely on rot, so we can definitely look at that. I probably could spend days on it. Yeah, next time we can get together, that'll be great.

For rot, we are beginning to use a policy to exclude data from queries, but can we better address it via ETL, an ETL initiative?

So that's a great question. I can't answer that for
you guys; you know more about the expertise there. The nice thing about ETL, though, is that ETL is hard-coded metadata: business rules, transformations, all sorts of things are encoded into that metadata, and that metadata can be used to help everybody understand it. Also, your ETL process can be a pre-screening process around your data profiling initiatives, if you're lucky enough to already have a tool in existence there. As you run things through the ETL, you can also put another data feed off to the side and do some sampling and things, and see if you're putting the same or similar things into whatever data collection you're ETL-ing into. Most people very much under-utilize the amount of blood, sweat, and tears that have gone into creating ETL. We've got a group here in Richmond, Virginia that thinks that data engineering is solely about ETL, and they look at all kinds of high-speed transformations and things. It's a really, really fun group to work with. The world is broader than just that, but it is an absolutely Ferrari type of a process to go work with, and really, really nice.

Peter, if you are aware of the Forrester approach, what do you think about starting DG strategy by doing data management capabilities?

Well, it's always good to start out with a sort of capabilities assessment of your organization. Imagine yourself, and most of you know me a little bit at least, imagine me trying to do a four-minute mile. It's conceivable that if somebody is chasing me I might be able to get there, but it's probably not realistic. Similarly for your organizations, that assessment tells you what you're capable of doing right now and what you're capable of growing towards as you grow and mature into these things. So they can be very, very useful to help guide the organization as it's trying to figure out exactly where it goes and what that first piece of activities should be. Again, I mentioned it before, but the lighthouse project concept is a very, very good way of thinking
about what happens when we find that slide here, where you're just trying to say, "Look, I've got many projects that could be done, that need to be done, but here's a good way where we can absolutely find win-win-win types of activities." Yes, something that helps organizational strategy; there's a bunch of them. What can we do, and what also, then, if I did it to support an organizational strategy, would also be data that the business really recognizes, is really seen as important, and that process would help us burnish up some needed data skills in there? Yes, that is the definition of the way this works: focusing in on those projects with the specific goal of not just producing work products, but in fact producing work products and an organization that is lean and mean and able to attack this, so that your data governance program will be the envy of the world out there as opposed to the laughingstock. Again, I'm doing very much extremes here; of course it doesn't work out that way. Great question, thank you.

Does just having a lot of rot mean that you have redundant workflows in your organization? Should you go after rot by consolidating what your org is doing?

Yes. Different topic for us, but yes, the idea of finding rot is really all about finding what our good friend Tom Redman calls hidden data factories in your organization, and those hidden data factories are costing you time and money. If people in your organization are saying, "Well, every time I get this data, I have to change it from here to here," or "This person never does this part right, so I'll just correct them," you know, there are lots and lots of examples of all of this. These are really critical to stamp out, because that's not what you pay people to do. Let's take the very obvious example of this, which is data science. This was the silver bullet of the last decade, where everybody said, "Yeah, just do data science and everything will be just fine." Well, it turned out it didn't work out well, and the reason for
that is twofold; actually, it's threefold. The first piece of it is that the organizational data scientists, and let's just pretend that they all get paid $200,000 a year (they don't, but let's just pretend that they do, or maybe $100,000 a year), spend 80% of their time munging the data. Well, that's a governance issue. Fixing the data should not be allocated to people who get big salaries. This is why we have support staff, and data management can provide support staff to a group of data scientists. The ratio turns out to be one data person for every 10 data scientists, and you can increase the productivity of that group so that they are no longer 80% unproductive all the time but only 60% unproductive all the time. And by the way, I've just doubled their productivity, so that is a focus we need to look at. Another aspect of this is that people just don't understand, and because their knowledge of these topics is limited, they are unable to conceptualize that these things might be needed. It was not obvious to the group at the hospital that when they checked everybody in and bypassed the admission code, not realizing that it defaulted to knee surgery, the hospital director next year would look around and say, "Well, I guess our hospital does more knee surgery than anybody else around here; we should invest in more knee-surgery-type activities." A good person making a good decision on bad data, and of course that's why it didn't work out for them. Anyway, thank you for allowing me to tell that story. I like to tell it.

I love it. I think we have time for just one or two more questions here, Peter. How does strictly inventorying data differ from inventorying processes, with details about the data those processes utilize?

So the process-data interaction is represented by a concept called the CRUD matrix. Again, naming things is probably not our strength in the data community. CRUD stands for the way in which a
process interacts with the data item: it either creates, reads, updates, or deletes it. Some people have added an archive function, as the other questioner said as well, which is very reasonable. Having that information is valuable, but again, of the hundreds and hundreds of organizations I've worked with over the years, one in 10 attempts to do either of these types of activities well, and that's unfortunate. Organizations are not big enough, they're not mature enough, they don't understand that this is necessary. I'll show them something like the DMBOK and they'll go, "That's great. I've never heard of that before. Where did that come from? Where has this been all my life?" It's just a very, very challenging set of organizational constraints on this, and so to not have that is very much a problem. The question becomes, how do we make this real for people? And that's where the storytelling aspect comes in, as we try to come back time and time again and tell the same story over and over again, so that eventually people are looking at you and going, "Please don't tell me that story again; you've told it to me too many times." That's when you know you have succeeded and gotten the message across.

Well, Peter, I love it. That does bring us very close to the half hour here. I just want to answer again the additional questions: just a reminder, I will send the follow-up email by end of day Thursday to everybody, with links to the slides and links to the recording. And here are the next fabulous topics coming up in July and August. Hope you can join us again, always the second Tuesday of each month. Peter, thank you again for this fantastic presentation, and thanks to all of our attendees for being so engaged in everything we do and attending today's webinar. I hope everyone has a great day, and stay safe out there. Thanks, all.

Thank you, Shannon. Thank you, as always.