Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Management versus Data Governance Programs, sponsored today by erwin by Quest. It is the latest in the monthly series called Data-Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open and access either the Q&A or the chat panels, you may find those icons in the bottom of your screen. And just to note, the Zoom chat defaults to send to just the panelists, but you may absolutely change that to chat with each other throughout. And to answer those commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides, and yes, we are recording and will likewise send a link to the recording for this session, as well as any additional information requested throughout the webinar. Now let me turn it over to Danny for a brief word from our sponsor, erwin by Quest. Danny, hello and welcome.

Hey Shannon, thanks so much. Happy to be here and happy to be sponsoring such an excellent webinar with an incredible presenter. Always nice to work with the professor. So I've got a couple of slides to get through before we get to the meat of the presentation.
Just, you know, I'm sure a lot of this isn't news to you folks out there, but data continues to be an opportunity and, unfortunately, also a struggle, and there are lots of reasons behind that. The types of data that we have and the types of technologies that are out there continue to grow, and we need to bring all of those things on board in order to meet the needs and transform our business in the way that we want to. Folks are taking that data and spreading it out well beyond the firewall, running all sorts of hybrid cloud environments with data in different places where people need to get at it and mix and match this stuff. 75 percent of organizations, according to Gartner, are going to be deploying multiple data hubs for data sharing and governance by 2024, so gone are the days of the uber-warehouse that satisfied, or tried to satisfy, everything. And of course there's always the risk of data, especially around our sensitive data, with keeping that secure, private, and well-governed being a top-five data management challenge. I thought this was important as we're talking about data management and data governance, because there are differences, and there are definitely synergies and places where they overlap, but I'll leave that to Peter. My message: metadata is your friend. Metadata is going to help you solve a lot of the problems on both sides of the equation, in fact for any data stakeholder out there, because there's data management, data governance, data consumption. Why is metadata your friend? Well, it helps you answer a lot of common questions that go across those different disciplines by providing you a real landscape of your data. Where else can you get one place to look at all the data that's available to you and start to try to find it, understand it, and see if it's fit for use? And if you can layer that with all of the good things that come from
your data management teams as well as your data governance teams, it really helps you accelerate your time to value in terms of understanding the data that you have and whether it is fit, and really start making good decisions around that data, driven by that data. And then finally, of course, in our fast-paced world automation becomes more and more important, and metadata provides a great opportunity for automation because of its clear understanding of the environment and all of the detailed goodies that are in there that you can then start to harness and put into your automation programs. In fact, around automation, here are just some things that come from a good metadata-driven, or active metadata management, approach. Things like lineage: understanding the pedigree of your data, where did it come from, what happened to it on the way, is it the right data for the job that I have in front of me? Impact analysis allows you to plan things and make sure that you're not making mistakes because you're unaware of, or have, unintended consequences, shall we say. So reducing your deployed defects, which saves time and money, allows your folks to really be focused on driving value for your organization. It's a great place to really start to classify your sensitive data once you've discovered it and provide that visibility that's going to mitigate the risk associated with it. And then, really getting into the weeds and the pipes, if you will, allowing you to automate the generation and orchestration of your data pipelines so that people can get faster time to value from that data. And then of course, overall, for your entire organization, a place for you to discover and help you navigate that landscape so that every stakeholder is as literate as possible and can be as efficient as possible using that data, because we're still spending far
too much time just trying to find out what data we have, what it means, and if it's appropriate for what I'm trying to do with it. So really, data intelligence is all about these five simple steps: harvesting what you have; curating it so that people can understand it from a number of different perspectives; putting all the rules and governance in place to make sure that people understand how it should and shouldn't be used; activating that to provide efficiencies and insights; and then socializing it out to a number of different stakeholders, in context, so that they can understand it from their perspective and participate in the larger data community, which is really, really important. We all want to get together and be social with data, because we know that's going to make us better with data over time. That is also the place where you can then start to connect and collaborate. So data intelligence feeds all of these different groups, and many more, quite frankly, but I wanted a nice set of options on this slide. It is where you practice data governance. It's where your folks that are designing data, modeling data, and creating new data assets go to find out if they're reinventing the wheel and, if they are, whether they need to follow some standards to do that. It's a place to bring visibility into all the efforts that are going on to make sure that your data is at a high quality. It's where the DevOps teams can go and understand what they need to do with this data as they're developing new applications and new ways to deliver that data. Similar with DataOps, especially around things like sensitive data, so they know whether to encrypt it throughout the life cycle, make sure that only the right people have access to it, and that people are aware of the sensitivity of that, so that they don't make mistakes like dropping it on thumb drives or leaving it on their device that goes on to someone else in the
organization, sharing that out. And even more importantly, it's where you can connect to the larger organization through your enterprise architecture, understand what initiatives are out there in terms of transformation and innovation, the people that are managing the portfolio for IT as well as the business, collaborate together across the enterprise, hook into things like service management, and really inform the governance, risk, and compliance folks that are looking at more than data and looking at the organization as a whole. So, a very, very powerful capability that all starts with metadata. And if you're interested in looking at technologies that are going to help you do these things, well, surprisingly enough, erwin by Quest does offer a data intelligence suite. It involves a data catalog, which is where you capture that physical data landscape and all the details behind that and provide insights like lineage and impact analysis, combined with a business glossary manager in our literacy suite that allows you to build governance workflows and really provide that business, rules, and policy context around that, or any other type of asset that you may think would enrich people's understanding and ability to use the data that they have in front of them. And then, of course, automation capabilities to take all that and start to help you move from legacy platforms to new platforms, from on-prem to the cloud, or to just basically manage your data warehouse, your data lake, your lakehouse, whatever it is that you're building, in a much more efficient fashion by creating automation, generating code from metadata, and using that to drive less latency and more availability in that aggregation platform. So if you're interested in this, please come visit us at quest.com. We have a whole section on data empowerment, which covers all of the solutions that we have from end to end that cover data protection, data operations, and data governance coming
together, really giving you that empowerment that you need to become a truly data-driven enterprise. With that, Shannon, I am going to pass it back to you for the real presentation.

That was a great presentation, Danny, as always. Thank you so much, and thanks to erwin by Quest for sponsoring today's webinar and helping make these webinars happen. If you have questions for Danny, he will be joining us in the Q&A portion of the webinar at the end of Peter's presentation. Now let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. He has written dozens of articles and 12 books; the most recent is Data Literacy: Achieving Higher Productivity for Citizens, Knowledge Workers, and Organizations. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as the top data management expert. Some of the most important and largest organizations in the world have sought out his expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. Now with that, let me turn everything over to Peter to get his presentation started. Hello and welcome.

Hi Shannon, welcome, and Danny, a pleasure to work with you as always. I'm sorry we didn't see you at the last event that we were at, which was just last week out in San Diego, but hopefully we'll see everybody out there very soon for the big Enterprise Data World conference we have coming up in March. And if all things line up right, we might even be able to entice Danny to come and play some music with us as well.
So this title here: Shannon is trying to get things that you all are interested in, of course, and so we started out with data management versus data governance programs. And then I kind of went, well, okay, we could do this as data management versus data governance, here's a good master's-level question, compare and contrast, right? Well, that doesn't sound like much fun. Program data management versus governance? Okay, now maybe we're getting somewhere. So how about this: data programs, management versus governance. What we're trying to do is achieve a common understanding, because of course these two things have to work well together in all instances. Note that we have lots of video here. And just a quick note for the record: this is, I guess, the eighth immersion that I'm doing, so I am at the headquarters of the Department of Housing and Urban Development in downtown DC, who are just getting started on another leg of their data journey. It's going to be a pleasure working with them, and we're going to talk about these same exact bits. So first of all, many people are wondering what data governance versus data management is, and we have to understand this in the context of something I call data debt, as do a lot of other people. It's not a specific term. I'll go through and define what each of these is specifically, and the real bottom line to this first section is that most people don't know or care about either of them, and we do have to find a way of breaking beyond that. There are some required success factors for both data management and data governance to work in place. We'll talk about those from each of those perspectives, but really the key is not so much how they interact together but how the two of these topics, as well as other topics that we'll throw in here, work within the context of the rest of the organization. And it's going to be up to us as data professionals to message this correctly. When we talk about these types of issues, they have to
understand it's not just blah blah blah, you know, the teacher in Charlie Brown, if you will, but that there are some real things that we have to do that are critically important in order to get the messages through. Because if we can't get these messages through, then we're not going to be able to sustain what we need to have, which is what all organizations will need to have: a data program in the long term. There's also got to be, at least at the start, a very singular focus, and that is improving data's role in the achievement of organizational strategy. So we'll look at that as well, and then take that last chunk here to look at strategy, which is to say, how do data governance and data management work with respect to each other? Data's challenging characteristics are always going to play a role in this, because data does not tend to follow the laws of physics, and that makes it a little bit problematic. We'll look at some constrained resources, and strategy is a way of saying, if I've only got x number of resources, what do I do first, and how can I make the most of them? The key for strategy also, then, is to understand that it's not about a document, it's more about a process. We'll get into that. And then, as Shannon said, about 45 minutes from now we'll bring Danny back on and have some real good give and take in the Q&A section. So let's dive in here and try to understand a little bit about this. First of all, this is the picture I put up that had the most responses of anything on LinkedIn. Over 6,000 people observed, commented, or liked it. I didn't put this up in any way to try and say how things are; this is simply a diagram that I have seen in many organizations that I've worked with over the past 30 years. And in every version of it they say, we have data, we have knowledge workers, and those knowledge workers are trying to turn that data into information. But the problem is that they are human beings, which means we're talking about
wetware, that's the stuff that's between our ears, and these knowledge workers are relying on informal communications, not necessarily understood practices, and they're often described as the weakest link in the chain of giving information value. Data management and data governance address this challenge. Data governance governs the activities of data management. So you can tune out right now and not worry about the rest of this, but let's just dive in and see what's really going on here. So first of all, the accounting profession is over 8,000 years old. These are cuneiform records from back in ancient Mesopotamia showing beer sales from one to the other. And over 8,000 years the accounting profession has developed something that we all rely on now: generally accepted accounting principles. Our own version of this is that we trace ourselves back to Augusta Ada King, also known as the Countess of Lovelace. Ada, the Countess of Lovelace, was the person who looked at this machine here, which is a weaving loom. You can see it's not really much of a machine, that's a manual process, but they're making a nice pattern on that loom by passing different pieces of thread through it. Of course, if you want to get that to industrial scale, you now start to make industrial-sized weaving looms. And Augusta Ada King looked at this weaving loom that I'm showing you here, and the thing that's on the right-hand side of this diagram with the holes in it is their equivalent of a punch card. She looked at that punch card and that weaving loom and said, I believe that I will be able to make a machine that performs mathematics for me. And this is her diagram of a programming language for a computer that had not been made before. So we really do owe an awful lot to Lady Lovelace in that context, and she's just done a marvelous thing for us to start to understand. But we are very much at the beginning of our profession, whereas accounting perhaps is in a more mature cycle around all this. And the problem is
that that has led to confusion. IT tends to think that data is a business problem, and their attitude has been, if they can connect to the server then my job is done, because I don't care what goes through the pipes as long as the pipes are connected. The business, on the other hand, thinks that IT is managing data; after all, what else would a chief information officer do? As a result, data has fallen into this enormous gap between business and IT, and this has led us to something that we need to repair, working in conjunction with IT and the business, and that is this idea of data debt. It's the time and effort that it's going to take to return your data to some sort of neutral point. Getting back to zero is what I call it. It typically involves undoing existing stuff and requires new skills of your organization. At zero you're essentially starting from scratch, but that, typically in a new program situation, will require an annual proof of value. Again, sometimes it's very obvious: if I give somebody a hundred dollars and they give me a hundred and fifty back after the investment period, it sort of builds confidence that I can do that. Well, if you're starting out in this journey, you've got to be good at both. Almost all data challenges involve some form of interoperability, and there's very little guidance around optimizing data management practice and getting back to zero. So we need to eliminate this data debt, because it slows progress, decreases quality, and increases costs all the way around. So when we talk about data management, there's been confusion over the years as we have done this. Many people look at us as librarians and say, you know, just give us a pile of books and we'll catalog them all and everything will be fine. That is, of course, a joke. On the other hand, the Far Side guy did one of these for us as well, so, you know, if I just label everything it'll all be fine. Or worse still, this was a Microsoft commercial from a couple of years back. I'm not sure what exactly Microsoft is trying to tell
us at this point in time, but it clearly isn't a flattering picture of the way you should think about data modeling activities. So again, the point is it's been misunderstood. Let's go to Wikipedia and see what Wikipedia actually says. Well, okay, here are some topics in data management, and I'll show you where they came from in a little bit. It also has a note on here that says it's a bunch of disciplines related to managing data as valuable resources, and this is clearly a very broad definition. Even our DAMA one that we had on the website: data management is the development and execution of architectures, policies, practices, and procedures that properly manage the full data life cycle needs of an enterprise. Yep, okay, well, not a great elevator pitch here. How about this one? We published this about 15 years ago: understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting those activities. Well, all of these are good definitions, but the problem with most data professionals is that they arrive here through different means, and it's the story of the blind people and the elephant. You've all heard this before. The one on the top thinks it's a fan because he's feeling the ears, the one at the trunk says it's a snake, somebody else says it's a tree because they're holding on to the leg, somebody else has got the tail and says it's a rope, and this one thinks it's a wall. If you've only looked at data from that perspective, that is what you think all of data is about. So we used another definition for a long time here as well, and that is the idea that data management is everything that happens between the source and the use of the data. And while that's interesting, it's not very helpful. It also leaves out a really important component, which is that most of the time we want to set the data up to be reused, not just used, and that is a different engineering problem. So here's a more robust definition of data management. We have
some things on the sources side, and there are a number of different activities that can occur, generally falling into the munging category. That's m-u-n-g-i-n-g; you can look it up on Wikipedia. It's the things that you need to do before you can actually go and exploit data. You can see on the right-hand side there's data science, data delivery, storytelling, all sorts of things around that, and that gets us to the use. But all in all, we still need to have this formal reuse management. If we don't, then we will only end up pushing data in one direction, and that won't help. So our context here is really data preparation and data delivery. Given that sort of context, what's happened, interestingly, is that the group that I'm the president of right now, and I have to thank the literally thousands of volunteers that helped us put this together, has put out what we're talking about when we say data management. This has now become a de facto standard. The US federal government is in fact using this precise definition, which says that data management has 11 practice areas. And once again people say, oh, that's terrific, okay, now I've got it, now I just want to do one of these things. Well, it turns out that's actually not a very good plan either. It's better to think of these things in groups of three. So you might have somebody that is looking at using data more strategically, and when I say three, you need to have those three-legged stools in place, because most data activities are going to encompass or touch at least three of these areas. So the first version of this particular organization's approach strategically involved one round of data governance, one round of data warehousing and business intelligence, and one round of data quality management. And they may alter after that and discover that they need to change something. So here we have another phase of the same project, but in this case we've moved from data quality over to metadata. As Danny was saying, it is
your friend around this. And now we have two x's of experience in governance and warehousing and one x in metadata management. Finally, perhaps a third phase says that we need to incorporate reference and master data in these areas. Notice again, we've practiced data governance three times and warehousing three times, but only one x each at metadata, data quality, and reference and master data management. So this is what we mean by data management. You can see that data governance is absolutely central to this. The reason for that is because governance is a part of all of our organizations. At the corporate level there are all sorts of things that talk specifically about the relationship of the company to its shareholders. Typically that is a financial piece. However, even in our world today we are starting to see things such as, maximizing shareholder value can no longer be the company's top purpose. That's a quote from Jamie Dimon, and we'll see how well they follow through on this, but it's actually enlightening to know that corporations feel there is a duty to society beyond just maximizing shareholder value. So of course, if we're doing corporate governance, we need to talk about IT governance, and IT governance is making sure that we have IT aligned with the business strategy. If the business strategy is about mobile, we need to have a mobile IT structure that supports it. We need to provide measurable results. We need to have some key questions identifying some things that are going to be really important, as opposed to trying to do everything perfectly. And what we can now say, after doing IT governance for a number of years, is that these are the five primary areas that you should be focusing on. We don't have even that level of guidance with respect to data governance just yet. I mentioned an elevator pitch just a few minutes ago, and I want to make sure we hit that. The elevator pitch, of course, is that you see somebody getting on the elevator, or you're on the elevator and
your boss gets on and looks over and says, oh Peter, tell me what this data governance stuff is, and you've got exactly the amount of time the elevator is moving to have a good answer. I want you to go in and look at these five, excuse me, seven definitions of data governance. I'm not going to read them to you; again, you get all the slides on this, you can go back and look at them. But just try to imagine explaining them to somebody, and it doesn't work. So the best way to explain data governance to people is to say it's about managing data with guidance. And if we are not managing data with guidance, one might ask the question: would you want your sole non-depletable, non-degrading, durable strategic asset managed without guidance? Probably not. So people generally say, okay, that sounds like a good thing. The higher I get up in the management food chain, the more I change the definition just slightly to say it's not just about managing data with guidance, it's also managing data decisions with guidance. Why would that come into play? Well, let's look at a real-life example that just happened last week. There was a group of hackers that hacked a jewelry store in London. They said, you know, give us bitcoin or we will start to release this information on the dark web. And a couple of weeks after they started releasing it on the dark web, they came back and critiqued their own data governance practices and said, gosh, Your Royal Highness, we are so sorry, Prince Mohammed bin Salman, we are going to make sure we've put in place good data governance so that we will never, ever again release private data on the royal family as part of our ransomware attack. So yes, it even comes up in criminal enterprises that you need to have data governance in place. This next slide is extraordinarily busy, and I'm not going to walk through it now. We do want to show you this; it's a slide from my colleague Chris Bradley that talks about how
to put all of this stuff on one particular sheet, in the sense of what do we mean by data governance and stewardship. So when we're defining data governance, this stuff on the right-hand side is typically the way we do it. Again, I wouldn't use any of this for an elevator speech, because the real challenge around it is that data is a confusing, complex, and detailed topic, and outsiders generally don't want to hear about it. Danny and I can play music and people will listen to us. Danny and I can talk about data and people might listen to us at the right conference, but we'll get a lot more people for the music than we will for the data piece. It's taught inconsistently, and that's a real problem, because as a result it is not well understood outside of our little group here. So here's an example of why that's potentially a problem. Now I want you to imagine, first of all, it's wrong, right? This is wrong. Thank you, Morgan Freeman. I hope he doesn't ever get mad at me for doing that, because he says it so well, I love using it. Yeah, the reason it's wrong is because the knowledge workers in your organization have gone and learned about data all on their own. They haven't had guidance around this, and that has led to a whole series of problems. In fact, it led to a joke on Seth Meyers just the other night. Now, if you can work Seth Meyers into a topic like that, we're definitely way off the track here. All right, so let's talk about success factors from a data governance perspective. The difference between the two, again: governance is policy, high-level guidance, setting directions, and top-down is required because most data challenges are very, very problematic. For example, we might say that all information not marked public should be considered confidential. It's an easy rule; people like those sorts of easy rules. It may not work for your organization. That's where the data governance professionals actually come into play. And when we talk about data management, it's the business function of planning,
control, and delivering the information. It's intense, it's detailed, it's too complex for any one individual to understand, so we need some architecture in addition to the engineering parts, delivering data to solve those particular business challenges. Because when we look at what's happening here, poor data manifests itself as multi-function challenges. We need people to be able to go back and see through the business and IT systems where the business challenges may manifest themselves. And if we don't have a dedicated group that can look through all of these and say they are all caused by poor data (I contend that all business problems have a data component to them), we will never be able to muster the amount of critical mass that we need. So eliminating the data debt requires a team with specialized skills that are deployed to create repeatable processes and develop sustained organizational skill sets. Organizations are perceived by outsiders as machines. While you may have many systems in there, they're going to take inputs from citizens and others and produce outputs for citizens and others, and essentially what that means is everything that comes in is going to be data and everything that comes out is also going to be data. The question is, how do you determine what to manage formally? If we put too many controls in, it will be too expensive and slow. If we do too little, we're missing opportunities. And interoperability is the primary determinant of value here. So poor data governance costs organizations millions each year in productivity, redundant and siloed efforts, poorly thought-out hardware and software purchases, delayed decision-making, and reactive instead of proactive initiatives. And finally, 20 to 40 percent of all IT spending can be tied back to better data governance initiatives. So on this next chart I'm going to spend a few minutes, just to get a sense for how governance works in organizations. Typically there will be some sort of somebody's
put in charge as the data leader. Probably, like Beetlejuice, you said data too many times and somebody thought you were the data person, and therefore you now are in charge of all the data for your organization. Congratulations. And just a thing that Shannon and I noticed last week at the data governance conference: the average size of non-large-company data governance initiatives is one person, exactly one person. Very few people had teams last week. They're starting out doing their best, and when they get started they have a limited amount of feedback and things. So we start something called data governance, and people understand that it improves data over time. No problem, we'd like to do that. But some people consider that to be very slow. In other words, if I'm at the bottom of the Niagara Falls of data and I tell people that I have cleaned up the water problems, I still have to wait for that water to wash its way through the system. It will take time, and that is not fast enough for some organizations. So they also start up what are called data improvement projects. Now, sometimes this is under data governance, sometimes it's under data management. It doesn't matter, as long as it gets done; that's really the key. Although most organizations are now adopting a proactive role for data governance here, so they can start to get better feedback and look at what's happening from their data stewards, their data community participants, and everybody else who's playing in that area as well. We're really good in general in our community about celebrating when data things happen. Yay, I got it to a certain point, I've moved to this, I've achieved this scale, I've eliminated quality problems in a major way that makes the data set fit for purpose at this point in time. While those things are good, we also need to get better at relating them to organizational things. And I put the approximately-equals sign between these two because we are still in the process of mapping this. This is
an area that most organizations still can benefit from improvement. So while it's fun to say, yes, I celebrated something over here, we also need to say, and therefore this business outcome is now possible. And I did something else over here, and I did something else again. Lots of things are going on here, but if I can't translate that into something that somebody can see is a value, we are not going to have a sustainable process. And of course, once you start to make these value propositions much more sustainable, your organization will now be in a better position to say, hey, if you take the chocolate that you're producing out of this particular exercise here and combine it with my peanut butter, I can have something really good at this point in time. Most data governance organizations work as a part of data management, and I liken it to a firehouse situation. Certainly we don't have fires all the time, and consequently the firemen do other things when they are not fighting fires and saving lives: things such as battery programs to replace the batteries in the smoke detectors, or fire education programs letting kids know that LED light bulbs are much better, much safer for your home than are the traditional incandescent light bulbs that are now accounting for almost 30 percent of domestic fires at this point. That is a real interesting characteristic. So we're going to be doing some firefighting, but we're also going to be doing some planning and fire prevention in the process. In addition, as your groups are getting started with the process, there's an old television series called MacGyver. I'm just going to play a little clip from it. Harmless, of course. He is the jack of all trades in there and understands how to marshal the people that are involved in these exercises in order to build better in this case. So when we look at the success criteria for data management and data governance, we want to have one set of directions at a time. Very important to understand: strategy from
our perspective is going to be a sequential operation. So never put out a strategy unless you label it version one, because then they will expect version two in order to come up with it. But don't tell them it's got to go up and down at the same time; it's impossible. Don't require those that are non-data people to learn too much. We need to really spoon-feed it to them and give them very simple examples so that they will be able to see the things that they want to see in it, because they of course are the experts around that. The organization itself gets to tell you when things are better. A lot of organizations like me to come in and tell them how they're doing, and it's like, really, you don't want me to do that. You want your own organization to understand the improvements that you're making in your organizational data practices so that you're able now to go back out and do things faster, better, and cheaper organizationally. For example, we might have gotten very good at compliance. By the way, does anybody think that the compliance burden is going to decrease in the future? I didn't think so. So yeah, we might as well get good at doing that. And let's make the entire organization measurably more data literate. Now the data literacy part is very important, because all of your knowledge workers for the most part have not been educated to do things with the thing that they use as raw material the most: the data. So we need to make everybody more data literate. We don't need to make them all data experts. And moving ourselves from a refocusing of data efforts to support organizational strategy as opposed to optimizing it: most organizations are doing all kinds of things with data, and that's great, but they aren't necessarily focused on organizational strategic efforts. So we want to try to get these pieces in place. Make the move again, pop back up here for a little bit, and now let's talk about messaging, because you've got to be able to have messaging that works in this context here. It's
no surprise to most of you, I'm sure, that most organizations have had some bad data decisions. Again, the focus here is half of them have done this, according to that latest survey that's there. And this leads us to something that we call the bad data decisions or the illiteracy spiral, and that is because business decision makers are not data literate and the technical decision makers aren't either. Consequently they make bad data decisions. Those bad data decisions lead to poor treatment of organizational data assets and poor quality data, which leads again to poor organizational outcomes. Of course, if you look at this sort of squinty, you can see this is the same thing it says on the shampoo bottles: lather, rinse, and repeat. And what we want to do, of course, is break out of this particular cycle. The most popular example that I have found, and I just today found a 13th instance of it, is where organizations install a package called Salesforce.com. Salesforce is quite good software. It's customer relationship management software, works in the cloud, very, very nice. And they have a major challenge, because I have found 13 organizations in the last 24 months alone who have all installed Salesforce and then decided to clean the data. Now the reason that's a problem is because the customers who are using the output from Salesforce.com, whether it's your sales force or your management, can't tell the difference between Salesforce.com not working well and Salesforce.com working well but serving up bad data. That's a fine point that most people are not able to get, and it leads us to one of the primary takeaways from all of this. Everybody asks me, why do I call my company Anything Awesome? This is the little bits that I put together to just illustrate it. See, when you start out in this business you eventually learn something called garbage in, garbage out. If you don't get garbage in and garbage out, there's a lot of other things that are going to become more problematic
to you. Garbage data and the perfect model are still going to give you bad results. So it doesn't matter if you've got a data lake or a master data management solution or any kind of technology piece in the center. It's always going to be true that garbage data is going to lead you to some very significant problems, and most organizations don't have, of course, just one of these but quite a number of them. Changing the quality of that data allows us to harmonize the data flows. Oftentimes we're sending the same data around twice or three times. And now we can start to evaluate the quality of the outputs, because only now can we get good data after the results that come through. Again, necessary but insufficient conditions in order to do this. And just to finish the thing: in today's environment the most astounding and really problematic area is one around machine learning. We've been doing artificial intelligence for some decades and we understand pretty much what it is. We've done a good job of creating what we call learning algorithms. The algorithms can sit down by themselves and learn how to play chess or learn how to clean data and get lots and lots of good things around this. But what happened is, and we wrote it in the data literacy book, which is just incredible, we thought 2020 was going to be remembered as the year that AI ran out of data to train its training algorithms. Of course COVID got in the way with all that, and it was, but we still haven't fixed the problem where machine learning challenges are just simply stopped because they don't understand how to get the data that they need in order to train the algorithms. There's several stories around about this. I can put some references for you at some point in time on this, but it's nevertheless a big, big challenge in organizations. And the primary problem with this is the lack of a programmatic approach. Some of you on the webinar here today, I guarantee, have achieved some level of status with PMI, the Project
Management Institute. We copped their piece where they called their body of knowledge the PMBOK, the Project Management Body of Knowledge, so we called ours the Data Management Body of Knowledge. Probably DMBOK doesn't sound quite as good as PMBOK, but the point is anybody who is a project manager is going to understand that projects have a start and a stop, and data is a programmatic activity that we have to treat in a very different fashion. Your data program must last at least as long as your HR program from this point onwards. If you think that data is going to be less in the future, then you probably don't need this, but I would really challenge you whether you think you're going to need data less. And the reason for the confusion is because we've asked our chief information officers to focus on many, many different things, but we've also observed that we need to focus more on data in order to do this. So we created the title chief, just to make sure that everybody understood: when the chief of police speaks, that is what happens. The chief financial officer again is the highest authority on financial matters in the organization. The chief risk officer speaks to risk. The chief medical officer speaks to all sorts of other types of issues within a health care organization. And most importantly, the chief financial officer doesn't balance the books, the chief risk officer does not test the software, and the chief medical officer typically does not perform surgery. We're going to make progress in data management when we start to put somebody into the top data job. I wasn't particularly a fan of chief data officer, mainly because the first thing that happens when you add another chief to the already growing pile of chiefs... by the way, CDO: there are more chief digital officers out there than there are chief data officers, just to keep us in our own humble sense. CDO also stands for chief dream officer. I looked; last time there were about five of them out there, which was
interesting but probably not going to get us to the levels that we're trying to get to here. So you can call it what you want, top data job, taking the price data executive, chief data officer. We need somebody to be in charge and to liaise with the top IT job and to dedicate what they do solely to data asset leveraging, to be unconstrained by an IT project mindset. That can take an entire hour for us to go through as well, but again, projects start and stop, whereas data tends to be continuous. And finally, to get the real perspective on what's happening with data in the business, we need to be able to report into the business area, because if we can't report into the business area we're not going to be able to understand. We're always going to be talking about going to learn business rules. You know why we call them business rules? Because we start out in IT and we go to the business. If we're already in the business, we don't learn business rules, we learn how things are done, and there's a big, big difference in that. So we're continuing to move our way through here. Again, the last part of this now is how the two of these work together, and that's really critical, because if you don't have that ability you have a very big challenge of internal incoherence, much less external incoherence. So the first question is, all right, what is strategy? Most people think of strategy in the business school sense, which only started in the 1950s when Peter Drucker and others discovered the military term strategy and said, boy, we could write books on that and sell lots and lots of consulting services. I'm making a little bit of fun; I'm a consultant as well. But nevertheless it is an important component. We don't want to think of strategy as a grand plan, because then strategy becomes a thing. Instead, the better definition of strategy is derived from its use in the military, which is a pattern in a stream of decisions, and that is much more of a process than it is a thing. In order to do that, I took my guidance on this one
from one of our previous presidents, General Eisenhower, who said, "In preparing for battle I have always found that plans are useless, but planning is indispensable." Now why would you need to plan for data? Well, let's just take a look at some numbers. For the entirety of 2020, Zoom hosted 208,000 participants every single minute. That is an immense engineering feat. Netflix streamed 400,000 hours of video every minute of the entirety of 2020. By the way, it's down from a little 700,000 hours before that, so competition is coming; Netflix, watch out. YouTube users uploaded 500 hours of video every minute for the entirety of 2020. Consumers spent a million dollars a minute online. LinkedIn users applied for 70,000 jobs every minute. Spotify's got almost all the songs added, so they're having to find new songs that they can, and they only added 28 songs every minute of the entirety of 2020. And Amazon shipped almost 7,000 packages; users spent almost four thousand dollars using mobile apps. By the way, these numbers are from domo.com. They've done these numbers every year for the past couple of years. The new one should be out just in a couple of weeks, and we'll get to look at Data Never Sleeps number nine and see how these numbers change during the 2021 calendar year. But what it does get us to is a quote that our good friend Micheline Casey put forward. It's the idea that there will never be any less data than right now, and if I make a mark there and wait a minute, I can say the same thing and it will still be true. Data is going to continue to expand because there is a demand for producing data in this context. We call it the surveillance capitalism industry. Again, Shoshana Zuboff has written a wonderful book in that area, and we used it as the focus of our data literacy efforts around that as well. The supply and demand for this: even in spite of the growth we've had in data science, data is outstripping our demand to our ability to generally 30 percent moral increase in the workforce is six
percent, and I will also consider the process that we are likely not training them the way we'd like them to be trained as well. Let's keep moving on this. Again, when we talk about data as our sole non-depletable, non-degrading, durable strategic asset, it really wins when we compare it against other strategic-level assets. But the problem is most people think of data as the new oil. If you google that phrase, you'll find five million hits plus on Google out there. I don't like that term because it only implies that you're going to use the data, not reuse the data. So when I see people say that, I do stop them and I say, let's think about it slightly differently. Let's just talk about data as the new soil. There's two important definitions around that. One, you don't just fling seeds anywhere and hope that good things happen, and two, you don't plant things on Monday and expect to eat them on Friday. It takes time and preparation in order to do it. But we also need to sell the sizzle. There's got to be a regular demonstration of value in order to do this. As such, data deserves its own strategy, it deserves attention on par with similar assets, and it requires professional administration to make up for that bad data. Here's an example of data debt. In 2020 Forbes ran an article that valued American Airlines at a market value of six billion dollars and United Airlines at a nine billion dollar valuation. But the same analysis valued the data in the AAdvantage program between 20 and 30 billion dollars, and United's was at 20 billion dollars. This is a complete disconnect here, and if those airlines could double their market value, or triple it in the case of American Airlines, just on the basis of managing their data better, you better believe that they are paying a lot more attention to this right at the moment. Because the main challenge around data is one of separating wheat from the chaff. Now we talk about this and people start to say, well, is well-organized data worth
more? And we can go back before the information age to ask that question. If we look at it before the information age occurred, we had page numbers, indices, diagrams, all sorts of things. By the way, this is from a wonderful book here by Abby Covert, a ten dollar Kindle download, and that puts links to some of Abby's other work up there. She's doing a great job of articulating this. If I take the spine off the back of Abby's book and disperse the pages without page numbers, the stuff disappears very quickly. So yes, better organized data does increase in value, and it's a good argument for both data management and governance, but even more so to the point that 80 percent of organizational data is ROT. That doesn't mean it's rotten; it means it's redundant, obsolete, or trivial. And the question is, where do you go to eliminate this stuff? So let's see how the two of these work together. Remember, we start off with our organizational strategy, whatever it happens to be, and our data strategy can only be there to support the organizational strategy. It has no other purpose. How are we going to use data better to achieve the organizational objectives? We then use that as input to our data governance efforts: what should data assets be doing to better support strategy? And we come back on that and say, how well is all that working? Again, in Peter's world, data governance also has been put into the IT projects that are going on, so how is the data delivered from IT? And then eventually we get to organizational operations, getting to that secondary effect once again. In order to do that, we're going to put a feedback loop in there, and of course recognize that data is not the only thing that organizational strategy is based on, but it is becoming increasingly more important as we do this. I wouldn't show this diagram outside of the data community here, so let's simplify it for just a little bit. Mad in how does data management go? Well, this is the agreed upon improved support for the strategy. That's what we should be doing with
this. And a couple of other points on this: our data governance goals must be expressed in terms of business goals. Cleaning the data is a wonderful goal, but selling more stuff because we have clean data is really what we want to talk about. And as Danny said before, metadata is your friend. The language of data governance is metadata, absolutely, and until we take these things and put them into data management objectives, we have a big challenge with how to get all of this stuff to work. Of course we need that feedback loop that goes back into the data governance. Remember, they're steering up top and trying to get things to work as we go through this. Now from a strategy perspective, the one I like to adopt is something called the weak link in the chain. Some of you may remember a book called The Goal. It's a wonderful book about Alex Rogo and all sorts of things that happen there, but what it really is talking about is that in any system there are going to be a small number of constraints that are going to constrain our performance, and the theory of constraints says find one, fix it, and move on to the next one. It's very simple, it's easy to explain, and it works. Our theory of constraints process looks like this. We identify the constraints that are there, whatever they happen to be in terms of your data. We exploit that constraint, which means trying to make quick fixes operationally without having to majorly change anybody's rice bowls or anything along that line. If not, we need to subordinate other problems to that problem and elevate the constraint to where it can be alleviated. I know that's using the two words in a different format, but it actually works in that context. And if that doesn't work, go back and repeat it until we've fixed it. So don't just say, oh, we ran out of money, or whatever other piece it is that we need to do, because of course our process of doing this cyclically means that over time we can increase the capacity and improve our performance in data management and change our
focus from reactive to proactive. So a data strategy is going to give you a singular focus on improving data's role in achieving strategy. It's the highest level of guidance that's available. It focuses data activities on business goal achievement; that's really the key there. And it provides guidance when faced with a stream of uncertainties or decisions. Data strategy most usually articulates how data can be best used to support the organizational strategy, and usually involves a balance of remediative as well as proactive measures in order to do that. When we look at getting started with this process, whether it's data management or data governance, it follows a fairly similar type of pattern. We start out on the left-hand side here and we say we're going to assess the content, we're going to define a roadmap, we're going to secure an executive mandate, we're going to assign the first round of data stewards. Only put so much effort into that, because what you really want to do is get better at executing the plan on the right-hand side, and if we start to cycle through those it will be a whole lot more useful to the organization much faster. What I see is organizations getting stuck on the left-hand side of the diagram, which only occurs once, as opposed to practicing and trying to get better with respect to the overall. So we're coming up on the top of the hour at this point here. I'm just going to do a couple of quick takeaways and then invite Danny to come back and talk with us about your questions, which are always the most fun part of what we do on these programs here. So we've talked about understanding, first of all, what is data debt, and that data management and data governance are both techniques that we have created in order to resolve the data debt. Data management is about doing things that data governance wants to govern. But most people don't care, so don't try and talk to them about it. Instead, talk to them about problems that they have and
how you can help solve those problems. That the success factors for governance and management are necessary but insufficient conditions, and that we also have to have other pieces that go into it, which is understanding the existing environment and understanding the functions that people are talking to in this case, and that these two practices have to work really well with the rest of the organization. That there's a big, important messaging component on this. But we understand that this is all of critical importance in order to get this, and that your program is going to be a data program in order to do this. Excuse me. And finally, that you need to apply these things strategically in order to come up with a very good use, because you have only a limited number of resources. Again, nobody's working with multi-billion dollar budgets in this space, although I know several companies that might benefit from that. So let's do a couple of quick takeaways and then we'll get to the top of the hour. Again, this discipline has not had 8,000 years to formalize its practices, so it's going to take us a little while longer, and it's okay to say we can work with some existing imprecision and uncertainty. We have not developed GAAP, although there are lots of people trying to work on that. What we really need to do as a community is pull them all together so that everybody's working and rowing in the same direction. But your data does require professional administration to make up for its past neglect. If you want to hire a guru to do this, that works out fine, but most of this stuff isn't rocket science, and you can definitely make use of resources such as Dataversity and their associated programs in order to do this. Your existing knowledge workers don't have a clue how to do this, because like the guy that was throwing the pink balls at the keyboard, they found out how to do it and make it work. It's not the best way to play the piano. It's an entertaining way, and while Easton makes money doing it, but you
don't want your knowledge workers making up their own data management practices. There are good, well-proven best practices that we could start to use now, and it's likely that your organization will require a new business-focused data program, not a data project. The one thing that I hear over and over again is, when are you going to be done with data governance? Well, you're going to be done with data governance when your organization no longer needs an HR program. It's exactly the same point in time. Data governance and data management are major data program components, and in concert they must focus on improving your data, but they must also focus on improving the way your people use your data. Because only with better data and better knowledge of people's skills, how to do this, can we show people how to use data to better support the organizational strategy. And this can only be accomplished using an incremental, iterative approach and applying these formal transactions along the way. So we get off the ground, we do it once, and we come back for another cycle and do it again, because that's the way we need to work on data around this. We've got some upcoming programs here, but now it's my turn to turn the table back over to Shannon here and invite Danny to come back in and join us for a little Q&A. Again, looks on sale, but that's just for the publisher. I love it. Well, Peter, thank you so much for this great presentation. And if you have questions for Peter or for Danny, feel free to submit them in the Q&A portion of your screen. And just to answer the most commonly asked questions, just a reminder, we will be sending out a follow-up email by end of day Thursday with links to the slides and links to the recording of this presentation. So, um, I just can't resist just making Shannon laugh on occasion. It's true. And we've got a hashtag, data quality matters, here. I love that, so I think I'm going to start tweeting that out. Um, people have it tattooed on there, right? Yes. And there's also another comment here. It's not
really a question. You know, "I've always considered data governance the quote-unquote umbrella under which all other aspects of data follow: data quality, data management, vg, etc." Anything else you want to add to that comment? Danny, did you ever have any conversations with Larry English when he was alive? You're talking to a guy with a memory that has a very limited capacity. I do, but the favorite argument that Larry used to get into, he would say data management is a sub-discipline of data quality, and we would say, no, no, it's the other way around, Larry: data management is the over-discipline and data quality is the sub-discipline of it. He could go for hours arguing that particular question back and forth. Um, it's certainly, you know, they are related, and the key to this is that you have a very complex process. And Danny, you've seen, I mean, you've probably seen more data models than I have, as far as that goes. You know, it's a very complex piece and we've got to have both soft and hard skills in order to keep it working. Absolutely, especially when you get to the nuance of it all, right? The nuances are critical, right? Yeah, that's where the art comes in. You know, there's definitely a defined science, but you need to layer that art on top to really truly get it to where you want it to be. Absolutely. Again, our accounting cousins have some very, very good practices that they've established. But imagine this: I was having a conversation with a gentleman at Shannon's conference the other day, and he works for one of the state departments of transportation, and he said, you know, where can I go to get the guidance? Of course we said, you know, join the webinars and all this sort of thing, but isn't it a shame that in our country all 50 departments of transportation are having to learn this on their own? And I'm pretty sure the same thing's happening in Canada, right, Danny? Absolutely, absolutely. You don't have worldwide programs or global programs that
say how to do these things. We're all figuring it out as we go along. I love it. And lots of great questions, or lots of great comments, coming in. Um, this person: "I've always found the opposite to be the case. The business does not understand that they own the data and that they are responsible for the data. IT only handles what is done with the data based on the requirements provided, again, by the business, and data stewards always seem to be on the business side, only the business side, and determine what the valid values are." What do you think about that? Danny, you want to take that one first? Well, I don't want to throw DBAs under the bus, but they may have been working with DBAs for too long, because, you know, when you talk to a DBA, in my experience, they own the data. I keep telling them they don't own the data, they own the database, the database technology that it runs on. So, uh, I think that, you know, it's something I've seen over my 30-odd, you know, 35-year career now, that the business is actually understanding that and actually relishing that fact and sort of now starting to put their foot forward. And I think, you know, we kind of scared them off with technical gobbledygook for a long, long time, but I think the realization is becoming very clear, and now it's, you know, it depends on what's the legacy in your organization in terms of what's the effort to unbind that twine. The first book I wrote was supposed to be called Untangling the Legacy Knot. The metaphor is perfect, Danny. Many organizations, because, and I'm going to take the hit on this one, we in the academic community have failed miserably on this. We have taught people for 30 years that the only thing you need to know about data is how to build a new database. Literally, that is it. And so consequently people go out of here and when they see a business problem they go, I was taught how to do that in school, give me a new database creation tool and I'm going to now build a
new database for you. Uh, so it's no surprise, it's almost by design, that we've, you know, got ourselves into this situation. But the ownership question is an interesting one, and Danny, you gave a very, very insightful piece right there. Because if you go back to IT and say, okay, if you guys own the data, show me what are the rules around this one piece, whatever it is, just pick something at random, they may be able to go point to a program that processes it, and if they can find the code for that program they may be able to go through that code and understand it. These are things that the business people know intuitively. They know for sure that if you tell them you're going to reward your salespeople by the number of shirt orders that they produce, every shirt order will contain exactly one shirt, because that's the way they maximize their behavior. Business people understand these rules, they work with them, and the IT people don't and haven't been. It's just not been of interest to them. Although I will say there is one part of IT that you can find as an example for this, and that is that any large-size company is going to have a networking group, and that networking group has somebody whose name is attached to making sure that people have access to the network. So in other words, they have a list of people who are allowed to get on and people who are not allowed to get on, or however you've managed that particular piece. That is a data management function in and of itself, and if you look at that in a 30-person IT group, there will be three people that will be concerned with that particular process. Which is an example to say, look, even within IT you are doing some data management. Imagine now, here's not just one thing that I want you to find out, but here's an entire set of characteristics that we now want to feed into, again, we'll go back to the Salesforce.com example. We want to put all this stuff in Salesforce. Where does it go, IT? And IT will turn around and say, that's your problem, you
guys, here's a spreadsheet, you show us where each piece goes and we'll put them wherever you want to. We can't have this anymore; it just does not work. It's very conversational today, you guys. "The data has to be owned by and accountable to the business, not IT." And there are also upvotes on that. If we could upvote that one, we would, right? That'll come along. Yeah, that'll come along when the simultaneous translation comes in, Shannon. That's why I think people are still looking for that one too. Let me throw in a question here: can data cleansing and normalization, or the use of standard nomenclatures, remedy the dirty data issue? Sometimes. Again, Danny, you can probably pull out an example or two from your experience here, but just having the data clean... okay, so if I give you the number 42, right, there's a number, it's clean, right? Anybody know what it means? Is it Jackie Robinson's jersey number? Is it life, the universe, and everything from Douglas Adams? Is it my age 18 years ago? You know, yeah, it's clean, but it's not enough. So clean is a good start. Yeah, and clean is, you know, everybody has a different definition of clean, but, you know, clean doesn't really think about the concept of complete, or if it's in its original state, because it may be clean to the eye, but if you don't know what happened to it on the way, how has it been degraded? So that's as much a part of quality as whether it fits a certain set of parameters in the box that you've defined to put it in, right? So, you know, that's why I think things like lineage and all of those things become more and more critical path, and we're seeing it with our customers, that it is one of the top things they're looking for and one of the hardest things they're trying to find. Because clean in a database is one thing, but clean from a business perspective is a whole different set of questions, and one may not be equal to the other. Indeed. All right, so, um, so many comments here. I love
it. So, how do you determine the value of data? We get this question a lot, Peter, so I love that it's coming up again. The real trick to that goes back to the one slide that I showed about interoperability. We don't have all the resources in order to do this, even if you have one of every tool of Danny's, and so consequently it's been very, very difficult for people to put a value on it. Many times people simply say you can't put a value on it. I disagree with that particular statement, and the reason I'm pulling this particular slide up is that you always have a balance: we could manage everything perfectly and nothing would ever get done, or we could manage nothing, in which case we'd probably be out of business. So the answer is clearly in between those two extremes, and what we're trying to do is figure out how valuable it is. So if somebody says, could you clean that data, or could you migrate that data, or could you merge that data (again, different pieces that go into different requests), the question of value comes up when we say, hey, what will that get you in terms of the answer? So I'll relate a little data science story here to talk about it. A colleague was a new data scientist in an organization, got to the point where something really good was happening, and went running in to the CEO and said, hey, I got an 82 percent on this particular piece that I was trying to work out. The response was completely unexpected. The CEO's face turned red, and the CEO turned around and said, listen, just so that you understand, in this organization we never do anything less than a hundred percent. Now you get out of here, and I don't want to see you again until you've come back. Right, Danny is laughing already because he's already figured out the punchline. Just two ships passing in the night: neither one really understood what the other was talking about or requiring. But here's the real kicker: when that individual learned
what the context of that 82 percent solution was, they realized very quickly that the business would have been very happy with a 72 percent solution, because that represented tens of millions to the bottom line. And they had gotten to the 72 percent solution two years ago. So by not understanding what the actual requirement was, and instead working on it as an optimization problem instead of a satisfaction-type problem, they had wasted two years of time and money in this particular organization. Danny, do you want to tell a story around that value? That's a good one. I don't know that I'll tell a story, but I do know that we see this question all the time. People are trying to achieve this, and all I'm going to say is something that everybody on this call probably already knows: it's not easy, because when you're trying to value something like data, it's directly related to the impact of the insight that resulted from that data, right? And that's a tough thing to measure. So I think the first thing you should be doing before you start to put a value on data is figuring out what data is being used, because then there's going to be the cost side of it too, right? Every good thing has the other side equated with it. So it is a bit of a holy grail. I have seen people doing it, but whether they're doing it right, or whether the fact that they did it is making any difference other than answering a number on a balance sheet somewhere, is another matter. I haven't seen a lot of organizations getting true value out of the exercise yet. But I would love to hear stories of people that are actually doing that, because I think that there is a way. It's an interesting problem.
And I'll add a touch of guidance in there. Danny is so correct when he says it's a tough problem, because it really is. If you think back, though, to total cost of ownership, this was one of the things that we discovered in IT: it wasn't just the cost of the box, it was the cost of supporting the box. One of the more interesting aspects (I'm a Mac user, so I'll have to tell this particular story) came out of a very scientific and detailed study they did at NASA, where they found out that the cost of supporting PCs went down as they added more Macintoshes. They kept saying, but we don't know how to run Macintoshes, how can it be going down? And the answer was that Macintoshes at the time needed less service than did the Windows machines, so by adding more Macintoshes to their environment they were enabling better support to be provided for the remainder of the group that was still on the PCs. Now, we get to that with total cost of ownership. So we say, okay, how much is it? It's the cost of the box. It's the cost of the electricity that goes in the box, which we're probably not going to measure. It's the cost of a support person, and maybe the cost of an upgrade when we drop something, or whatever it is. All of that needs to go into this, even if you just find small amounts that you can add up repeatedly. I took a query that was running at one of the customers that Danny and I were at, back at one point in time. They had a very gnarly query. It was really interesting to look at, but they'd never understood the process of query optimization, so we simplified the query to where it ran about a third faster. Okay, that's nice. And then you ask, how many times a day did that query run? It was well over a billion times. A billion times thirty percent adds up to something very significant. Did we have the total cost of ownership? No, we weren't even close. But we were very clearly able to show that in this instance, by understanding the query at a more fundamental level and re-optimizing it
around that, we were able to save thirty percent a billion times a day, and it does start to add up. Absolutely. But now you've got to add to that the return on opportunity: where else could I have put that money that would perhaps not have been as interesting as doing a query optimization? Maybe firing the hold-up guy that takes two days to get the passwords out would have been a little bit of a piece. Would you provide some examples of focusing data activities on business goal achievement? Sure. So when you look specifically at this, Danny does this when he does his modeling class, right? We're relating two things back and forth in a model: what are those business things, what are those business concepts that you're trying to capture? So when you talk about data governance activities, if you express them in data terms, you'll end up in the same situation that we ended up in before, of somebody trying to get from 72 percent to 82 percent, not realizing that the problem is completely solved at 72 percent, and not having the ability to get insight into that particular area. So when I say expressing goals and business activities, there has to be something where you've taken the trouble to say, when this data thing happens, this business thing happens, and that business thing either costs this money or doesn't cost this money. Again, faster, better, cheaper: it all sums up to one of those particular pieces. And because it's done so many times, we can now start to put a cost on that. We can now start to say, let's not talk about getting 100 percent of customer data perfect; let's talk about salespeople being able to reach somebody by telephone nine times out of ten on the first call. Again, that's just one example: perfect telephone numbers may or may not be in your wheelhouse, but salesperson productivity would absolutely be there. Yeah, and relationships among different aspects of business data: how many customers aren't screaming at your
tech support people because they don't know that I already have that service that you're trying to sell me, right? And what is the value of that? To me, it's quite high. I'm chuckling, Danny, because I get listed first as the AWS person for the university, which is sort of a crazy thing in the first place, but I get calls all the time: Peter, don't you want to upgrade VCU's phone system? Well, VCU has 35,000 employees, and you'd better believe I am not in charge of providing things for them. But they don't know this, because I'm at the top of the list of the VCU employees that came in there. So sorting by last name was probably never a great idea. So there's a couple of comments here that aren't really questions; they're great comments that summarize a lot of the comments in here. I would say that it's by what the data is actually used for. Why is it captured in the first place? And if you're not doing anything with the data or cannot do anything with it, then why would you be capturing it in the first place? We can go a little beyond that one too, because one of the things that Clive taught me over the years was that definitions are good but purpose statements are better. So the definition is: it's a bed. The purpose statement is: we're going to use the bed as a tracking device to make sure that we don't lose people in the hospital, and the tracking device on this bed will tell us what room this bed is in at all times. And of course you think about that for just a minute. So what room is the hallway? Ah. What room is the elevator? Ah. Good plan, but it wasn't going to work in the long run. Purpose statement, yeah. Anything you want to add there? Unfortunately I missed a little bit of what Peter was saying there on my Canadian hamster internet here. No worries. The concept of a purpose statement: I've seen that so many times. You have a systems designer that just thinks about what we could possibly want to know: let's
just throw it in there, right? What the heck. Not knowing that that's going to potentially irritate the customer and bloat your data with, as you say, not rotten data, but data that's trivial. So from that perspective, and again, as an old data modeler I hate to say it, but that's the beauty of the data modeling process: getting together with your partner in the business and really asking those hard questions. For every data element that you're going to add to a data model, the question should be why, and what will this achieve? And then you can ask the other side of that question: if we don't have it, what do we lose? One of my favorite stories was that Capital One at one point in time was claimed to have the foot size of all their customers. That sounds like pretty rotten data, right? As far as I know, Capital One has never gone into the foot business and has never sold our data to any Zappos-related companies or anything like that, but they were data people and they thought it was kind of cute to be able to have that. It's a good marketing line. That's me making Shannon giggle today. Okay, I love that, that's funny. So, I just lost my question. Here it is: I'm doing a business research project on data management for my MBA, and what I've realized is there's little academic work on data management compared to professional, industrial work on data management. How are you working with academics to bridge the gap? Well, you can see my forehead is quite flat. We in DAMA worked very closely with a series of vendors, including Danny's company, to try and push these things into the space. I can tell you that at the moment we've made things worse rather than better. Right now, when you go through a course in IT in a standard accredited American university (and those are the ones that I have experience with; Danny can probably talk about some of the rest of the ones around the world), we don't even mention
the concept of CASE tools, right? That's just crazy, I mean literally just absolutely crazy. They learn project planning with paper and pencil, but they're never taught that there's a product out there called Microsoft Project, or better still, that there's a wonderful piece of software that keeps track of the interdependencies in your data and keeps your data model fresh and refined. When I land someplace, that's the first thing I look at, and I say, where do I get hold of the CASE tool that's there? Students aren't taught this. And the fact that students aren't taught this means they come out of here thinking that those things are not going to be useful, because of course they know nobody would ever maintain a project by doing all those calculations manually, much less plan a major IT initiative without understanding these things. So consequently we've just had no success. The next thing we tried to do was get to the accreditation boards and say, what is it that we can put into the curriculum? Now, you have to remember that curriculum is always a zero-sum game. If I've got 10 courses that I teach in here and I'm going to add a data course, I've got to take something back out, and now you're fighting with somebody about whether they need to have this particular piece in here, or whether we're going to smash two courses into one and give them half a loaf. In this context I'm very disappointed. One of the reasons that I'm working in conjunction with groups like Danny's and others is to try and actually drive this back the other way. We've tried through the proper channels; they aren't getting it. We've tried all sorts of different ways. We had one gentleman from IBM who just said, the heck with the accreditation stuff, I'm simply going to visit colleges and universities and try and teach them how to do this. And most of them got as far as saying, okay, well, I guess we'll take our old operations research
program and relabel it as data science and say that's done, right? Well, that's the study of algorithms; that's not really the study of data management. So I agree with the question, and unfortunately we've been working as hard as we can and not having much success. So maybe it's time we get out of the way and let somebody else come in and take a crack at it, because our efforts have been less than stellar. Yeah, no, I hear you, and I don't think it's just a US issue. My son took a high-end bachelor of information technology here and ended up as an ethical hacking specialist, and I looked at the amount of data in his curriculum and it was almost non-existent, less than when I went to school 30-odd years ago. And the one data course that was in my program was the one that changed my life and is the reason I'm here, right? So it is crazy, but we keep trying to push and allow people to use our technology to enable them. They have to have the impetus to want to teach it and know that it's something that needs to be taught. And you're right, it's a lot of jumping up and down with not a lot of people listening, unfortunately. In the data literacy book I introduced a term called PIDD (I think Todd, my co-author, came up with it, actually): a perpetual involuntary data donor. If you do not understand data, you're simply going to be walking around giving your data to people. Again, just take a company that asks the question, can I get your location? Well, you might say, sure, one time, or maybe while you're using the app. Those are usually the two choices that you're getting. When we've measured these things, we found out that some companies are taking your location data 14 thousand times a day. Now, if nothing else, you ought to be really upset that your battery is going to get very, very weak as a result of doing all that extra work for somebody else on your dime. But again, the knowledge level of the public out there is far below where it needs to be,
and I think this is one of the things we need to work on as an industry: not just preaching to ourselves but really trying to get beyond the data people and get to the citizen, which is really where this needs to happen. Maybe it's pain-oriented; maybe we can do a better job of articulating what it means to the common man and move from there, as opposed to trying to elevate it as a science. We could do a joint session on that one, Danny, at the next conference, right? You got it. Something like that. Yeah, they're going to let me out of my Canadian cell here at some point, I'm sure. You get a COVID test within 24 hours; we're all working on that. Okay. The COVID tests take longer than the trip, usually, so it's pretty silly. People are taking COVID tests here in Canada, going to the States, and then coming back and using that test as proof that they're... sorry. That's all right. We have lots of things we're learning about COVID and data too, aren't we? Absolutely. Okay, listen, it's all an opportunity. Absolutely. All right, I have to jump for a 3:30. Thank you so much for joining us. Thank you so much, this has been great. Our pleasure to sponsor it, and thank you everyone for attending; really do appreciate it. Thanks, Danny, and thanks to Erwin for sponsoring as always. Always appreciate the help in making these webinars happen. We are coming right to the end of the webinar here, so again, to answer the most commonly asked questions, just a reminder: I will send a follow-up email by end of day Thursday with links to the slides and links to the recording of this session. So many great questions that we didn't have time to get to; I'll get those over to Peter as well to take a look at. I really love all the comments and the engagement that y'all bring to these webinars. So again, thank you so much. Peter, thank you so much as always. I hope y'all have a great day, and happy holidays. Same to you both. Danny, pleasure to see you
as always. Shannon, we will see you in a very short amount of time when we all get together back out in San Diego for Enterprise Data. I love it, such a good time. All right, thanks, y'all. Cheers, everybody.