Hello and welcome, my name is Shannon Kempe and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest in the monthly webinar series on data modeling with Donna Burbank. And just to let everybody know, we will be continuing this series, although we will be broadening the topic and changing the name to Data Architecture Strategies next year. It will remain on the fourth Thursday of each month, with the exception of November and December, where it mixes in with the holidays, so we have special times for December; but otherwise it's the same fourth Thursday of each month, and it will continue with Donna. So we're very excited about that. Today, Donna is going to be discussing data modeling, data governance, and data quality, and today's webinar is sponsored by Alteryx. Thanks for helping make today's webinar happen. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar; just click the chat icon in the upper right-hand corner of your screen to activate that feature. For questions, you can submit them via the Q&A in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag lessons DM. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar. Now let me introduce our speakers for today. First, the speaker of the series, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information.
She is currently the managing director of Global Data Strategy, Ltd., where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. And joining Donna this month is her colleague, Nigel Turner. Nigel is the principal consultant in EMEA at Global Data Strategy. He specializes in information strategy, data governance, data quality, and master data management. With more than 20 years of experience in the information management industry, Nigel started his career working to improve data quality, data governance, and CRM at British Telecom and has since used his experience to help over 150 other organizations do the same. And with that, let me give the floor to Donna and Nigel to get today's webinar started. Hello and welcome. Thank you very much; always a pleasure to do these. And jumping right in, as Shannon mentioned, this is sort of the final session of the year for the data modeling series. The good news is, if you missed any of the ones in the past, they are, as Shannon mentioned, all out on demand on the DATAVERSITY website. So you'll see we had a very broad list of topics, from enterprise architecture to BI, et cetera, et cetera. Those live out there, I think, in perpetuity, so if you missed them, you can always catch them on demand. Coming up next year, we're really excited to mix up the list of topics based on everything we're hearing you talk about: from pure data strategy, to metadata, to graph databases, which was really popular last year, to data lakes, et cetera. We're also throwing in a few panels, so we can have some of the other thought leaders in the industry talk about their thoughts on trends and so on. That will be up shortly on the DATAVERSITY site, so you can register for those sessions coming up next year.
We hope you can join us. So jumping right into what we're going to cover today: a whole laundry list of things that are separate but related. Everyone, I think, is familiar with data governance being the people, processes, and policies around data. For the technical folks on the call, you know you can't do that without the technical infrastructure. And so I often get the question: is data governance a technical thing, or is it a business, people-and-process thing? And I give the lovely consultant answer of yes. But it really is both; it's trying to get that link between business definitions and technical data systems that's the crux of giving data governance teeth. So we're going to talk a lot about how data models and data governance strategies can really help with that. But what adds even more complexity to the mix, especially when we're talking about folks like our sponsor Alteryx, is the idea of self-service data prep and analytics, where it isn't just data architects doing things. So how do we keep these enterprise standards but also balance that self-service agility to really get the best of both worlds? Because it's a changing world out there. On that note, DATAVERSITY and I put together a survey, which just came out in October, on trends in data architecture. As you'll see from our topics, data architecture is broadly related to a lot of different things, one of which is governance. When we looked at the top reasons for implementing a data architecture, more on the technical side, you'll see that some of the very top reasons were data governance, as well as things like self-service BI, digital business transformation, and data science and discovery.
So that almost is the summary of our abstract, right? You have things like governance and compliance, which you often think of as top-down rules and regulations, and then you have things like self-service BI and data discovery, which lean more toward freedom and expression. So how do you balance both of those? What complicates things, but is also exciting, is this idea of more and more roles becoming interested in data, and I think that's a good thing. As Shannon mentioned, I've been in the business for over 20 years, and I remember the old EDW conferences where the common lament was "the business doesn't care about data, no one cares about what we're doing." It's a case of be careful what you ask for, because now everybody is interested in data. That's a good thing, but now we have to create policies and tools and rules that can help everybody look at data. When you look at the answers to who creates the data architecture (this isn't who creates reports, this is who creates the data architecture), no surprise, the data architect is almost a self-defining answer. But a lot of other people are involved now, from data scientists to business stakeholders to data governance officers, to enterprise architects and programmers, so you'll see that it isn't just data architects doing it anymore. Just to clarify, that question was a "select all that are responsible" sort of question.
So yes, if, say, a data governance officer alone were responsible for the architecture, that might make me nervous; I think the proper answer is the data architect in conjunction with these other roles, because everybody has an input, a need, and a use case for a data architecture, which leads to governance. In that sense I think it's an excellent trend; we just have to make sure that we collaborate, and collaborate effectively, because each of those roles needs to see, look at, and control different things. Especially when we start getting into self-service, it might be a regular business user trying to write reports for sales, and they're going to have a different viewpoint than a data architect. So I'm going to pass it over to Nigel to see how that applies to our topic at hand in terms of data modeling, governance, and quality. Nigel? Hi, thanks Donna, and hello everybody, and welcome from the UK, where I'm sitting as I'm doing this, and thanks for joining us. Donna's just said that collaboration is key if you're going to build an effective data architecture; that means the various roles in the various data disciplines that are emerging need to work together as well. But of course, what that also means is that the different data disciplines, which traditionally have sometimes been seen as fairly discrete, have to be applied together in a very collaborative way. We've titled this webinar data modeling, data governance, and data quality, and probably central to our contention is that if you're going to get data architecture right, you need to apply these three techniques in a very synergistic way to enable you to manage your data in any sort of strategic way.
This very simple diagram is what we try to illustrate here, and you can start pretty much anywhere in it, because it is, I think, a virtuous circle. Data modeling, as we know, identifies the key entities and the key attributes, for example, and that helps you to scope and prioritise where your focus should be in terms of improving the quality of the important data that you hold within your organization. In turn, to make data quality happen and to make it a sustainable business process, you then need to implement data governance, because governance basically puts the business in the lead for data quality improvement, which also means that some people in the business are up front leading those efforts to make improvements happen. And of course, once you've got data governance in place, then when you identify your key entities and attributes in your modeling activities, you can start to identify who the owners of that data are and who the data stewards should be, in turn driving up data quality. So we see these three things very much as being synergistic, and in terms of what each of them does, we've spelt this out a little on the next slide. I'm certainly not going to read the whole slide from end to end; you can read it for yourself. But I've tried there to indicate what those three disciplines bring to the big picture. I've already mentioned that data modeling maps out the relationships, helps to scope and prioritise the data, and identifies those business people who may become stewards and owners. And I think we all know that a well-produced, clear data model is a very useful communication tool; certainly in the last piece of work that Donna and I were involved in in the UK, producing that communication tool was a great way of selling the need for a more strategic approach to data.
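As a rough sketch of that virtuous circle in code (the entity, steward, and rule names here are purely hypothetical, not from any client engagement), the output of modeling can carry governance metadata forward, so that key entities still lacking a steward become the next governance priority:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    """A key business entity from the data model, plus governance metadata."""
    name: str
    attributes: List[str]
    steward: Optional[str] = None          # assigned through data governance
    quality_rules: List[str] = field(default_factory=list)

def unstewarded(entities):
    """Modeled entities that still lack a data steward: a simple way to
    prioritise where governance (and quality) effort should go next."""
    return [e.name for e in entities if e.steward is None]

# Hypothetical fragment of a model
model = [
    Entity("Customer", ["customer_id", "name", "email"],
           steward="Sales Operations",
           quality_rules=["email must be populated"]),
    Entity("Product", ["product_id", "description"]),   # no steward yet
]

print(unstewarded(model))   # a worklist for the governance team
```

The point is not the code itself but the linkage: the same entities and attributes the model identifies are the hooks on which stewardship and quality rules hang.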
It can also help you to define KPIs and metrics for some of your key data: once you know the attributes, you can ask, right, what should the values of these attributes be, what should the accuracy of these things be? And then ultimately, as Donna said, the best data architecture in the world delivers nothing on its own. You have to translate that into real data within real systems and real platforms and actually make the change that way, and data quality and some of its techniques and approaches have great value here too. Data profiling is a very good way of baselining the current state of those key data entities that have been identified in the modelling activity. Doing that shows the business and IT where some of the problems and issues are; once you've identified what they are, they can help you to improve through data cleansing and data enrichment, and also to sustain those gains by automating the application of those business rules in an operational environment. It also gives you an empirical foundation for KPIs and metrics: the models help you identify the KPIs, the data quality techniques help you develop the foundation and the baseline, and you can then set some realistic KPIs and metrics. And of course it gives you some real empirical information for building business cases. Then governance, and I'll come to this in a second, provides the overarching strategic framework within which these activities can take place, and it ensures the business leads on the definition of business rules and on the definition of KPIs for data, because after all, it's only the business that really knows the importance of the data and how good the quality of that data needs to be in order to meet the business need.
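A minimal illustration of that profiling step (the records and field names are invented for the example, not taken from any real system or tool): baseline each critical attribute's null rate, plus the share of records that are error-free across all critical fields, which then grounds realistic KPIs.

```python
def profile(records, critical_fields):
    """Baseline data quality: per-field null rate plus the share of
    records that are error-free across all critical fields."""
    nulls = {f: 0 for f in critical_fields}
    error_free = 0
    for rec in records:
        clean = True
        for f in critical_fields:
            if rec.get(f) in (None, ""):   # missing or empty counts as an error
                nulls[f] += 1
                clean = False
        if clean:
            error_free += 1
    n = len(records)
    return {
        "null_rate": {f: nulls[f] / n for f in critical_fields},
        "pct_error_free": 100.0 * error_free / n,
    }

# Hypothetical customer extract
records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},                 # missing email
    {"customer_id": None, "email": "b@example.com"},    # missing key
    {"customer_id": "C4", "email": "c@example.com"},
]
baseline = profile(records, ["customer_id", "email"])
print(baseline["pct_error_free"])   # -> 50.0
```

Real profiling tools check far richer rules (formats, ranges, referential integrity), but the principle is the same: an empirical baseline against which improvement can be measured.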
The other thing I think is really important about governance, and I'll come back to this later in the webinar, is that governance is a very good mechanism for creating the cross-business teams that you need to tackle end-to-end data problems and issues, and also for helping to present and deliver the business case. So that's how we see the three things interacting. In terms of the role of governance, Donna and I have used this framework with a number of clients recently, and it must be said, very successfully. If you're going to build an effective data governance framework, it will encapsulate and encompass some of the things I've been talking about. The first thing you need is to be able to link your data architecture to your organization's goals and objectives, so you must be able to use it to answer questions like: how does the organization depend on data now? What data does it depend on critically? What data is it really important to get right, and how good is that data now? And also, of course, to think about the business of the future: what does the business need to do in two or three years' time that it can't currently do, and therefore how does the data need to evolve to support that? On the right, you need a good understanding, for your key data, of what some of the key data issues and challenges are. As I mentioned earlier, if you start from a very low baseline, aiming for the sky is probably not going to be successful, so you have to be realistic about the goals that you set for yourself in the programme. Then of course you define the vision and strategy for what you're trying to do, and sell that, actively communicate that into the business; create the necessary organization to make it happen, which will include people like data stewards and data owners; devise the processes and workflows that you need; set the KPIs; and manage the data and
measure it on a regular basis; and then change the culture of the organization so that data is seen as being very much at the heart of what it does. And supporting all of that, as Donna said earlier, none of it happens unless you can implement it in the real world, and that's where tools and technology come into the picture. So there are a lot of areas there where both modeling and data quality play a big role. Just to mention data modeling, for example: it can help you define that vision and strategy, because it gives you the data big picture that you need to start that process. In terms of organization and people, it helps to identify the key stakeholders for your data and links them to the data so they become potential data governance participants; it maps out the key data relationships and identifies the key data areas; and, through its communication value, it helps to build a data-centric culture. It also helps to inform the investment in technology that you need to make; in other words, that's the teeth that Donna talked about earlier. And what's the end goal of all this? Well, I think this analogy always seems pretty appropriate to us. Many organizations, certainly ones I've been involved in, have a culture where reactive data improvement seems to be the norm and not the exception; in other words, they wait until the fire breaks out, they then create the fire brigade, and they try to put the fire out. But if you're going to manage data more strategically within a data architecture and data governance framework, the emphasis should really be on developing something that prevents the fires breaking out, rather than waiting until they break out and then fixing them. So the main focus of data management people within that framework is to prevent fires in a very proactive way. I think it also illustrates one last point I'd like to make before I hand back to Donna, and it's about data improvement. I've
often heard people say, well, we're far too busy firefighting to actually call the fire brigade and have them teach us how to get it right. The problem with that, of course, is that you're not making a choice between doing data improvement and not doing it; the choice you're making is, are you going to do it reactively, rather badly and inefficiently, or are you going to build something that will help you do it in a more systematic way, a way that avoids the problems rather than waits for them to happen? So data modeling is pretty key to all of that. What I'll do now is hand you back to Donna, who'll talk a little more about how data modeling can contribute to that picture. Donna? Yeah, and just to chime in on that point, the way I often say it is: if you don't have time to do it right, do you have time to do it again? You're going to spend the time either way, so let's just make it effective. Which ties into our idea of the data model, and the beauty of a data model; we talk a lot about communication in this presentation, because a data model really does translate your business rules and definitions to the technical systems. Because with governance, as in my answer above, is it business or is it IT?
Yes, it's both. Most data models can give you both the conceptual and logical model as well as the physical model that actually implements the systems, and that's that nice connection, or the teeth. A business person can very easily understand that we have a product and that a product is sold to a customer, and then you can start getting questions like, is a customer the same as a client, or is it different? And I've made the point even more obvious here by using pictures of things, but I have literally done that in a data model; some of the data modeling tools will let you put in a picture, and sometimes that's very important. I have a picture of a product, and that's a box, and they say, well, actually no, we're selling insurance products, which someone else might call a service, et cetera, et cetera. So it really makes clear what you're talking about. I was working for a large utility company, and there was the idea of an asset: what's an asset, is it part of the plant, or is it a computer asset for asset management? But put a picture of a truck versus a picture of a computer on the model, and people immediately get what that thing is. Because, as you've heard me say in the past, that is really what a data model is: it's the things of the business, and the rules around those things. And that can then very easily, if you're doing it top-down with forward engineering, create those technical structures and all the rules around them. Or, and I always do both, you can go bottom-up: how do I understand the business? Well, start looking at the systems that are running it, right? That's your physical data modeling, and you can learn volumes. Often there'll be a business process, maybe you've used an SAP system or a CRM-type system, and maybe it's not working for you; often it's not working because the way the data is set up might not match your business process. I worked with a customer on something
as simple as: can a customer have more than one email address? They wanted a customer to have two, but the system held it in a single attribute, so you could only have one, and it wreaked havoc on the business processes. But it was just a database design error, or a miscommunication. So that is the beauty of having those rules. And then even within the technical systems, to a lot of folks governance is just that: do I have domains, do I have naming standards, do I have foreign key constraints, is my model in third normal form? That's a perfect form of governance. I mean, sometimes I get frustrated, or maybe it's a positive thing; something as simple as a support rep manually typing in a list of states. Wouldn't it be great if we had a state code list that gave you an automated drop-down? As we know, that's linking in a reference table and having the right drop-downs. So it's all integrated; you can't separate business from IT. Sometimes the business process could be supported by a better database structure, and sometimes the database structures are not correctly planned, or could be informed by business rules and definitions. That's why they fit very nicely together, and data models are hotter than ever. In the survey I mentioned at the beginning, over 96% of folks who took it said they were doing data modeling of some sort; for those of you who say no one's doing data modeling anymore, you're wrong. So this is the sexiest job of the 21st century. As I said, I've been in the business a while, and data modeling went out of style for a while, and now we're seeing a massive influx of demand, not only from business people, who are often the first folks to ask, though they won't necessarily use the words "data model," but also from folks like data scientists ("I want to know the rules behind the data") and system developers ("I need to understand what the business rules are to implement my system"). So
both tech and business can appreciate a data model. Because really, that data model is key when we think of governance, and I had a client that did just this. You might have a policy or regulation, I use GDPR as an example, or it could just be something simple: we must not share personally identifiable information, PII, or PHI if you're a healthcare company, or student records if you're a school. It's nice to say you can't share personally identifiable information. Some of it might be obvious: my name, probably. But what about my nickname? What about my avatar if I'm using an online program, or if I'm on social media and my persona online is a meme, not me; is that identifiable? So it isn't always obvious; there are fine lines. What's nice about using metadata, which can be driven by a data model or metadata tagging tools, et cetera, is that you can actually go down to the field level, so when the developer is looking at this table, they can see that this particular field is PII. Sometimes it's situational, but that's the beauty of a data model or metadata or whatever solution you're using: there's a map from what legal might say you can or can't do, and what your policy might say, to the actual implementation you're doing day to day. I think this is clear to a lot of folks on the call, but the beauty of this is tracking both the technical metadata, what the tables and columns are and what their data types are, and the definition behind it: is an employee someone that's worked with our company in the past six months, or someone that works there currently? There are a lot of subtleties in these business rules. And we're not going to forget that there's actually a John Smith out there; especially when we're thinking of things like PII or PHI and GDPR, the whole purpose is to remember there's a human being
behind this data, so please manage that data appropriately. Here are some more examples that you may have seen before; they should be obvious, but what's interesting is you can track things like data stewardship on a model. As Nigel mentioned, we've had a lot of customers have that aha moment for governance by starting with a model, because, as any of you who've done data models know, why we find them fun is often that definition of what do we mean by a customer. You say that to somebody who hasn't done this exercise and they say, are you serious, we're spending X amount of dollars to define what we mean by a customer? Open the dictionary, right, that's not hard. And then you start showing them the data model, and they say, well, marketing uses customer for people who haven't bought the product yet; for support, a customer has to have an active support agreement; we actually say a customer has an active account; but if we're reporting to the regulators, we should actually count the number of people differently, and so on, insert your business case here. And then that light bulb moment goes off, and that starts the questions: exactly how are we going to manage this? Who would own that data? And it never is black and white, people: you might own the data for different use cases, and getting a governance team, the people and process Nigel was talking about, has to be the case, because what makes sense for a decision in one part of the model may not make sense to another group. That's where data governance can come in, and it doesn't have to be a fight; it just says, I want to be aware that someone else has a different use case for that data. I'm a big fan of this human metadata, this cultural knowledge; avoid the "I just know." A lot of us can do that: are you serious, you want me to define a part number? Seriously, I just know that's the part number. But this is
something anyone who has done data modeling, or any application development, for more than ten minutes will know: there's a lot of subtlety in those definitions. And it might be that the gentleman who is about to retire says, oh, that used to be called component number; the light bulb goes off, I didn't realize that. That can be stored in the glossary, in the metadata repository, and/or the data model, or, as we'll talk about later in this presentation, in a lot of the more self-service-type collaboration tools. Again, a lot of the folks that know this stuff aren't on your data architecture team or your database development team; they're the people doing this day to day. So as they start to do self-service BI and self-service data prep, that's where you're going to get these aha moments of, wait a minute, this data doesn't mesh. Letting these people speak for themselves, so it happens organically rather than through committees going around interviewing people, is huge. That's not the only answer, but as I'll hopefully explain later, it's a nice augmentation to the stuff you're already building in the model; you'll get even more information by letting these folks chime in and play around with the data. And I've shown this cartoon before, but hey, when you have data modeling cartoons, you have to use them. And maybe it's just not funny, but it's not funny unless you've lived this: again, the situation of, okay, we're all done with our acceptance testing, everything looks great, we're going to launch the application, and then the question comes: what is a customer, again? As Nigel mentioned, trying to retrofit that after the fact might be job security for a lot of us, but it's not fun. If we'd gotten this right up front, we could be doing the cool stuff for this application instead of reworking everything because we got the definition of customer wrong and didn't get all the business rules right. So get that up front, before you have to go back and
change things. Make sure everyone's in the room who might have a different opinion on how they use that customer data. I'm a big fan, if you have these definitions, glossaries are great, but a lot of modeling tools can actually show the definition in the model, and one of the reasons I like that is: what's the difference between a customer and a client? I can look at it right there. A broker and a salesperson, are they the same? You can call it right out in the model, so that business people looking at it, and technical developers who want the definition, can see it right there. And that's also the beauty of a data model: it can create both that business definition of what I mean by customer and, with reverse engineering on the technical side, the actual data inventory. And if anyone is actually doing customer data out there, you'll say this is a tiny subset of what could be: we have a SQL Server and an Oracle and an SAP system. If anyone's using SAP or another CRM, and I'm not just picking on this one vendor, my little joke there is that it's a black box. Often those systems are very complex, and to give them credit, they're complex because business processes are immensely complex, and their role is to help your business processes be more efficient; goal one is probably not making sure it matches your glossary, right? They just by nature are very complex. Order, shipment; you might be passing through XML, data lakes; your point-of-sale system has its own data. So are they all integrated? That's really what a data model can help link up, to get at this goal that a lot of folks have, this idea of data lineage. This may be a simplification, but it's the classic use case that everyone shows: I have a report, and I have a figure on this report; where did it come from? And the classic case is, it came from your source system, we did a
staging area, and we did a dimensional model, and we now have that in the warehouse, and we can see that lineage. That's fine, and that's super helpful, and that's where you can use some of these data models or modeling tools or metadata repositories; the metadata is in these systems, you often just have to link them together, and a lot of tools are getting better at that. But then when we think of the new world and things like self-service, there may be other ways we manipulate data. It could be done in an Excel spreadsheet; it could be development in AWS, where we're putting things in the cloud, or we're getting some Internet of Things data and moving it from the cloud onto S3, onto on-prem to do things with the warehouse, and then sending it out to the lake. So it isn't always as clean as this. Whether you're using self-service tools or data modeling tools, pick one that exposes its metadata, I guess is my advice, because once the metadata behind those movements is there, you can create these lineages. Having done this at a lot of sites, it can be amazingly easy and it can be amazingly difficult, depending on how you're doing those movements, so it's often good to get the right tools and give that some thought. Let me move my own slides here. And this I like as a summary, because that's the idea to me: the business metadata, the business rules and policies around governance, and then the technical metadata that really makes governance actionable. So you can take these business rules and define them in your technical implementation, so you don't have developers wasting cycles trying to figure out what we mean by a customer; they can go to a published definition. And vice versa, the technical implementations are clear to everybody, and we don't have to argue about how many characters the account number is. I mean, I worked with a major brand-name customer just recently where someone changed the account part number from 11 characters to nine characters and
brought a system down, right? This stuff happens, people are human, and that was a pure data governance issue: we really shouldn't change fields without letting people know. So those are the technical data standards, and when you have that link, you can do that audit and lineage between what is PII and which database fields hold it, or how I got the figure on the report. So it's the complete story through metadata, from your business governance down to your technical implementation. I'm going to pass back to Nigel to talk more about the data quality aspect. Yeah, thanks Donna. I touched on the importance of the governance framework earlier, and Donna's just, I think, exposed very well the value of data modeling, so I thought it would be worth focusing briefly on the role data quality has to play in all of this. As Donna said, ultimately this is all about making real data improvements within real systems, and this is where I think data quality has a key part to play in creating and maintaining data improvement within the strategy that we're talking about. Why bother with data quality? Well, there are some fairly well-known statistics, though I'm not sure about the first one: there are 2.5 quintillion grains of sand on the earth. If you don't know what a quintillion is, by the way, it's one followed by 18 zeros, and I know there are 2.5 quintillion grains of sand on the earth because Donna got me to count them all; that's research for this webinar. When you compare that with the amount of data that's now being created, well, three times as many bytes of data are being created every single day as there are grains of sand on the earth, and nearly all of it has been created in the last two years. That means Moore's law is now being exceeded: data volumes double, the figures say, every 1.2 years, and I think that figure is probably a couple of years out of date now, and it's probably now every
about 10 or 11 months so the bottom line of all this is if you don't have your data under control now and an understanding of how good and how stupid for purpose that data is it's only going to get worse and in fact it's probably going to get much worse and just some recent evidence that we've come across from a survey done by Nagel Redman and Salmon recently published in the Harvard business review I thought this was quite enlightening here's a piece of research where basically it's very simple methodology they are 75 execs to identify an eyeball around 10 to 15 critical data attributes in 100 randomly selected data records from systems within their organization and then they said tell us how many of those records are error free so in other words all those 10 to 15 critical fields are actually fit for the business purpose for which they were intended they're rather shocking conclusion from that was only 3% of those records were actually error free or contain less than 3% errors so that is really weird it obviously goes to show again that unfortunately in many organizations poor data quality is the norm and not the exception and the other thing I would say as well there is a difference between legacy data and newly created data but only to the extent that 47% of the newly created records that we examined were also error free so it's a bigger problem the older the data becomes which we'd expect anyway because the normal rates of data decay but it's not a problem that's being resolved by new data and the impact of some of that on some of the things that we want to do in our data architectures I think it demonstrated here the bottom left one this is a fairly old piece of research now but every piece of more recent research I've done displays this in not quite a significant way but the US economy on itself loses over $320 a year because of some of the issues report data quality and it's not just in the traditional world that these things become a problem and you'll see 
the other three facts that we've got from various recent surveys show that it's impacting the new world of data as well. So in the data science and analytics space, companies are employing very, very expensive data scientists and analytics specialists to do a lot of this data analysis work, and in reality what's happening is they're spending up to four days of their five-day weeks doing nothing but very basic data preparation and scrubbing activity before they even start to gain some insight from the data that they're looking at. And I've certainly come across a couple of companies where they do a lot of this stuff literally by eyeballing it on spreadsheets. There are much more efficient ways of doing it. It's akin, I would say, to trying to clean a dirty floor with a toothbrush: there are much more effective ways, and data preparation tools are definitely something worth thinking about. So the scarce resources are not doing the job they're paid to do.

So why are all these problems in existence, and why haven't companies sorted them out? I'm not going to go through this in great detail, but this is why attempts to improve data through governance and quality sometimes do fall over, and one of the big things is the lack of business leadership and commitment. There was a recent survey done in the UK, and they found that 23% of the people they interviewed, who were mainly data management professionals, said that the biggest barrier they faced was the lack of business leadership and commitment to data improvement and to creating and sustaining data architectures. It's quite a shocking statistic that there are still so many people in our businesses, many of whom claim to be data driven, who still don't get how important their role is in leading data improvement activities. And one or two others there as well: failure to focus on the data that really matters. If you boil the ocean, you'll fail; if you focus in on the key data that you know is important, which techniques like modeling can help you identify, you've got a chance of getting somewhere and having some success. Another thing I've noticed as well is that sometimes there's a lot of emphasis on data monitoring and not enough emphasis on actually improving the data. Monitoring data is a pointless exercise in itself, because all you're doing is showing each month how bad your data is. What the focus of governance, and certainly of data quality, should be is demonstrating continuous data improvement. And then finally, I think, this is a culture change, and therefore some of the techniques of governance and some of the artifacts that data models generate must embrace everybody who uses data across an organization. It's not just the data management experts that need to be educated, it's pretty much everybody across the organization. So there are a lot of reasons there why it's quite hard to do this stuff.

The second reason why it's quite hard is because, of course, data rather inconveniently doesn't follow normal organizational structures, and what the next slide shows very simply is that, as we all know, if you're going to succeed in solving problems with customer data, you cannot do that within the sales department or within the finance department alone. What it requires is for those departments to work collaboratively together, and that includes the business people in those departments, the process owners, the IT specialists in those areas, and also the subject matter experts for data in those areas, because effective data management and data quality improvement requires the organization to work collaboratively and horizontally across the organization to solve the end-to-end problem. I've had lots of examples of this in my own experience, and on the next slide I'll demonstrate why these things happen. Many of you will be familiar with this: it's one of those little executive toys, normally called Newton's cradle, named of course after the famous English scientist Sir Isaac Newton. It's also called Newton's balls, but I think Newton's cradle sounds a little more dignified, personally. And if you take the example from earlier of the customer data, I've come across several examples in several companies where the quality problem is really caused at data input. A client either rings up or has their online application processed by some sort of front-end marketing or sales department, they are not as careful as they might be about capturing all the customer's details correctly, and they feed it into the system: job done. And it's really only much further along in the organization, in the customer life cycle, that some of those problems start to emerge. The best example I can give from my own experience is that when I worked in BT, our front-end staff were not very good at capturing customers' addresses. So when a customer ordered a product like broadband, addresses were sometimes hastily scribbled down, because the people at the front end were more concerned with getting on with the next call than with capturing the data correctly. Given Newton's cradle, what was happening there? They didn't really feel the impact of that. But when the guy in the van went out to try and install the equipment in the customer's house, on a disturbingly large number of occasions, because the address was poorly recorded, that engineer in that van could not find that address. That, of course, badly impacted the customer experience, it wasted engineers' time, and of course it wasted money for BT. So it was only by getting the engineering workforce and the sales people to work together to solve the problem that we got a resolution. So collaboration is most definitely key. So what's the sweet spot
of all of this then? Bringing it back to the theme of data modelling, data quality and data governance again, what we're trying to demonstrate here is that these three disciplines need to work in harmony together. You identify your core data through your modelling activities, you ensure that it's properly owned and stewarded through data governance processes, and then you use data quality techniques to focus on improving that data. That's what I would call the core data sweet spot, and this can then deliver real and substantive benefits to an organisation. But that doesn't imply, finally, that all data is equal, and this is where I think we come back to some of the new uses of data in the big data world, in the analytics world and in data science: not all data is equal. We developed this pyramid, which we think is quite a useful way to demonstrate that what you need to do in organisations is do just enough data governance. So the top layer of that pyramid, the reference and master data, I think we'd expect in any organisation to be very rigorously and tightly controlled, with clear owners, with clear data stewards, and with clear plans for data improvement and data management. And in many cases, what's the threshold for quality for that? It probably needs to be as close to 100% as you can possibly make it. Then moving down to core enterprise data, which is not necessarily master or reference data but could be things like financial transaction records, for example: again, the quality of that data needs to be pretty high, and therefore it needs to be pretty highly governed and pretty carefully controlled. For functional, operational data at the third level, a lighter touch is probably good enough, simply because the volumes of data begin to grow and increase. And at the bottom, in the area where a lot of the analytics really lives, a lot of that stuff demands the lightest touch of all. Some policies need to be adhered to, but basically it doesn't have to be controlled anywhere near as rigorously as the data in the top three levels. So in other words, your solution needs to be proportionate to the problem. What I'll do at this point is hand you back to Donna to talk more about those bottom layers, the rise of self-service BI, analytics and data preparation. Donna?

Great, thank you. So yeah, as Nigel mentioned, there are certain things in the organization, like master data, like reference data, that should be very closely monitored and very closely modeled. Someone asked a question in the comments: where does MDM fit? This is one place where it does fit. You should definitely have a data model for MDM, you should definitely have data governance for MDM, you should definitely be tracking the data quality for every master data element very, very closely, right? But what I've seen some companies do is take that a little too far. We've been very lucky that on all of the data governance projects we've been working on in the past few years, people have been asking for it, the business people have loved it, because they understand, like the effort that Nigel mentioned: if we have the address wrong when we're trying to send someone out in the truck, they're going to go to the wrong spot. It's just obvious that we should improve the governance. Where you get people rolling their eyes and not loving data governance so much, I think, is when people take it too far, and as Nigel mentioned, getting that sweet spot of what to manage very closely and what to leave alone is critical to the success of your data governance. So everyone should be looking at the same core list of vendors, if that's what we're mastering, but when someone's trying to do some exploratory self-service analytics, leave them alone, right? So yes, they should be using the core established reference and master data, but if I'm trying to download some weather patterns from some open source and do some
exploration, don't over govern that. At the same time, of course, I think it's obvious the point we've made: don't under govern your master data. But the point that needs to be made a little more strongly is this idea of the rise of self-service BI, analytics and data prep. So think back to earlier in the presentation and all of those different roles. A lot of people, when you get master data right, don't even know that there's master data, and that question came up. I just know that in my point-of-sale system the list of menu items on my menu is correct, right? Because someone has mastered that in the background. When you get it right, people don't notice; that's the beauty of it, right? Or the address is right, because we have the right master data for customer. So that works for the standard data, but some of this new data is non-standard. A, it may not be relational: I might be getting some Internet of Things data, or social media sentiment analysis, or open source datasets, or Google Maps, all these different kinds of things. How do we handle that? So think of this new type of user that we're seeing more and more of, for several reasons, and I'll go back to my previous slide. One is that the tools are just slick. I mean, think what folks had to do before, and I'll go way back to the mainframe, right, just to integrate two datasets. But with something like these self-service data prep tools, it's just very easy to integrate some of the data and manipulate it and report on it. So both the self-service data prep and the self-service BI tools have some very nice capabilities. And the accessibility of data, as I mentioned: the amount of data you can just download off the web from scientific research and integrate on your own and do some cool reports with, even with things like R or Python, is amazing, right? And I think there are two things: A, the tools are easier for business users, but B, business users themselves are a lot more tech savvy. And even though we're all using it, I sort of hate that term "business and IT", like they're two completely different things. IT is part of the business, and we sort of make the assumption that anyone who's in business couldn't possibly understand technology and data structures, and of course they do. Some of the best data scientists I know came from other scientific roles, or from finance, and that is business, right? So this new type of user is very savvy, and she loves to live in both worlds; it's not an either-or. One of the business authors wrote a book, and one of the quotes I liked was the genius of the "and" versus the tyranny of the "or". I think we like to create these false dichotomies: that you either have data models or you have self-serve, and those are two camps and they don't live together. They certainly live together. So this type of person that's doing analytics and self-service data prep, of course, if there are public definitions that are right and I can just get the list of customers and their current addresses, thank you very much. Because if you think back to the statistics that Nigel showed, this lady's probably spending four days out of the five that she has to do work just cleaning up bad data, and she wants to make discoveries from it, right? So yes, use the standard master data if it's there, and the standard glossaries that tell you what the data means. If anyone saw my business intelligence webinar, I think it was in February, I did that exercise myself: I downloaded some open source data, got one of these new BI tools that's slick to use, and tried to report on it, but there was no metadata with this open source data set, so I had no idea what those columns meant. And that's what a lot of people see when they start looking at some of this enterprise data: what does this date mean, and what's the context, right? So those people love the stuff we just talked
about, but they also want to be able to do some self-service: I want to take those standard data sets and integrate them myself, to do some exploring and analysis and modeling, the other kind of modeling, right? I'm doing some statistical modeling and some data visualizations and all this exciting stuff. And she has a lot of knowledge in her head that she'd love to share. So I see these as slightly different worlds, but they're integrated, and we could argue all day over where that fine line sits, because some of the metadata tools are becoming more collaborative and some of the collaboration tools are becoming more governance focused, so there is an emerging line there. But these people have a lot of information in their heads that they should share; it shouldn't just live in the data architecture team. So I kind of see two worlds here: the encyclopedia versus Wikipedia, right? In the encyclopedia world, it was a few academics that sat in a closed room: this is the definition, and now you shall consume this truth. That has a place. If we're talking about those standardized enterprise data sets, yes, those should be locked down. It shouldn't be just a few people in the room, it should be the right people in the room from all of the different areas of the business, but then yes, that by definition is sort of the encyclopedia approach. But there's also the Wikipedia approach, and I'm old enough to remember when that first came out, hopefully some of us on the call are, and there was, and still is, a lot of skepticism: how can that be right? It's just a bunch of people editing stuff. But if you look at it, it can often be very helpful. It's that idea of eventual consistency: if enough eyes look at this, we'll eventually find a better source of truth through that constant effort. And I think that's almost the beauty, and the definition, of this idea of self-service data prep, self-service data governance and analytics: in some ways it's a different viewpoint.

So what a lot of these tools offer is this idea of harnessing tribal knowledge. Think of that lady who's writing the queries. It almost gets back to the classic argument over what is the definition of customer, and what is the definition of total sales. And it depends, and it depends in a good way: what you might be reporting to the street versus what you're using for forecasting. So if you publish these queries and see who else is using them: oh, the sales team uses this, and there are six queries out there, but the one that everyone's using for reporting is this one. That speaks volumes. Or the reverse: I had a customer that reverse engineered one of those ERP/CRM systems, and they are daunting. So what he did was ask, what tables are people actually hitting? That's how he found out. Again, it's not a perfect truth, but nothing's a perfect truth. What are the glossary terms that everyone's looking at? That kind of usage ranking can speak volumes. And helpfulness ranking, especially when we're sharing definitions, sharing algorithms for my models, sharing queries: again, what is the query for total sales? You can decree that from on high, and that has a place, but often the devil's in the details of what people are actually using. So a lot of these tools can have that helpfulness ranking and usage ranking. And then where I think a lot of the value is, is this idea of collaboration and crowdsourcing. Think back to the example from earlier, where part number used to be called component number before the acquisition. So that guy, he has his little avatar, right? He's come over to the new world and he sees this definition, and you can have alternate names, so it says it used to be component number, but it also says it's an eight-digit alphanumeric field. And he says, this used to be a 10-digit field, and it didn't have letters. What's wrong? And this other lady jumps in: yeah, no, it's the same thing. I had the same problem. We actually have a program that parses it out and converts it; click here and you can get a copy of it. Awesome, right? If we'd just done the encyclopedia approach, you would have either forced a definition on folks, or maybe the definition is right but people didn't have the context. And even better, this lady has a solution that you can link to, because she's been in the data and been doing self-serve, and that's the beauty of collaboration and crowdsourcing. Maybe she's in London and he's in Helsinki and they never would have met. That ability to gather collective knowledge through tribal knowledge is really the best of both worlds. So again, I don't want to say it's an "and" or an "or" between the encyclopedia and Wikipedia. Often you do have the standard definition out there, but just let people comment on it, and that's the beauty of both worlds: really getting that information that you might not have found before.

So to start summarizing: it is a balance. In the modern data landscape there are certain areas of your data, again, standards, reference data, master data, that really should be based on data models and standards and steering committees and formal, really strict governance. And then there's also this idea of collaboration-based governance, where you have your self-service data prep, you're doing analytics, you're doing discovery, and some of that discovery can feed back into your standards base. We found that if we track social media accounts for our customers, we really can get better sentiment, so can we add that to our master data? And that's the beauty of both worlds: you really get that superset of working the two well together. So, A, don't ignore either side. Don't say, oh, those people doing self-serve, they shouldn't be, we're going to lock that down. Of course they should be, right? That's their job, they're doing discovery. So give them the standards, and you
could listen to what they're saying. And on the self-service side, don't say, oh, those people doing models, that's old school, why do you do that anymore? Maybe look at some of the standards. Who gets excited, well, maybe some of us on the call get excited, about country code lists and things like that, right? But if they've already solved some of those problems, use what they've developed. So again, I think that's the beauty of this: both the self-serve model and the more strongly governed, top-down model, which goes back to the pyramid that we mentioned: know what to govern, and in what way. To summarize, so that we can open it up to questions, because I know we're getting close to time: data governance is, yeah, it's the "all of the above", right? It is the people, it is the process, it is the tools, and then taking those tools and technologies to fit the right use case. Am I crowdsourcing some metadata for my analytics queries and the data preparation around them, or am I locking things down for my more structured, master data type information? So just quickly, before we open up for questions: this is us. Nigel and I do this for a living, so if you need help, let us know. We have offices in both the UK and the US and can help you worldwide. Hopefully you can join us for some of our events next year; we're really excited about the new broader focus. I was with Shannon in Chicago a few weeks ago at the Data Architecture Summit, and there was a lot of excitement; it was really nice to see all of the new things being done in data architecture that we can hopefully touch on. And just quickly, those two papers I mentioned: we have Trends in Data Architecture, and we also have, from last year but still relevant, Trends in Metadata Management. They are both available on DATAVERSITY.net and on the Global Data Strategy site, so pick your pleasure and you might find some interesting things there. So at this point, Shannon, we can open it up for Q&A.

Donna and Nigel, thank you so much, I love it, and thanks to our attendees for being so engaged; I love the chats that are going on too. Just to answer the most commonly asked question: I will be sending out a follow-up email for this webinar by end of day Thursday, I almost said Monday because I'm used to the webinar being on Thursday, by end of day Thursday, with links to the slides, links to the recording of the session, and anything else requested throughout. So, lots of questions going on here, and I'm just going to start right with the one I'm looking at immediately: how do you differentiate a data catalog from metadata? The reason I ask is because the vendors are muddying the waters in this space.

I'll take that first, and Nigel, chime in if you want. There are always muddy waters in all of these spaces, and part of it is legit, in that there is a fine line. How I see some of these data catalogs, if we think of the Wikipedia versus encyclopedia idea, is that they're often more on the Wikipedia side, more about that tribal knowledge, often cataloging for some of these analytics use cases. Just take a look when you're evaluating the tool: a lot of them are good tools for different sources, and a lot of them are really aimed at these business analytics, self-service type folks, and if that's what you're doing, a lot of them are awesome. The metadata repository type is often more enterprise-wide, more suited to what we were talking about earlier on the call: I'm trying to get my data model and my relational databases into more of the traditional, enterprise-wide, structured metadata repository. And we could go on: what's the difference between that and a data dictionary and a glossary? There is some overlap, but beyond how I often hear the words used, which is how I've just described them, I would just ask the next leading question to the vendor in terms of what their use cases are and what their scope is, and see if it matches your use case. Because, you know, I've seen people go both ways:
I've seen customers try to get the enterprise-wide repository when they really just needed the cataloging tool, and they were frustrated, it's almost back to our pyramid, and I've seen people try to do a full enterprise-wide repository with one of these cataloging tools, and it just wasn't meeting their use case. So I would get past the names, look at what the functionality is, and see if it matches your needs. Nigel, any other thoughts on that?

No, I think that's a pretty accurate picture of where things stand.

Donna, can you go back over the light bulb going on and off?

I was probably in stream of consciousness there. When was I talking about the light bulb? Oh, I think the light bulb moment I was talking about was when we were talking about data models. And I actually had one of those light bulbs go on this week, where the person actually asked that we steer away from data governance, that it was too unwieldy and just took too long, and we focused first on the data model, because there was a realization that we just needed some of those business rules. And then when you started to look at the data model, maybe this one that I'm showing now as an example, well, one group called it the customer and one group called it the client, and can we make those the same, and what's the difference? We started to see the overlap. The analogy I use a lot, which is probably overused but it works, is the blind men looking at an elephant: one feels the tail and thinks it's a rope, one feels the legs and thinks it's a tree, and one feels the trunk and thinks it's a snake, each right in their own way. When you try to explain that to a sponsor in terms of the governance, it just seems so academic: seriously, you people are up in a room arguing about that? And then you start to show the actual model, or process flow diagrams, or any enterprise architecture, and they start to see those groups that are conflicting,
we've identified that we need 17 data stewards in these key data areas: how many of them are currently in place, and how many did we say we'd have in place at this point? And then, of course, from the more data quality perspective, you can start to think, a bit like the Harvard Business Review piece that I mentioned earlier: how many of our data records are fit for purpose, what percentage of them are? And therefore what you're trying to demonstrate, using things like dashboards in those areas, is continuous improvement, so that the governance you're delivering, and the data quality improvement that results from it, is actually generating real business benefits. The one thing I would say about that is, when you're measuring data specifically, try somehow to connect that to the business drivers. Donna mentioned addresses earlier on. In BT, when we did that piece of work I talked about earlier, we reckoned that the accuracy of addresses when we first looked at them was, I don't know, around 80 percent, and I'm making these numbers up, it was quite a while ago, and what we wanted to do was drive that accuracy up to 90 percent. So every month we would put those addresses through a data profiling tool, put some basic rules around them, and then produce some reports, so you could actually demonstrate, month on month, that we were making a difference and making it better. But at the same time, we also came up with a figure, it's too long to go into the explanation of how we did that, for what a bad address was costing the company, so that we could translate that improvement in data into a reduction in what we call the cost of failure. So it's very important, wherever you can, to try and put some sort of financial value on data improvement.
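The monthly routine Nigel describes, run the records through some basic rules, report an accuracy percentage, and translate failures into a cost-of-failure figure, can be sketched in a few lines. This is purely illustrative: the validation rules, field names, and the £45 cost per bad address are invented for the example, not BT's actual rules, figures, or tooling.

```python
# A minimal sketch of monthly address profiling with a cost-of-failure
# figure. Rules, field names, and the cost-per-bad-address are
# hypothetical assumptions for illustration only.
import re

RULES = {
    # Rough UK-style postcode shape, e.g. "CF10 1AA" (simplified).
    "valid_postcode": lambda r: bool(
        re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}", r.get("postcode", ""))
    ),
    "has_street": lambda r: bool(r.get("street", "").strip()),
    "has_town": lambda r: bool(r.get("town", "").strip()),
}

COST_PER_BAD_ADDRESS = 45.0  # assumed cost of one failed engineer visit (£)

def profile(records):
    """Return (accuracy %, estimated cost of failure) for one month's records."""
    bad = [r for r in records if not all(rule(r) for rule in RULES.values())]
    accuracy = 100.0 * (len(records) - len(bad)) / len(records)
    return accuracy, len(bad) * COST_PER_BAD_ADDRESS

addresses = [
    {"street": "1 High St", "town": "Cardiff", "postcode": "CF10 1AA"},
    {"street": "", "town": "Leeds", "postcode": "LS1 4AP"},       # missing street
    {"street": "3 Oak Rd", "town": "Bath", "postcode": "banana"},  # bad postcode
    {"street": "7 Mill Ln", "town": "York", "postcode": "YO1 7HH"},
]

accuracy, cost = profile(addresses)
print(f"{accuracy:.0f}% accurate, estimated cost of failure £{cost:.0f}")
# → 50% accurate, estimated cost of failure £90
```

Running the same report each month on fresh records gives exactly the month-on-month trend line Nigel mentions, and multiplying failures by an agreed cost puts the financial value on improvement.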
Yeah, I'll just add one quick thing to that, if you'll let me, quickly, on the idea of the pyramid, to touch on what Nigel said: start with those few KPIs around the few data elements that everyone's using, and then see if you can tie them to the business. An obvious one might be customer email address: can we get that to 99 percent right? Some of those you can easily tie to a business KPI: if we get 99 percent of our emails right, our marketing campaign effectiveness might increase by five percent. So aligning some of your technical quality KPIs with your business KPIs is a really nice way to show the value of your governance down the road.

Well, that does bring us just past the hour. Donna and Nigel, thank you so much. Nigel, thanks for joining us this month, and Donna, thanks as always for another great presentation, love it, a fantastic topic. And thanks to our attendees for being so engaged in everything that we do; I just love all the questions, and I'm sorry we didn't have time to get to all of them today. But again, as Donna mentioned, we will be changing up the series and broadening the scope a bit to Data Architecture Strategies. We hope you'll join us next month for that; you'll be seeing the links for it soon. I hope everyone has a great day and a happy holiday. Thanks a lot, bye. Cheers.