Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the latest DataVersity webinar, Conformed Dimensions of Data Quality: An Organized Approach to Data Quality Measurement. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the upper right-hand corner for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataVersity. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Dan Myers. Dan is the principal educator at DQ Matters, an e-learning company focused on providing information and data quality learning material. As an information quality practitioner, educator, and thought leader, Dan conducted a robust comparison of key IQ authors' lists of dimensions of data quality and proposed a way to align the data management community by using common definitions. In 2016, he proposed a standard first set of dimensions based on his research and called it the conformed dimensions of data quality. Previously, Dan worked as an applications developer, data modeler, and manager of data governance and data quality. And with that, I will give the floor to Dan to get today's webinar started. Hello and welcome.
Thank you very much. Really excited to be here, first time doing this. I've done the conferences many times, but never the webinar, so I look forward to this.
Here's the agenda, and I wanted to just cover some of the basics of the dimensions of data quality. I'm sure we have a broad range of different individuals attending today, so I'm trying to cover all of those different areas, starting out with some of the basics. And then, of course, if you're joining for the statistics and a little bit of explanation around the white paper that's come out, the 2017 annual report on the dimensions of data quality, then we'll get to some of that at the end. And then if you have any specific questions, we'll cover that in the Q&A session. So what are the dimensions of data quality? My guess is most of the people on the call have heard of them before, so let's just revisit that in order to make sure that we're on the same page. The definition that I typically use to communicate is that the conformed dimensions of data quality, so already jumping kind of to the conformed dimensions, but also the dimensions of data quality in general, are categories used to characterize data and its fitness for use. Now I understand there's a little bit of nuance to that, with some of the authors also applying categories as groups of dimensions, but for the layman, the term category is often the best way to describe a dimension of data quality. As for the application, they can be applied in any industry to assess, measure, track and communicate information and data quality. And the real goal of today is to show you some information that should help your information quality initiatives further communicate the quality of the data that individual consumers of the data are expecting, as well as the producers of the data and the way that they present that. So here's a slide with just some of the definitions kind of thrown out there to give you a feel for what we're talking about today. So why do I need the dimensions of data quality?
I mean, let's go back to the fundamentals: organizationally, is it something that I should spend my time focusing on? And that kind of depends on what role you're in. On both the business side and the IT side, having the dimensions of data quality offers a lot of different things. So let's go through a few of those. So they act as a quick reference, a checklist, and a guide to quality standards. So when you're implementing projects, you can reference this checklist. Oh, did I cover XYZ? Is the data that I'm asking for coming out of this new system that I'm developing? Is the way that I'm modeling it going to represent the data in all of these dimensions of data quality? It gives you a checklist. Too often, as humans, we go through things fast and we just forget items. I frequently travel, at least once a month, and I'm always referring to different checklists that I've developed in order to ensure that I have everything. And that's just a good practice relative to data management. So then also they can be used as a framework to segment your data quality efforts. It doesn't mean that you necessarily focus on one without review and use of all of the others. But honestly, everyone can have their fair opinion on this, I believe. I espouse using completeness as one of those fundamental dimensions that you start with. Because if you don't have the data there, then there are a lot of other things about the data that are going to be difficult to diagnose and work with. Now, I understand there are different situations for that. I'm sure that we'll have a lot of discussion around that later if we get into it. But that's something to think about. How can you carve out your efforts and prioritize things using the dimensions of data quality as a framework for your efforts? And they enable people to communicate the current and desired state of the data. So I mean, you can basically look at this in the context of authors like Lee, Pipino, Funk and Wang.
So these are initial authors that espoused some of the early dimensions of data quality. And in their book, Journey to Data Quality, they have the comparative approach, which basically allows you to take subjective information, your survey of your information consumers, and join that with the same kind of data analysis that you would do in an objective way. So you may have some computer programs or profiling results where you profile and analyze your data in an objective way, but then you also use those same dimensions to reflect on the survey results, joining those together so that you get both sides of the coin. And reuse of existing categories and definitions enables faster implementation times. We'll talk about it a little bit more, but two of the survey respondents were interviewed, and their interviews are available in the white paper. One of the gentlemen said, you know, hey, listen, the conformed dimensions lay it all out for you, and it really prevents fistfights. Because at the end of the day, people can get pretty emotional about the way that you define different things. And if that's going to prevent you from moving forward in a timely manner, then why not use something out of the box that has a certain level of rigor to it? And we'll talk about that rigor a little bit later. And then matching the dimensions against a business need and prioritizing which assessments to complete first. So, you know, I really appreciate this from Danette McGilvray. Her book in 2008 lays out the dimensions of data quality according to her methodology. And she puts forward in there the fact that you can look at how you want to approach your project and/or your program, and choosing which dimensions of data quality your focus will be on can determine different outcomes. And it really helps you prioritize things with the business in terms of the business goals and objectives that you want to achieve.
And understanding what you will and will not get from assessing each dimension. If you're new to the dimensions or come from a background that isn't as heavy in data, then communicating your needs and what you expect to get out of each enhancement on an existing legacy system or development sprint in an agile methodology, communicating with the dimensions of data quality facilitates an environment of communication that is what we really need in order to ensure success. So, where are they used? And we'll get into a lot more of the detail of that as we go through this in the next slide. But we use them to define, you know, measures and scorecards, dashboards. And IQ International actually has a seminar in about a week or two on developing dashboards. And you'll find a lot of the dimensions of data quality discussed within that. If you're interested, I would check that out. And then also, you know, almost most importantly, at the end of the day, dimensions of data quality are just about communication. They're about naming scenarios, naming the way that we look at our data, the way that we want our data. And so the way that we have conversations can either be challenging if we're talking over each other, not understanding the meaning or getting confused because of the terminology. So if we have, as I propose, having a conformed set of dimensions, then conversation becomes easier. At least you don't need to necessarily agree on the way you define things, but at least if you understand how each other communicate those and then reference those accordingly, then at least you're one step ahead. And you can also embed these in the instructions or forms and other parts of your application. So if you're explaining to non-technical, non-data-orientated people how to do data entry, clarifying what you mean by completeness and or integrity, validity, these things like that can help facilitate your data quality improvement efforts. 
And also in terms of validity on forms, obviously the underlying concepts or the sub-concepts that are espoused in the conformed dimensions enable you to develop your requirements and design documents for IT in a more cohesive manner. And then also if you're in data provision, in some sort of organization that sells data or provides data, and/or within a business intelligence context, obviously we all have a boss. The boss sometimes is our data consumer. And in that context you oftentimes have service level agreements. So consider using the conformed dimensions in order to better communicate and keep your definitions consistent over time. And then, as we'll talk about in the next slide, you can really integrate the dimensions of data quality into the software development lifecycle. And so I'm going to switch over to that now. So think about it from an ideation perspective, ideation being the first step that you do within the project lifecycle. These names may be slightly different than yours depending on how the project management team at your organization has defined the steps. But generally you have some step of ideation where the innovation starts with the customer and the data. And most importantly it's the knowledge of your data, and using the quality lens, that ensures success. And new ideas really need data to be executed and to measure success. So I live here in Silicon Valley, and a lot of startups are dependent on how well they execute. And a lot of times the focus can be on a functional perspective. But if you don't have quality data, then no matter how good your UI is, the data that's driving it can impact the functionality and confuse customers and/or drive them away. So there's a high need for data quality and the measurement of it. Also, profiling data aligns well with the dimensions of data quality and allows you to understand the data's usefulness even at the ideation phase.
So I don't know that most people have gone through every phase and analyzed, okay, am I using data at this phase, and am I using information quality principles and methods of measurement at each of these phases? And one of the biggest things is that the effective implementation of new projects really requires a leap in connecting different, dispersed data together in order to develop some sort of knowledge or some sort of product that customers haven't had in the past. And you can only do that when you have high information quality, when you have the quality that's required in order to make that join or that connection. The next phase is conceptualization and initiation. And in this phase, you know, it's about improving the customer service. So it depends on whether you're in a slightly slower industry, say for instance, insurance, where there's not a whole lot of innovation. There are some innovative things that are coming online. But in terms of the actual product that you're providing, there isn't as much innovation. Rather, the customer service aspect is of critical importance. And how you can better serve that customer comes down primarily to the way that you treat them and to your customer service representatives, but also to the data that you have about that customer and that customer's transactions, so that you treat them well, the way that they expect to be treated. And then innovative products, as I mentioned before, in conceptualization, need to connect the dots. So understanding your quality up front, at the beginning of the project, is so, so important. And the dimensions of data quality offer that to you. So, the requirements phase, right? In new application development, it's really eyes wide open about what data is available. If you don't know what data is available in your system, if you don't have that metadata.
So having the metadata is actually part of the representation and documentation of your data, and it fits into the dimensions of data quality. Maybe you haven't used it in that context before, or your organization hasn't adopted it in that context. But as you'll see, it's definitely within the conformed dimensions of data quality. And then there's a discussion about why you need that data and why the design of the system needs to be done in a certain way in order to facilitate the collection of that data in that manner, at that high level of quality. So at the design level, of course, data models and the level of abstraction really help you accomplish things faster. But to the extent that you abstract things and build your models in a more abstract way, data quality, and specifically metadata around those data quality requirements and the integrity that you build either into the application or into the database itself, are so important. And that can't really be done well without the dimensions of data quality, because it's really a communication framework for explaining your requirements. And also the strong focus on error handling inevitably has benefits as well. So then the build phase: if sample data is available, the unit testing outcomes are improved. One thing to point out here is that, a lot of times, developers have to develop data that is representative of the real world. We have data security requirements which mean that the data in the development environment, or even the test environment, cannot necessarily be the production data. But using the dimensions of data quality to characterize the data in your production environment, and then replicating something that's representative of that in your testing environments, even including your build phases for your test cases, can be extremely important.
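As a minimal sketch of what characterizing production data by one dimension might look like, the Python below profiles completeness per column. The column names, the sample rows, and the `completeness_profile` helper are all hypothetical and chosen only for illustration; the talk does not prescribe any particular measure or tool.

```python
def completeness_profile(rows, columns):
    """Return, per column, the share of rows holding a populated value.

    A value counts as populated if it is neither None nor a blank string.
    This is one simple reading of "completeness", assumed for illustration.
    """
    total = len(rows)
    profile = {}
    for col in columns:
        populated = sum(
            1
            for row in rows
            if row.get(col) is not None and str(row.get(col)).strip() != ""
        )
        profile[col] = populated / total if total else 0.0
    return profile


# Hypothetical production extract (values are made up).
production = [
    {"customer_id": 1, "email": "a@example.com", "phone": None},
    {"customer_id": 2, "email": "", "phone": "555-0100"},
    {"customer_id": 3, "email": "c@example.com", "phone": None},
    {"customer_id": 4, "email": "d@example.com", "phone": "555-0101"},
]

profile = completeness_profile(production, ["customer_id", "email", "phone"])
for column, share in profile.items():
    print(f"{column}: {share:.0%} complete")
```

A test data set could then be built to show roughly the same rates, for example leaving about half of the phone values empty, without copying any production values.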
And to the extent that your developers and programmers understand and are able to communicate using the dimensions of data quality, your organization can be massively more efficient than other organizations that may ignore that completely. And the test phase is similar to the build phase. Take test cases, for instance: I know a lot of different developers or testers who, in developing their test cases, rely heavily on metadata and existing data models, an understanding of how the program should work, how the system should work, and other discussions with the customers. To the extent that you understand the customer's needs better with the dimensions of data quality, you can understand how to write better test cases. So it's really important that you understand these. And then also, one way of engaging business users is to make sure to use their data, with their own dimensions of data quality overlaid on it, within the UAT, or user acceptance testing, process. And then, of course, go live and support. Go live is focused on the customer and on the improvement in the prior steps relative to data quality. But this means a faster response to the customers if you're able to understand customer issues in terms of the dimensions of data quality. So imagine if all of your issues were understood in terms of data language in your customer service division, and tickets were filed relative to the way that they heard the customer say those things. In other words, they've already kind of translated the issue using the dimensions of data quality. So the question then kind of becomes, well, okay, you've got me sold on the dimensions of data quality, but which ones do I use? Because I know there are a lot of them out there, and obviously the DMBOK is out and has a lot more information on data quality. And it's a good place to start. But let's look at some of the other things that are available to us in the field.
So we can look at Wang and Strong, their 1996 paper, and then obviously the others they contributed in 2002, and then Journey to Data Quality, as I cited, from 2006. They're all very good beginnings. They're solid, but there is some confusion between timeliness and currency that I think is better articulated in the conformed dimensions, which I espouse. We'll get to that. And then McGilvray's 2008 book, I think, has a very practical, business-focused approach to naming the dimensions of data quality. They probably aren't the most typical dimensions. She extends those for her methodology, and if you're using her methodology, then the use of those makes a lot of sense. And then you look at Larry English, 2009 or 1999, and Redman, 1996. They have a strong technical and logical basis. Redman really has his discussion around views. It's really helpful for conceptualizing the dimensions of data quality if you're getting into it at a deeper level and you want to understand all the different ways that individuals have thought about it. I really like that one. I recommend a read of that if you haven't done so already. And then, of course, you may have manufacturing or other ISO-connected applications or constraints. Perhaps you do business in Europe, requiring that you communicate things in terms of ISO standards. So ISO has ISO/IEC 25012:2008, with its dimensions of data quality. I have a paper coming out soon in the Information Quality Journal discussing the conformed dimensions versus the ISO standard. I think that there are some strong documentation aspects to the ISO standard that are very helpful. It does lack a level of hierarchy that is available in some of the other frameworks, and it is overly context sensitive, meaning that it relies on the individual customer to apply that dimension in a different way each time, which can actually make it harder to standardize or to repeat the same thing in different situations.
And so that didn't quite work for me either when I started looking at the landscape. On the left-hand side here is the title of the academic paper that Brian Blake, from Acxiom and a PhD candidate at UALR, and I have just finished and will be presenting at the International Conference on Information Quality in October in Little Rock, Arkansas. So if you're headed out there, consider attending the presentation that we have there. The title is On the Evaluation of the Conformed Dimensions of Data Quality in Application to an Existing Information Quality Privacy Trust Research Framework, which is the work that he had previously done, so it's kind of overlaying the work that I've done with the conformed dimensions on his work in the privacy trust research area. So about 2013, as I was working on my IQCP, the Information Quality Certified Professional credential, I sat down and started reading all the different authors on this and trying to map it out in my notebook. And so I started kind of looking at this and started to diagram it out. And the way that I came up with doing that is to basically create a column here for every dimension, just the single-word dimension, and then identify what it is that all the authors refer to within that dimension, making little bubbles down in the green section for each of the concepts. And then there's the extent to which they were mismatched. For instance, let's take a look there at definition and metadata. So definition and metadata is oftentimes cited as a dimension of data quality. Well, the things that they included within that, that's the second-to-rightmost column, are a clear, easy to understand definition. Does it include the measurement units? Are the values consistent with the definition? And then it has complete and available metadata. But one author might include the concept of complete and available metadata within the dimension called metadata.
Or another one might say that complete and available metadata is an underlying concept within presentation and identifiability, a different dimension. So at the end of the day, there's a little bit less argument around what the dimensions of data quality are, meaning what the names of the dimensions are, in the top row there in the gray. Everyone can kind of get to maybe 6, 12, 15, some sort of number of dimensions. But then the question becomes, well, what do you include within that? And that's really where my work has begun: identifying all the underlying concepts that different authors have espoused and then trying to reconcile that into some single version of the truth. And to the extent that we're able to, we try to say, well, how many authors think this belongs to validity? How many authors believe that this concept belongs in accuracy? And to the extent possible, it's kind of a democratic approach. It's saying, well, there are four authors that believe it's inside of accuracy. Let's keep it there. And there are only two that do otherwise. So the series of articles that I wrote on InformationManagement.com, titled The Value of Using the Dimensions of Data Quality, really stepped through that in terms of six authors, comparing all their different definitions of the underlying concepts and saying, hey, here's a way that we could get to some reasonable conformed set. And I say conformed in the business intelligence way, where we have a star schema and then we have dimension tables on the outside. And we try to reuse those dimension tables if they're conformed. Then they have some standard and they're reusable, because they can be used in different contexts or to answer different business questions. So in the same kind of context, I said, well, hey, the dimensions of data quality need to be conformed. There are just too many things floating around out there.
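The democratic approach described above, counting how many authors place an underlying concept in each dimension and keeping it where the majority puts it, can be sketched roughly as follows. The author labels and their placements are made up for illustration and are not taken from the actual six-author comparison; the `conform` helper is likewise hypothetical.

```python
from collections import Counter

# Hypothetical placements: for one underlying concept, the dimension each
# author files it under. Labels and votes are illustrative only.
placements = {
    "complete and available metadata": {
        "Author A": "Metadata",
        "Author B": "Metadata",
        "Author C": "Presentation",
        "Author D": "Metadata",
        "Author E": "Presentation",
        "Author F": "Metadata",
    },
}


def conform(placements):
    """Assign each concept to the dimension most authors place it in.

    Returns {concept: (dimension, votes_for_dimension, total_authors)}.
    Ties are not resolved meaningfully here; they would need human judgment.
    """
    result = {}
    for concept, by_author in placements.items():
        votes = Counter(by_author.values())
        dimension, count = votes.most_common(1)[0]
        result[concept] = (dimension, count, len(by_author))
    return result


conformed = conform(placements)
for concept, (dimension, count, total) in conformed.items():
    print(f"{concept!r} -> {dimension} ({count} of {total} authors)")
```

With four of six votes, the concept stays under the majority dimension, mirroring the "four authors versus two" example in the talk. A simple count like this is only a starting point, since a tie or a close split still has to be reconciled by hand.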
So that's why I proposed this name, the conformed dimensions of data quality. So then in 2015, I said, well, wait a second. You know, this makes sense to me. How come nobody, or very few people, are picking up on this? And so I said, well, maybe there's a survey that I can put together to kind of understand how many people would be really interested in a standard, and how would that work? And the reasons to agree upon the standard, what would be the reasons? What would be my elevator pitch for someone if I saw them just for a short period of time? And really, to me, the primary thing comes right down to communication. It provides language to communicate your data quality requirements. At the end of the day, it's all about how I'm talking to my coworkers about what I'm trying to achieve. And recently I saw this article around how naming is the hardest thing that you do. And you probably don't think naming is really, really important. But, I mean, have you ever tried to do it? My wife and I had our son five years ago. He just had his fifth birthday. It was really hard naming my son. I mean, there were so many really, really good names. And then when you get to computer programming and reusable code, you want names that espouse, engender, the kind of idea that lets you conceptualize what that program does, what that module of code does, what that entity, that table, that column includes, and how you should use it. It all comes down to naming and the definitions. So to me, communication is the number one thing. If we have many, many different definitions for the dimensions of data quality, then inevitably we're not going to be communicating in an efficient manner. So efficiency also is one of the keys here, relative to implementing things faster. And as we now have the agile kind of methodology rather than waterfall, how fast can we do things? Now we could say we're going to put the data out there in Sprint 1.
And then in Sprint 2, we're going to start looking at the quality of it. You know, that's fine. You're going to decide when you bring the quality lens to it. But once you do, wouldn't you want to be on the same page? Wouldn't you want to have the same definitions of that quality across your team? And by using the conformed dimensions, that's what you achieve. It discourages repetitive philosophical arguments. As I mentioned before, one of the survey respondents, who was interviewed and has a section in the white paper, says it just eliminates the fistfights. So everyone's going to have their take. Maybe you slightly disagree. You'll even disagree with the standard. But at least we can join at some focal point, some hub of understanding. And there's been a lot of work by myself, and now Brian and others, to vet the conformed dimensions of data quality. And one thing I might also say is that it's a living standard. It's an open standard, available for free for organizations to use. I continuously steward it and others contribute over time, so it is still alive. We're updating it and fixing it. We have release versions available on dimensionsofdataquality.com. And there isn't any other set of dimensions of data quality that I know of that is comprehensive and alive, literally improving as we move along like that. I mean, all of these other published works are static. They're finite. They do not move. And that just means that they're very, very good maybe, but maybe not to the extent that we have within the conformed dimensions, and they won't ever improve because they're not moving. They're not alive. And having said that, the measurement: I mean, if it isn't measured, it can't really be managed. We always say that in data management. But having consistency between organizations really enables comparison. So you can benchmark within your organization.
Even when you have turnover within an organization, if you stick to the standard, stick to the naming, and make sure that it's communicated within your culture that the conformed dimensions are what you use, then, of course, you're going to be ensuring a level of consistency that is desirable. And then providing a framework to define more detailed measurements and associated sub-concepts. So a lot of people espouse dimensions of data quality, and they have some free-form paragraph about that dimension, but they don't really define it. That's one of the biggest challenges I found with trying to normalize all of the authors' content: they often described them in a textbook, but they didn't really provide definitions that were concise. And they didn't articulate each of the sub-concepts, which really kind of lead you to the metrics or individual measures that you may define within your organization to track that. And that's really where the rubber meets the road. And also, lastly, the teaching. It provides a solid framework for teaching. I can't tell you how valuable it has been, within the e-learning efforts that I've done at DQ Matters, to have a real structure around what we are teaching. Having that basis of the dimensions of data quality offers you a framework to begin and to explain different concepts, especially to individuals that are new to data, or laymen who know good data quality when they see it but don't exactly know how to explain it. And so you offer them the dimensions of data quality with examples, which is why this year, in 2017, I started the blog, the Conformed Dimensions of Data Quality blog, on the website, in order to do just that. We've published about seven different articles over time, which talk about each of the different aspects of the conformed dimensions. So in our latest survey, here in 2017, the question was asked: if an industry standard set of dimensions of data quality was available, how interested would you be in using that at your organization?
And it was pretty revealing that, again, most people do indeed want to have some standardization. So over 50% are very interested, and these levels have stayed relatively consistent over time, and then another 30% are somewhat interested. So right there you have around 80% of individuals saying, yes, we really want some sort of standard. So, a really quick dive into the website, dimensionsofdataquality.com. Where do you find things? I just wanted to throw this out there: the conformed standard menu is on the top left. If you look underneath there, you see About the Standard. It tells you more about the effort that we've made so far. Then there's the list of the conformed dimensions, and that really is at the summary level, meaning what you would see in a textbook, with definitions for each of those dimensions. And then the detail level is really at the underlying concept level, which is each of those smaller components that goes into that dimension. And the blog, of course, is under the news and blog menu item. And then you can see the screenshot itself is actually from the blog page, and the blog archive lists those by month. Usually I do one, maybe two a month, but that's the nature of it so far. And then Shannon has the white paper, and she'll be distributing it to attendees after the webinar. It's been developed and completed for 2015, 2016, and 2017, so for the prior years you can get those from the website. So let's take a look then at the actual survey results. So how was the survey conducted? It was a web-based survey over a one-month period. In each of the years it's been conducted, it has typically run in the month of April to coincide with Enterprise Data World. And oftentimes I present, actually only one time of the three so far, I've presented on this topic at Dataversity Enterprise Data World, and it turned out really well and got a lot of individuals participating in the survey that way.
It was also advertised on LinkedIn, Twitter, the conformed dimensions website, and through referral and prior-year sign-up. So last year, in 2016, we started collecting individuals' names and email addresses in order to offer them the ability to answer the survey year over year, so that we can start tracking some of the changes within their organizations if there are any, and also ensure stronger repeatability of the survey year after year, in order to save the time and effort of marketing to people to try to get them to take the survey. So I think when you read through the white paper, you'll see a lot of valuable information, and it's just amazing how many different things you can pull out of the survey with the data and the questions that we ask. And so I really highly encourage you to take the survey next year. There are links inside of the white paper itself to sign up to be reminded about the survey next year. So please opt in to get that survey next year. Now, I do have to say that as we go through the survey results, there's some response bias. Response bias means that the people responding to the survey behave in some way differently from individuals and organizations in the broader context. So my assumption is that there is some level of this. We haven't quantified this specifically, but to the extent that you're aware of the dimensions of data quality concept, and maybe even believe that there's some value in it, you're more likely to take the survey. So when the survey says that X number of organizations use the dimensions of data quality, it's more than likely that the people that are taking the survey have used them in the past and know their value, and therefore they're going to be answering that they use them.
So you might expect the numbers to be biased a little bit high in terms of respondents' association with and comfort using the dimensions, and therefore the answers are biased a little bit in that way. Now, how could we resolve that in the future? Obviously collecting more data would be the number one way to do that: collecting more data across a broader landscape of individuals in business and IT who use data. So data scientists, for example, not necessarily the ones collecting the data through IT systems, but rather data scientists on the consumer end. Do they understand the dimensions of data quality and use them to communicate to IT departments and/or business customers about the levels of quality that they need? They could then be answering the survey as well. So the goal is really to gain more respondents in order to eliminate some of the response bias that exists.
So, diving into some of the results. One of the questions that we asked was, how often does your organization classify data-related defects using dimensions of data quality? For 2017, 40% of respondents said on an ongoing basis. Again, given that bias, say that the number is a little bit inflated because the people who are answering already do this; they were attracted to the title of the survey. But wow, 40%, that's pretty good. On the other hand, why is it that 8% have only used it once, 29% have considered it but haven't used it, 15% have never, to their knowledge, used it, and another 8% really didn't even know about it, or are totally unsure whether or not it was used in their organization? If our goal is to maximize information quality within the community, then we should look at this and say, hey, wait, are dimensions of data quality valuable? Yes, we think they're valuable. Whether or not we use the conformed dimensions, let's just say dimensions in general. But the dimensions of data quality are valuable.
Why wouldn't the pie chart say 100%? Not everyone is using them, so something's going wrong here. My guess is that everyone is describing and communicating data quality issues. They just aren't using what they consider to be the dimensions of data quality. They haven't uncovered the resources that are available to them. You could do even better by using a standard like the conformed dimensions of data quality. But this is just the survey's take on how much improvement we still have to make. We have a long way to go: the other 60%. Eventually we would hope that 100% of organizations are using the dimensions of data quality in some way.
The next question: does your organization have a method of categorizing data quality issues using the characteristics of the data and its fitness for use, like the dimensions of data quality? So how do they frame this within their organizations, and how well-governed is it? The first answer is "yes, there's one," and it's actually quite amazing: nearly 30% of the respondents chose it. So again, the question is, can you extrapolate from this and say that within the industry as a whole, 30% of companies are using dimensions of data quality, and there's only one standard within their company? It can be done in a smaller organization; it's really hard to do in a larger organization. But is it really 30%? Probably not, given the response bias in the survey. But wow, there are companies that are doing this, and they have better communication among themselves because they're doing it. So if your organization isn't doing it, then you really want to step up and do that. And then "yes, there's one, but it isn't well-defined" is 30% again. And then 20% say there are various methods across the landscape. If that's where you have to start, that's fine. It is hard, as you know, in data governance, to change anything once you've started it.
So if you're thinking of picking it up from scratch, maybe consider the conformed dimensions. Or if you're going to use something else, at least use it consistently.
So, drumroll: which one is the most used? Well, you could probably figure that out, and it's pretty logical. Most people, probably due in part to some of the confusion around what aspects fit within accuracy, listed accuracy as the most important or the most used. This has bounced around a little bit over the years of doing the survey, but we see accuracy right up at the top, and typically completeness in the top two or three in terms of usage. It's a lot of fun to look at this for the first time, but if you're like me and you've seen it a lot of times, it kind of gets old after a while. The question becomes, well, how have these jumped or changed over time? And is it really representative? Is my sample size, with the survey at 48 respondents, really large enough to say, yeah, there's something going on when something jumps? I like to do this, but you need to be careful about inferring too much from it. The summary here on the left-hand side posits some of my theories on that. I'd be really interested to hear from you guys as well, so shoot me an email and give me your theory on this later on if you have time.
Accuracy was reported to be the most used dimension in 2015, and is back now in 2017. It wasn't in 2016, as you see there. It really confirms our earlier observation that these two dimensions, completeness and accuracy, are at the heart of most organizations' data quality efforts. And accessibility jumped all the way from tenth to seventh. We obviously don't have the full picture of this, but I loosely associate it with a larger focus on data lakes and having more data available in one place.
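One rough way to approach the sample-size question raised above is a margin-of-error calculation. The sketch below is an illustration, not something taken from the survey's white paper. It applies the standard normal-approximation interval to the 40% "ongoing basis" figure, assuming that about 48 respondents answered that question and that they can be treated as a simple random sample. The response bias already discussed breaks that second assumption, so the interval reflects sampling noise only. The function name `proportion_interval` is hypothetical.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a survey proportion.

    p_hat: observed proportion (0.40 for "on an ongoing basis")
    n:     number of respondents (48 is an assumption for this question)
    z:     critical value; 1.96 corresponds to roughly 95% confidence
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    # Clamp to the valid range for a proportion.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


low, high = proportion_interval(0.40, 48)
print(f"{low:.2f} to {high:.2f}")  # 0.26 to 0.54
```

With a sample of that size, a reported 40% is compatible with anything from roughly a quarter to just over half of organizations, which is one reason a dimension moving a few ranking positions between years should be read cautiously.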
It seems that the big focus right now is around big data, data mining, sorry, algorithm design, AI, deep learning, and so forth, which often require access to a lot of data, not only in volume but in veracity and variety, these different dimensions of big data, as we call them. And accessibility is really the key to that. So that's my take on why 2017 has such a jump in accessibility, but I'm interested to hear what you guys have to say about that.
And consistency climbs back to the third position from the fifth position. A lot of what I do for clients on a day-to-day basis is setting up controls that balance between different systems, whether from more of an accounting perspective, top to bottom, or a cross-system balancing where you're ensuring that the data from two different systems reconciles, and so forth. That's really fundamental to the consistency dimension. So yeah, that's interesting.
There's a comment here on security and the dimensions of data quality. Based on my research, I don't put security in as its own dimension; rather, the security components come in as part of accessibility. We discuss that a little bit in this paper, but the white paper with Brian Blake that's going to be coming out in ICIQ has a lot more of that discussion. So shoot me an email if you want to discuss that more offline.
And then, let's see: please choose the industry in which your organization is categorized. One of the concerns that I had, and it's brought out in the white paper, is the lack of use of the dimensions of data quality in certain industries. So not only do we need to get 60% more organizations using the dimensions, but specifically certain tiers are just not represented at all in the survey. And again, that could be due to the response size.
But what I did is I said, well, rather than tell you all the different industries that were in the survey, which you see there on the right-hand side, where it's just too many pie pieces, let's do this. Tier one includes finance, banking, and accounting, and I split those out in case you're in one of those categories and you want to understand how your peers are doing relative to yourself. 21%, 10%, and 11% together make up tier one. So these are the guys that use them, and they're using them pretty decently. It's a highly competitive market environment and very much a for-profit environment. And then tier two is industries that only make up 6% of the responses in the survey. Tier two is government, state government, retail, manufacturing, and software development. You can see the actual industries listed out by tier below. So if you're in one of those industries and you're willing to talk to me about it, I'd really like to understand what is inhibiting your organization from implementing the dimensions of data quality. It might be part of something broader, for example that data quality, or data management as a domain in total, is hard to get buy-in for from leaders in the education industry. Whatever that is, it would be interesting for me to understand and to include in the report next year, and then to handle that bias in the survey by soliciting more responses from organizations in those industries so that we have a more representative sample.
So with that, I'm done, and I want to leave time for questions. Here's my professional profile. Follow me on Twitter and LinkedIn, and connect with me so that you can get the latest and greatest. I usually present two or three times a year at various conferences and have other professional events and so forth that I participate in. So I'd love to get connected.
And Shannon, I'll turn it back over to you to facilitate some of these questions.
Dan, thank you so much for this great presentation. Just to answer the most popular questions that come in: I will be sending a follow-up email by end of day Thursday to all registrants with links to the slides, links to the recording, and anything else requested. And one of the first things requested is, where can we find your blog? So if you send me that link as well, Dan, I'll make sure to get that into the follow-up email for our attendees here. So, regarding the survey, how many incomplete surveys were there?
How many incomplete?
Incomplete.
So the problem with web surveys is that there will be thousands of people, well, maybe not thousands, but a lot of people who want to know what the survey is, so they click through just to get the introduction to the survey and then drop out without having answered anything. Most people are that way. There are very few that actually started it and didn't finish it. I don't remember off the top of my head how many of those there were, but there were some that had to be disregarded. As a general rule, I think there were maybe two responses that were not completely finished. It was the last two questions that were missing, so I was able to salvage those two, but it was a very minor number.
Sure. That makes sense. I've certainly experienced that with ours. So how is accuracy measured?
How is accuracy measured? For that, why don't I go to the end of the presentation, where I threw in the conformed dimensions. At the dimension level, accuracy measures the degree to which data factually represents its associated real-world object, event, or concept, or alternatively matches the agreed-upon source. So this is the definition at the dimension level. And then you have things called underlying concepts, here called "agree with real-world" and "match to agreed source."
There are only two underlying concepts within that. And then if you go to the next slide, you get a little bit better picture when you look at the definitions of the underlying concepts. All of this is included in the presentation for reference, but please don't rely on the slides. Use the website, because that's where we keep everything up to date. For accuracy, you see the two sub-concepts. "Agree with real-world" is the degree to which data factually represents its associated real-world object, event, or concept. That's the most used definition. But of course, there are cases where, say, I buy something on Amazon and there's an event tracked, and that event record is the system of record. I can't go back in time to review that event. The only mechanism that exists to record that event is within Amazon's operational systems. And so that's "match to agreed source," which is a measure of agreement between the data and the source of that data. This is used when the data represent intangible objects or transactions that can't be observed visually.
So, while you're there too, the question came in, what is the difference between timeliness and currency? You've definitely displayed it in the previous slide and in this slide. Anything you want to add to that?
So I think that's where Wang and Strong and the original work on the dimensions of data quality had a little bit of confusion, which I think the conformed dimensions clear up. To me, and based on most of the authors that I've read, a clean way to separate them is that timeliness is the expectation of when the data gets to me and how readily I have access to it. It's more of an availability issue: is it available in the timely manner that I need? Whereas currency is how well the data reflects the real world.
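The distinction just drawn can be made concrete by treating the two as separate measurements on the same record. The sketch below is illustrative only: the function names and timestamps are assumptions made up for this example, not part of the conformed dimensions standard or any tool.

```python
from datetime import datetime


def timeliness_met(delivered_at, needed_by):
    """Timeliness: did the data reach the consumer by the time it was needed?"""
    return delivered_at <= needed_by


def currency_age(observed_at, as_of):
    """Currency: how old is the real-world observation that the data reflects?"""
    return as_of - observed_at


# Hypothetical record: observed in the field, then delivered in a daily report
# three weeks later, ahead of that morning's deadline.
observed_at = datetime(2017, 4, 1, 6, 0)
delivered_at = datetime(2017, 4, 22, 6, 0)
needed_by = datetime(2017, 4, 22, 8, 0)

print(timeliness_met(delivered_at, needed_by))       # True
print(currency_age(observed_at, delivered_at).days)  # 21
```

The same record passes the timeliness check while being three weeks out of date, so the two dimensions need separate checks and separate thresholds.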
So even though I get that information daily in a report, it might have been recorded weeks before. Maybe I have a biologist collecting data in Russia, and they have to ship it, literally on a ship or some boat, or there needs to be some analysis done on it first. So the data is no longer current. Currency really measures how well the data reflects the real world. Timeliness is how quickly I get that information. It's confusing.
Well, another good topic that we've got going on here with that is, what are your thoughts on security as a dimension of data quality? I've seen it here and there on lists, but not often. However, security is often core to determining if data is fit for purpose.
Yeah. So here's the problem within information quality. We have this domain called information quality, and we try not to look too far afield at other things, like the Infosec communities and how they define things. That's why working with Brian on that paper that's coming out in October has been really fun, because he's taking a different approach to it. Looking at it from that Infosec perspective, he's able to find the controls and descriptions he needs under access control, because security is really measuring what level of access someone has to that information. Obviously this can probably be expanded and refined a little bit more. The question is, how many tiers do you want in the conformed dimensions? Do you want to have a dimension level, an underlying concept level, and a sub-concept level of detail? The real goal of the conformed dimensions is that we keep it as simple as absolutely possible, but not ruin the veracity and the real conceptual strength of it at the same time.
But I'm completely open to exploring ways to enhance this, especially if there's enough academic rigor to support adding another underlying concept or articulating an existing concept in a better way. I also have a paper coming out in IQ International that discusses the ISO dimensions of data quality compared to the conformed dimensions of data quality, and there are some security and system-specific ramifications there. So if you're interested, subscribe to the blog. My best recommendation is that everyone subscribe to the blog, so that when I publish these papers, you'll get them, or you'll get the summary in the blog, so that you can stay up to date.
Awesome. Love it. So are there differences in dimensions for unstructured versus structured data, and any suggestions on standards for metadata for unstructured data?
Yeah. So actually, Batini and Scannapieco, I'm not saying that second author's name very well, but Batini, B-A-T-I-N-I. Look at their book. It's Carlo Batini, 2016. Their 2016 book has decent coverage of the dimensions of data quality with respect to unstructured data like maps and web documents. My goal is to stabilize the conformed dimensions on mostly relational constructs first and then move into those areas. My wife works in geographic information systems, and I come from a little bit of that background as well. So I'm already salivating over spatial data quality concepts that I want to get included eventually, but there are only so many hours in a day, especially when you have kids.
True. So could you comment on the dimensions in relation to the information steward application?
Information steward application. I'm sorry, I'm not sure I understand the context of that. So maybe we can get a clarification from the questioner.
So while we wait for that, let me move on to the next one. Do you apply data quality dimensions by report or at the source?
Both.
So I think the question is about the report, and really Olga Maydanchik is presenting on that. As far as I understand, her husband wrote the book, Maydanchik, in 2007; I think it was on data quality assessment. Both of them are really great at data quality, so I encourage you to attend her IQ International webinar in about a week and a half. She's going to talk about dashboards and reporting through dimensions of data quality. But where do you apply them? The problem with using dimensions of data quality only in a report is that the business sometimes doesn't understand them to the extent that you use them in your report. Who's your report audience? They need to understand what they're reading. If you define the dimensions in the context of your customer, then that's more appropriate. Having said that, there will always be somebody in the organization who needs to be looking at it from a conceptual dimensions of data quality level, for which a set of standard reports along those lines is really important.
Another exciting thing that I can't promise anything on: I'm starting to work with some vendors to see how they can include the conformed dimensions in their tools, so that as we're comparing data quality tools such as a data profiler, we have more of an apples-to-apples comparison of each vendor's functionality. Can they perform XYZ underlying concept of the dimensions of data quality, and so forth. And I've actually had a generally positive response. So if you're a vendor on the phone, let me know if that's of interest to you. I'd love to explore some of those options.
Love it. Lots of questions coming in. So where can I find definitions for the data quality dimensions listed on slide 15? And if you maybe again send that to me, I can get that out in the follow-up.
Yeah, 15. Well, so the survey is all based on the conformed dimensions of data quality.
So in the white paper, the appendix lists all of those dimensions of data quality and the definitions that were used in the survey. But that's static. It's fixed, and it isn't kept up to date. So you want to use the dimensions of data quality website. Right here, if you just scan that QR code on the screen, or go to the site, dimensionsofdataquality.com, that's where the definitions are. This page is explaining where to go to get those definitions.
I love it. I think that's the first time we've had a QR code presented. That's awesome.
Well, usually paper-based is better, but I figured somebody's going to scan this screen, so I might as well.
Just one more question. In your inquiry, you asked about a unique data quality index. How can we build one with all dimensions normalized?
I think I would need to unpack that. There are a lot of things in there, and I'd need to understand the context the person is asking in. I actually have some clients that are implementing the conformed dimensions in their organization, and they are running this survey internally, just for themselves, to establish that baseline. I think that may be what they mean by an index: basically reusing this survey. And I don't mind at all giving out the language of this survey for other people to implement within their organization. There's nothing proprietary about using the language of the conformed dimensions. So just ping me and I'll help you get what you need.
Well, Dan, that does bring us right to the top of the hour. Thank you so much for this great presentation and education, and thanks to our attendees and community for being so engaged in everything we do, and for all these great questions that have come in throughout.
Again, just a reminder, I will send a follow-up email by end of day Thursday to all registrants with links to the slides, the recording, and all the additional things that have been requested throughout the Q&A here. So I hope everyone has a good day, and Dan, again, thank you so much for this presentation. We just really love it.
Thank you, Shannon. Have a good day.
You too.