Hello and welcome everyone. My name is Eric Fransen, stepping in today for Shannon Kemp, who is on vacation this week. We would like to thank you for joining us for this month's installment of the Monthly DAMA International Webinar Series. The Webinar Series is designed to give our Enterprise Data World Conference attendees educational opportunities year-round. We are excited about the upcoming Enterprise Data World 2016 event, which will be held in San Diego, California, April 17th through 22nd. Registration is open for that event, so be sure to check out the details at EnterpriseDataWorld.com. Today's Webinar will be presented by DAMA Publications Director, Laura Sebastian-Coleman, and she will be discussing big and little data quality. A couple of points to get us started. Due to the very large number of people attending these sessions, you will be muted during the Webinar. If you would like to chat with us or with each other, we certainly encourage you to do so. Just click the chat icon in the upper right-hand corner of the screen to enable that feature. For questions, we will be collecting them via the Q&A section, which is different from chat; you will find it in the bottom right-hand corner of your screen. Or if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DAMA. As always, we will send a follow-up email within two business days containing links to these slides, the recording of this session, and any additional information that may come up during the Webinar. And now a few words about our speaker. Laura Sebastian-Coleman is Data Quality and Data Standards Center of Excellence Lead at Cigna. She has worked on data quality in large healthcare analytic data warehouses since 2003. 
She has implemented data quality metrics and reporting, launched and facilitated data quality working groups, contributed to data consumer training programs, and led efforts to establish data standards and to manage metadata for large analytic data warehouses. In 2009, she led a group of analysts at Optum in developing the original data quality assessment framework that is the basis for her book Measuring Data Quality for Ongoing Improvement, a Morgan Kaufmann publication. An active professional, Laura is DAMA Publications Director. In 2015, she received IAIDQ's Distinguished Member Award for her contributions to the International Association for Information and Data Quality. And with that, I will turn things over to Laura to start the presentation. Laura, hello, and welcome. Thank you, Eric. I appreciate the introduction. And thanks to everyone who's attending. I know that all of us are busy, and so making a decision to take an hour out of your day to attend a webinar like this is not always easy. I appreciate your being here. I want to just add to the notes that Eric shared in the introduction a couple of influences on how I think about data quality, and these will come into play in the presentation. My thinking on data quality has been influenced very strongly by the basic challenge of how to measure the quality of data. The concept of measurement itself has led me to think about data in a different way from how I used to think about it. Problems of measurement are microcosms of problems of data definition and collection. The other thing that's influenced me has been the demands of data warehousing and particularly the challenges of data integration, where we have data defined in diverse ways and we need to bring it together so that data consumers can understand it in a similar way. 
I think as we move into the world of big data, the challenges are changing, but they are evolving in ways that we can, in some cases, anticipate and plan for, particularly when we talk about integration. So with that, I'd like to get started on the content. And hang on one moment. Here we go. So I want to review the problem that we're going to be talking about today. We know that we live in a world of intense technical innovation, and we can capture a tremendous amount of data with incredible speed through instruments that will tell us something about the content of that data. More importantly, we really want to use that data. We want to get value from the data. However, we still face some of the basic challenges that have been around since people recognized the impact of processes on the creation of high or poor quality data. We have not mastered those fundamental challenges. Now, within healthcare, despite the fact that we have technological advances, much of the data that's collected is still collected through person-to-person contact. And when we talk about this data, it's not big data at all. It's little data. So I'm going to be walking you through ways to think about data in old and new ways, and I hope that this will improve your approach to improving the quality of data in your organizations, regardless of whether you're dealing with big data quality or little data quality. First I'm going to walk through some of the challenges within healthcare: what we really want to get out of big data and what obstacles are in the way of getting there. Then I want to step back for a moment, with that context in mind, and talk about the concepts of data and data quality to shed light on how to solve these problems. And then I'm going to make a couple of suggestions of things that I think can happen within healthcare to make good on the promises that big data implies. 
So when we talk about data in healthcare, as in other areas, we are developing ways of both collecting and using data that are brand new and that change virtually on a daily basis. One of the big developments is that there are tons of applications, mobile apps and the like, that are aimed at helping people manage their health. And these applications make big promises. They promise better decision-making, higher levels of engagement, better levels of compliance. They offer ways to actually change how we live and how we manage our lifestyle choices, and therefore they imply that they can improve our health. They can prepare us to manage illnesses and help us manage those illnesses. And then for the system overall, they also promise that we can do things like manage demand. So if you're headed for an emergency room that is already full, you might be redirected to an emergency room a few miles away where you would get care faster. In making these promises, there's the implication that we can actually transform healthcare. And many of the articles on this subject imply that the train is already rolling along and that we must get on it. So PricewaterhouseCoopers has published a very interesting white paper on healthcare delivery in the future. And they talk about the fundamental imperative to have digitally enabled care. This imperative and the availability of data, the ability to collect it, have created major changes in how healthcare is being delivered and how it can be delivered. And this sounds almost like science fiction when writers say that digital technology bridges time, distance, and the expectation gap between consumers and clinicians. So that's a pretty big promise. However, the reality of healthcare is that oftentimes the system is not working the way that we want it to, and there are big obstacles in the way. 
So about six months ago there was an insurance summit in Hartford where healthcare executives were trying to understand why innovation in healthcare leads to higher costs. In other areas, technological innovation becomes a means of saving costs, but in healthcare it often leads to higher costs. One of the executives estimated that about 30% of healthcare costs were due to inefficiencies. We think about technology as a means of gaining efficiencies, but if we are spending 30% of our money in an inefficient way, then we are not realizing the benefits associated with innovation. Another of them lamented that many of the applications and devices are encouraging people to take better care of their health, but that it's easy to say people should take care of their health and not easy to actually make them do it. So healthcare has the possibility of being transformed by big data and technological innovation through healthcare analytics, but there are a lot of obstacles in the way. The promise is based on the assumption that high quality data is available and that it is truly high quality, that it does represent the interactions between providers and patients in a meaningful way. What are some of the technological limitations on the healthcare system right now? One is legacy systems. Most of the large healthcare companies have been around for decades, and they work based on older systems. These systems can differ widely within organizations, and they certainly differ between organizations. A second factor is that the medical profession, providers, your doctors and such, are still not very technologically enabled. So when we look at how we're trying to transition away from paper, we realize that this has been a slow process. Much of healthcare is still paper driven. 
If we look back at 2008, which was only a few years ago, only 4% of providers had a full electronic health record system, and only 13% had even a basic system. So when we think about that, we realize it's a slow transition to actually get the providers out in the field to be doing things in an electronic way. I was at the doctor just last week, and I found myself filling out paper forms that were exactly the same paper forms I'd filled out the year before. Another factor that has influenced the challenges with healthcare data quality is a lack of standards. Now, there are plenty of standard codes. There are plenty of directives for standardization, but people have applied them inconsistently, or they haven't had enough information to apply them in a way that improves the overall quality of the data. And there haven't been any standards actually directed at data quality itself; there are standards for other aspects of healthcare. I want to give an example of this and talk about a group called Academy Health. About a year and a half ago, I was involved with a meeting run by Academy Health, which is a group of academics looking to improve information around the quality of data that is gathered in nonclinical settings. What they want to understand is, if they are using this data to understand a phenomenon within the healthcare system or to understand the details of a disease, how much confidence they should have in the data. And they recognize that as more data is collected and made available through research data warehouses, they could potentially have a rich source of data they could use to do analyses that do not necessarily require clinical trials. Clinical trials are expensive, and they want to understand how to get at some questions through other means. 
So because academics and researchers are using these sources of data, most of which come from large commercial payers in the healthcare system, they're trying to develop standards by which they can numerically express the quality of data. What they found is that there are a number of factors that can cause a misrepresentation of certain clinical events. These include systems that are inflexible in their design, so you cannot capture the events consistently; coding practices within hospitals and other provider settings, where people make different choices about how to represent those events; and gaps in the standards themselves. They gave an example of screening for high blood pressure in children. Clinicians were directed to screen for high blood pressure, which meant they were actually taking the blood pressure of children and trying to understand whether there was any risk of the children developing high blood pressure. And they were asked, after the initial screening, to continue to monitor the children. After this data had been collected for a period of time and researchers looked at it, what they found was that it appeared children were increasingly at risk for high blood pressure, and they were surprised by this finding. But when they actually went back and did a records review, it showed that it wasn't that children were actually at higher risk, but that there was a misuse of the hypertension (high blood pressure) diagnosis code. Because there was no diagnosis code for considering hypertension, the doctors were unable to record what they were actually doing, and the data appeared to show that something different was happening from what was actually taking place. So this was just one example of the kinds of gaps that can evolve when we do not have the standards, or when the choices in how to collect data are influenced by factors that may lead to confusion. So let's think about healthcare. Healthcare is about taking care of people. 
Those people who are in healthcare are driven by their concern for other people. So we have a lot of room for interpretation. And those interpretive factors may influence how symptoms are read and how we represent them. And as those choices are made, we might reach different conclusions about what story the data is actually telling us. So I'm going to step back from having set up the problem and talk about characteristics of data that we might be able to use to improve the situation. Most of us assume we know what data is, but it's always good to start with definitions, because when we explore them, we can see other facets of the topics that we're thinking about. When we go back to the beginning, data has a Latin root: it is the past participle of the verb meaning to give. And if you are in math or engineering, you recognize that data is equivalent to your givens. In modern parlance, we talk about data as facts and statistics, and we think of them as used for analysis or used for reference. So data has a level of objectivity to it. Of all these definitions, my preferred one is the ISO definition, because it also acknowledges the degree to which we rely on information technology and systems when we start to think about how data is created. ISO defines data as, quote, reinterpretable representation of information in a formalized manner, suitable for communication, interpretation, or processing. I realize that's a mouthful, but it is really worth thinking about each word. Data needs to be interpreted. It is representing something. It has a formal structure. And the reason for its having a formal structure is so that it can be used to communicate, so people can interpret it in a consistent manner, and also so it can be processed. It can move from system to system. 
When we think about data conceptually, we understand that it's trying to tell the truth about the world, but we also should understand that in order to get to that truth, we have to make a set of decisions about creating that data. So we choose what characteristics to represent. We choose the form in which we will represent them, and we need to be very clear in what those choices are so that other people can understand the data itself and what it represents. We also know that the driver of collecting all this data is so that people can use it, and especially in this day and age, as people realize how reliant we are on data itself to run our businesses and to learn more about our world. So the uses of data also imply a set of expectations about what conditions the data should be in and therefore about its quality. We also should keep in mind that in today's world, we call practically everything data, and we do not create data with usage in mind, and that becomes a problem. So if we think about the history of measurement and the historical context of data, we realize that our ideas about data are driven by two different places where data is created and used. So in science, we create data, and in commerce, we create data. So we have different goals and different approaches. So we think about science. Science is focusing largely on measurement and observation as a means to create knowledge and advance knowledge. When you're a scientist, you actually plan for your data, and you test that data to see that it's accurate and complete and properly calibrated and that your results are reproducible. That's the foundation of science. So it's always testing and retesting. In science, data is truly a product, a product of the process, so the process needs to ensure the quality of the data. Otherwise, you lose credibility. Commerce, on the other hand, creates measurements to achieve goals. 
So if you are the baker in town and you want to sell your bread at a fair price and make a profit, you need to know how big a loaf you're going to sell at a fair price and you need to know what your competitor is going to sell for. So when commerce is using measurement and creating data, it's less planned and more accidental. The goal is to sell and make a profit, and the data supports that goal. So in that sense, data is a byproduct of commerce. Historically, the quality of data has not been an end in itself when we think about data in commerce, whereas the opposite is true in science. So if you try to visualize the differences, we realize that a scientific approach is both more fluid in terms of what it's going to look at, but then ultimately more strict once it has reached its conclusion, whereas commerce assumes that data is fixed and it doesn't actually want to do the work to make it fixed. So most of us are in organizations that take much more of a commercial view of data than a scientific view of data. That said, our ideas about data are usually based on our scientific notion that we are somehow collecting facts about the real world and that in doing that, we have a connection to reality that should be stable and should be reliable in which we should be confident. So we get frustrated when data is of low quality because we assume it should be of better quality. When we think about how we create data, there's this cyclical process. We create data because we're observing the world in one way or another and we make choices to capture those observations and represent the world. That results in data, a condensed form of our knowledge about the world. We then want to use that data and we assume that it should be fixed for our purposes of use. Why do we assume that? 
Because if the data represents the world, if we know that we're taking measurements of the mileage between cities or the temperature of the water, we assume that our process for representing those things is effective. And in many cases, it is, because we can all agree on what a temperature is or what a mile is. The challenge comes when we're missing some of the pieces. If we know what the temperature is but we don't know what scale it's in, then we have a data quality problem. If we have a number associated with the distance but no unit of measure, and we misunderstand what unit of measure was used, then we have a data quality problem. We can also make bad choices about representation, or the knowledge of those choices might be lost. In all those cases, we run into a conflict between what we expect and what we actually have. This problem becomes more complicated when we start to talk about big data. Even in science, people are struggling with unexpected inconsistencies in data. About a year ago, Science News reported on the challenges that scientific researchers were having with using big data. They reported, among other things, that just tracking the data is challenging. And then sharing the data is challenging, because oftentimes people don't know the origin of the data or the ways in which the data was collected. Now, in science, this is fundamental to the process, because if you do not know where your data came from, then you cannot replicate your results. One of the things I found most interesting in this article was that it recognized that even the way we look at data will influence the ability to replicate results. The scientists reported that the tools used to analyze complex data sets are as important as the data themselves. 
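The temperature and unit-of-measure examples above can be made concrete in code. This is a minimal sketch (the `Measurement` class and its fields are invented for illustration, not taken from any standard): a bare number is ambiguous, but a value that carries its unit of measure can be interpreted and converted consistently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """A value is only interpretable together with its unit of measure."""
    value: float
    unit: str  # e.g. "C" or "F" for temperature

def to_celsius(m: Measurement) -> float:
    # Conversion is only possible because the unit was captured with the value.
    if m.unit == "C":
        return m.value
    if m.unit == "F":
        return (m.value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {m.unit}")

# The bare number 98.6 is ambiguous; paired with its unit, it is not.
body_temp = Measurement(98.6, "F")
print(round(to_celsius(body_temp), 1))  # 37.0
```

The design choice here is the data quality point: the representation refuses to let the value travel without the metadata needed to interpret it.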
And choosing one computer program over another may in fact end up bringing about different results and different conclusions, which makes the point even more strongly. So the inability to reproduce results has been a challenge in science, and scientists have begun studying the factors that influence reproducibility. They talk about what is called the butterfly effect, which, if any of you are familiar with this concept, basically says that you can have very small changes in your beginning conditions, and those changes may have very large influences on the resulting conclusions. The story that's always told to illustrate the butterfly effect, and why it's called the butterfly effect, is that if a butterfly flaps its wings in Brazil on a Tuesday, this could cause a hurricane in France on a Wednesday. So it's that idea: small changes can have big effects. Researchers have been looking at the doctor-patient relationship and trying to understand how this butterfly effect works on that relationship. As I said earlier, any doctor-patient interaction is a series of interpretations and recordings of those interpretations. They looked at 12 different factors that can influence clinical decision-making. They're listed on this slide. I won't go through them in detail, but you can see that they fall into four categories: decision features, situational factors, characteristics of the decision-maker, and individual differences. Within each of these, you can have different initial conditions. When they looked across them, they saw that there were over 20,000 combinations that could represent the initial conditions of an experiment or an interaction. That's a huge number. And if we think about that, then the problem of accuracy becomes totally different, because when we're creating data within the healthcare system, there are essentially thousands of opportunities to influence the process of decision-making and how to treat the patient. 
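The combinatorial explosion behind that 20,000 figure is easy to illustrate. In this sketch the per-factor level counts are hypothetical, chosen only to show the arithmetic: even if 8 of the 12 factors had just two possible states and the other 4 had three, the initial conditions would already multiply out past 20,000.

```python
import math

# Hypothetical illustration: 12 factors influencing a clinical decision,
# 8 with two possible states each and 4 with three.
levels = [2] * 8 + [3] * 4

# The number of distinct initial conditions is the product of the levels.
combinations = math.prod(levels)  # 2**8 * 3**4
print(combinations)  # 20736
```

Even very modest per-factor variation compounds multiplicatively, which is why the number of distinct starting conditions gets so large so quickly.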
And if we have that combined with older technology that doesn't adapt to new protocols or conditions, or for which we don't actually have clear standards or those standards are applied inconsistently, then there's a lot of room for variation within the system. And when we have variation within the system, we have different levels of quality and different levels of reliability for the data. So how do we move beyond this? I'd like to propose three different facets of an approach, and I'll talk in depth on each of these. The first is recognizing that there is variation and that it may be possible to use this variation to improve the system. The second is to think about how we reduce variation within the system, variation that causes noise or actually interferes with our ability to act in our own best interest. And then finally, I'll come back to one of the fundamentals of data quality, and that is to recognize data quality as a product of the system and take steps to plan for higher quality data throughout the system. So the first one: recognizing variation. Usually when we talk about variation within a system, our urge is to say, oh, we must get rid of the variation in order for the system to run smoothly. However, based on what we see with respect to how clinicians look at patients, some of that variation may actually be good. If we have two doctors and each of them brings a different perspective to a problem, then they can learn from each other, and together they may be able to come up with a better solution than either one of them could come up with individually. So variation may have meaning. It may add to our ability to understand the situation. And if it does, and we can gain knowledge through those differences, then that becomes a mechanism by which to provide feedback to the system and thereby improve the system. 
When we think about why we want to take advantage of healthcare analytics and evidence-based medicine, this is exactly what we want to do. If we look at large numbers of medical claims related to the same diseases or conditions, and we can understand how the different kinds of treatments produce outcomes, then we might be able to find, within those various approaches, which ones work best and which ones do not work as well. And that can become a very good way of improving treatments overall. So variation itself may be helpful and can be a starting point for improving how we do treatments. That said, once we can understand variation, there are also reasons to reduce it. And in areas where variation may cause a quality problem, we would want to take those steps. One of those steps is enforcing data quality standards within healthcare. As I alluded to earlier with the example of Academy Health, their exploration of large sets of claim data has shown that there are significant differences in the way that we record events and significant differences in the way we process claims and the like. So it's hard to tell, given all of those things, how good the data is. And we can improve it. When we think about this in relation to big data, we see that one of the advantages is that more data can be collected through instruments rather than relying on human interpretation for things that can be objectively measured. If we think about fitness trackers or other devices that allow us to collect biometric data, and if we know the device itself and the device is well calibrated, then we can reduce the ambiguity of the data collection. We can collect it in a much more consistent way. 
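The claims-analysis idea above can be sketched very simply (the claim records and field names here are invented for illustration): group claims for the same condition by treatment and compare outcome rates to see which approaches appear to work better.

```python
from collections import defaultdict

# Hypothetical claim records for one condition: (treatment, outcome_improved).
claims = [
    ("treatment_a", True), ("treatment_a", True), ("treatment_a", False),
    ("treatment_b", True), ("treatment_b", False), ("treatment_b", False),
]

# Aggregate: treatment -> [count improved, total claims].
totals = defaultdict(lambda: [0, 0])
for treatment, improved in claims:
    totals[treatment][0] += improved
    totals[treatment][1] += 1

# Compare improvement rates across treatments.
for treatment, (improved, total) in sorted(totals.items()):
    print(f"{treatment}: {improved / total:.0%} improved")
```

A real analysis would of course control for patient mix and data quality issues in the claims themselves, which is exactly why the standards discussed above matter.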
And despite the fact that in many respects healthcare needs to be more technologically adept, and the providers' offices could up their game and the like, there's certainly been a movement toward the use of apps and devices to collect biometric data that is really improving the collection of that data. So for example, PricewaterhouseCoopers reported that 28% of consumers have at least one health-related application on their tablets or smartphones. So people are engaging with these applications, and that number had more than doubled in three years. I'd be interested to know, even now, a year later, what the statistics on that are. As importantly, about two-thirds of doctors said they would be willing to prescribe an app. That means they would scrutinize the available apps, decide which they think is best for providing guidance to their patients, and be willing to prescribe it. If we can collect this data and then integrate it into electronic medical records, then we have a big opportunity to have reliable, well-defined data, focused on particular patient needs, included in those electronic health records, which then could move from physician to physician as people look for different kinds of care or different kinds of treatments. This process and these kinds of devices also enable feedback about the data from both providers and patients, which is another advantage. If you are looking at a paper medical record and everything's written down and you don't have your own copy, you're not going to be able to find errors. But if you have your own electronic medical record and you see an error, a problem, or something missing, then you have the ability to provide that feedback. That can reduce errors within records and can also enable consumers to have a better view of their health. 
So these are some of the things that will allow us to reduce variation where variation is causing problems, while also truly taking advantage of these new means of collecting and managing data. My final point, again, harkens back to some fundamentals of data quality. In 1996, Richard Wang published an article on data quality in which he said we should treat data as a product of processes. And that rallying cry is still important. We have to recognize that if we're going to take advantage of data to learn more, then we have to have trustworthy and reliable data. And particularly within healthcare, data is not a secondary thing. If we don't have data, we give up the opportunity to improve outcomes. So we really need to rethink the system and focus on ensuring that throughout the healthcare process we collect accurate, complete data, whether that data is standard healthcare data collected when you go to the physician's office or biometric data collected through some device. And we do have an understanding of what we mean by high quality data. It is clearly defined: we know what the data represents, and it represents that characteristic of the real world unambiguously. It is consistently collected through reliable processes. So if we have a device to collect the data, then that device is well calibrated and is sending data at the proper frequency and the like. If we have a more manual process, then the people who execute that process are well trained; they understand how to apply standards and the like. And the standards themselves are comprehensible and usable. So the processes and systems that we build to create healthcare data need to be designed from the beginning with the data itself in mind. 
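The characteristics of high quality data described above translate naturally into automated checks. Here is a minimal sketch (the field names, required fields, and thresholds are hypothetical, not taken from any healthcare standard) of validating a biometric reading for completeness, an explicitly recorded unit, and plausible values.

```python
# Illustrative data quality checks for a hypothetical blood pressure reading.
# Field names and thresholds are invented for this example.

def check_reading(record: dict) -> list:
    """Return a list of data quality issues found in one record."""
    issues = []
    # Completeness: every expected field must be present and non-empty.
    for field in ("patient_id", "systolic", "diastolic", "unit"):
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Validity: the unit of measure must be recorded and recognized.
    if record.get("unit") not in (None, "", "mmHg"):
        issues.append(f"unexpected unit: {record['unit']}")
    # Reasonableness: values outside a plausible range suggest a
    # collection or calibration problem rather than a real measurement.
    systolic = record.get("systolic")
    if isinstance(systolic, (int, float)) and not 50 <= systolic <= 250:
        issues.append(f"systolic out of range: {systolic}")
    return issues

print(check_reading({"patient_id": "p1", "systolic": 120,
                     "diastolic": 80, "unit": "mmHg"}))  # []
print(check_reading({"patient_id": "p2", "systolic": 400, "unit": "psi"}))
```

The point is that "clearly defined and consistently collected" can be enforced at the moment of creation, rather than discovered downstream by surprised analysts.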
And the industry needs to recognize that even though it is a people-to-people industry in many ways, the people on both sides of that equation are relying on the information they exchange to get to the place where they want to be, which is to help people lead better lives, to be healthier, and to have a greater sense of security. So that's my pitch for better data quality. Within the slides, I've also included a set of references to some of the articles that influenced my thinking here. I would like to draw your attention, especially for those in the healthcare industry, to the PricewaterhouseCoopers white paper on the healthcare industry, and also to the Consumer Reports article on healthcare and how consumer choices are influencing healthcare. So with that, Eric, I'll bring it to a close, and I'd be glad to start on the Q&A portion of the webinar. Well, Laura, thank you so much, and thank you for the thorough presentation. While we wait for some more questions to come in, I do want to remind everyone, please use the Q&A box, that's the lower right portion of your screen, rather than the chat window. Also, if you're madly trying to write down all of those URLs in front of you, please know that we will post those in a clickable format on the DataVersity.net website when we post the recording of this session and the slides. Which is a good opportunity to remind everyone: we will be posting the recorded webinar and slides to DataVersity.net within two business days, and all of the registrants of today's webinar will receive a follow-up email to let you know how to access that material and when it's been posted. Also, just a quick reminder that you will be able to meet Laura in person at Enterprise Data World 2016, which will be held in San Diego, California, April 17th through 22nd. Registration is open for that event, and details are available at enterprisedataworld.com. All right, so let's see if we have any questions coming in yet. Not yet. All right. 
You've thoroughly filled everyone's brains, Laura. Wow. Here we go. Here's a question. Quality data depends upon the participation of medical personnel. Given competing priorities on their time, how can data collection become more streamlined? Yeah, that is a very good question. And part of what I see as an opportunity is to collect more data through devices. I'll give an example. When I was in my 20s, I had my first EKG, and the process of setting up the EKG took, it felt like, about 15 minutes, and then, of course, the process of getting off of the EKG was equally burdensome. I had an EKG earlier this month, and the improvement in the way that it is administered is largely a technological improvement. The device is smaller, the process of setting it up and taking it down is much quicker, and the results themselves are much more reliable. So when we talk about certain kinds of measurements, I think the opportunity is for better technology to collect data. And I think many of you who have gone to the doctor and had measurement procedures can probably cite examples in your own lives where that's the case. On the downside is the danger of our physicians being overworked in some cases and facing very high demand, and not only the physicians themselves, but many of the people who are part of the provider offices and the like are under pressure to do a lot in a short period of time, and this raises challenges. I think the only way to overcome that problem is first to recognize, again, that the data we're collecting is extremely important to improving the system, and to change the orientation from data as something you collect in order to submit a claim to data as something that has a life beyond just paying that claim. I don't know exactly how to solve that problem, because I don't work in a medical office, but I do know that we need, in a sense, to turn that ship. 
And it may be that getting a different kind of participation from the medical community may come about from within that community itself, if they can make the connection between the data that they create and the ways in which that data can be used. So I don't have the answer, but I do think it's a question worth exploring. Yeah. Next question. Can you speak a little bit to the maturity of the healthcare area in general in using big data to get useful insights to improve people's health? Is there any country already advanced in this area? So I think overall there is a good deal of maturity in using big data to get insights. I know about five or six years ago, when people first started talking about big data and data lakes and these different ways of pulling together disparate data in order to gain different insights about the healthcare system, people were kind of struggling to figure out how to do that. But now most of the healthcare payers have teams of data scientists who are looking at big data. There are university programs in data science with a healthcare specialty. So there's been a good deal of movement in this direction. I think the challenges of actually having confidence in the quality of the data are being overcome somewhat. Again, when we have data that is collected by devices and the definition of what it represents is very clear, you can really dive into that data. When we look at medical claim data that is collected in more traditional ways, then it needs to be used with caution. So I've seen changes within the field. I'm not directly doing healthcare analytics myself, but I've seen changes within the field of healthcare analytics that are very exciting. The data itself, I think we still need to work on, especially the data that's collected in more traditional ways. Part of the question is, is there one country doing more of this than others?
I can't speak to that, but I do know that there's a lot of international conversation about how to use this data. So for example, the Academy Health reference that I talked about, the folks that are part of that are engaged in conversation, not only within the U.S., but across international borders. So I think there's a lot to be gained, and people are truly trying to take a scientific approach to it. Okay. Are you aware of specific organizations that are developing data quality standards for healthcare? The questioner is looking for industry guidance on how we can confirm that we have quality data or surface data quality issues. Yeah, so the Academy Health example that I gave is the one that I'm most familiar with from a healthcare perspective. And there's an organization called Type 4, I believe, and I'll follow up and make sure I've got that right, who are part of driving that forward. And I'm not going to remember the individuals' names, but I will look them up. I know one of our IAIDQ members is very active in this process. So I can send that information to you, Eric, and have you maybe share it when you share the rest of the webinar materials? Sure. Sure. Are you aware of any efforts to create a data model for healthcare? The questioner says, in any area, a data model can be a foundation upon which to build the analytics, and also serves as a way of seeing which data are available or not, and helps think of getting the missing data, etc. Are you aware of any efforts like that? Yeah, so I know several of the warehouse vendors have created industry standard models. So, for example, CARA Data has an industry standard healthcare model. Within the two organizations for which I've worked in healthcare, Optum and Cigna, each of those organizations has created an overall model. What I'm not familiar with is a vendor-neutral healthcare model.
And I think I agree with the person who asked the question that a data model is a really good beginning point for establishing expectations for quality and establishing data quality standards. I'm a firm believer in having a data model as a way to have the conversations, and then also as a way of capturing the results of those conversations about quality expectations. So, while there isn't, to my knowledge, a vendor-neutral industry standard, I think anybody working in this space should have a working data model if they want to improve the quality of their data. Earlier, you mentioned high-quality data. From your perspective, what are the thresholds of describing data sets as high-quality? So, I could talk for another hour on this question. No doubt. So, I'll talk about two things. One, my definition of what it means for data to be of high quality, and then my thoughts on thresholds. So, high-quality data, and I'm going to just go back to my slide on representational effectiveness and fitness for purpose. So, most discussions of data quality start with fitness for purpose, and they say, you know, the quality of data is judged by the consumer of the data and whether the data is fit for the purpose the consumer wants it for. That's been a kind of standard way of talking about data quality that comes from Richard Wang, Tom Redman, Larry English, and other thought leaders in data quality. And I think that is the place where the quality of data gets judged. But I also think that we need to consider representational effectiveness. And for those of you who are familiar with thoughts on this, Malcolm Pauly talks about this concept of representational effectiveness. So, why are they connected? They're connected because if you're going to use data, you're probably going to use it because you want to explore what it represents.
So, if you're going to study global warming, if you want to reach conclusions about global warming, then you need data about the temperature of the earth and changes in weather patterns and all of those things in order to do an analysis of global warming. If you want to improve the treatment of diabetes, then you need data about the treatments that have been used and the outcomes that have been derived from those treatments in order to do your analysis. So, there's a direct relationship between these two things. And our expectations of fitness for purpose are directly related to what the data represents. So, when we talk about then measuring that, we need to understand both pieces. We need to understand how well the data represents the thing it's supposed to represent. And then we can say, ah, it's fit for my purpose. There can be many different purposes for the same data. The classic example is address data. If you have all of your customers' addresses and you want to do a physical mailing, then those addresses need to be accurate at that time. If you have all your customers' addresses and you want to do trend analysis on, you know, your sales five years ago and your sales six years ago, then you need not only the addresses as they are today, but you may need the historical address. So, when we start to think about that, we get in this conversation about what do you expect from the data? What are you using it for? And then you can set standards for levels of completeness, levels of validity, levels of accuracy, however you want to approach that from a dimensional point of view. So, when I think about high-quality data, I think about data that you've had a conversation with your customers about what they are using that data for. You have objective measures for the data, and you can share those measurements with your data consumers so they can then judge whether they can use the data or not or whether it's fit for their purpose. 
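The dimensional measurements mentioned above, such as completeness and validity, can be expressed as simple, objective checks against a data set. As a rough illustration only (the field names, sample records, and regular expression here are hypothetical, not from the presentation), a completeness measure asks what share of records have a value populated, while a validity measure asks what share of populated values match an agreed format:

```python
# Hypothetical sketch of two data quality dimension measurements.
# Field names, sample records, and the ZIP-code pattern are illustrative.
import re

records = [
    {"customer_id": 1, "zip": "06103", "state": "CT"},
    {"customer_id": 2, "zip": "",      "state": "CA"},   # missing ZIP
    {"customer_id": 3, "zip": "9021",  "state": "CA"},   # malformed ZIP
]

def completeness(records, field):
    """Share of records where the field is populated."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

def validity(records, field, pattern):
    """Share of populated values matching an agreed-upon format."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

zip_completeness = completeness(records, "zip")    # 2 of 3 records populated
zip_validity = validity(records, "zip", r"\d{5}")  # 1 of 2 populated values valid
```

The measurement itself is mechanical; the harder part, as the discussion emphasizes, is the conversation with data consumers that decides which fields matter, what counts as valid, and what threshold makes the data fit for a given purpose.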
So, you need a lot of pieces in order to get to a threshold. There isn't just one threshold. You need, in a sense, knowledge about the use and knowledge about what the data represents. All right. Laura, thank you so much. It is now the end of the hour. I'm afraid that is all the time we have. Thank you all for your questions and for tuning in today. I hope you all have a wonderful day. And hopefully we will see many of you at Enterprise Data World in April. We look forward to seeing you there and online at dataversity.net and in future webinars. Thank you so much. Thank you, Eric.