It's good to see all of you here. I'm so glad to be able to talk about recent work I've done to formalize a data quality assessment framework for the Government Chief Data Steward in New Zealand. I recently started there; the office was initiated in the past two years, alongside a chief digital role, looking at data capabilities across agencies. They were looking for someone to bring experience and industry knowledge into their advice bureau. They keep a website with lots of different kinds of advice on data governance and data management, particularly things like the Open Data Program, the flagship of the advice bureau. This assessment framework was done in a year, as a deliberate write-up of my work at the Transport Agency, modified and adapted to an all-of-government context.

Further to that context, there are three aspects to bear in mind in the design of this particular framework. First, it's meant as an exemplar, not as a to-do list. It's abstract, from a data architecture perspective: a process model. You could say the topics are biased in the sense that they're quite data governance oriented, but there's a discussion here of data governance versus data management and the differences in roles and responsibilities, which is practice-based. Second, it's important to realize that this sits in the context of data quality as a program, not a particular project. It's a journey, developed in cycles of capability, because we recognize that agencies are at all different levels of maturity in both data governance and data management. Lastly, as I say, it's a process example, meant to show the highest-level results or outcomes of developing these processes. All agencies are doing some sort of data quality issue management and remediation; they probably call those activities by different names, and they're likely performed by IT. There's not necessarily a recognition that business stakeholders should be involved in remediation, or the people involved in delivering critical information for the agency's mission. The aim here is to bring forward an example process based on standards that articulate best practice.

The data governance versus data management conversation really highlights an important separation of concerns. Data management makes sure that information is managed properly; data governance ensures that data is managed to achieve business goals. What does this mean? Governance is really focused on oversight and goal setting; management is focused, or should be, on provisioning and monitoring. There's often a lot of finger-pointing over which part of your IT or business organization should be fixing data quality problems, and that's often down to a loose recognition of the different hats, the different roles, necessary to have both sides of the lifecycle managed appropriately so good data quality is assured. So governance is responsible for resourcing, authorizing, and creating policy, whereas data management is more about delivering system projects, performing monitoring, providing the technical talent to data projects, and working with IT to provision data platforms.
But data management doesn't have a lot to do with the business mission, or with understanding the outcomes and opportunities that are possibly there for data, the new products and new services supported by quality data.

The framework makes a few assumptions about fundamental capabilities, and some of our agencies are quite low in capability, so they will have a hard time with this. Basic catalogs need to be there, with data in scope, as we put it. A lot of data catalogs exist as lists of data, but we also need to manage data as an asset and understand the risks and business services associated with that data, to make the catalog more conversational with business owners and process owners. You also need a capability to manage the production and distribution of data products resulting from good data, treating data as a valuable input. That means the whole lifecycle: not just capturing the data, but what happens when it's put into a warehouse, what happens when it's turned into an analytics product, what happens after it hits a dashboard and influences behaviors in business partners or business functions. How does that full lifecycle get addressed? Really, it's to say there's an important conversation between the two sides of the house, the governance side and the management side. These are emerging fields, of course, so everyone will have a different level of capability.

The second point, looking at data quality as a program rather than a project, is quite important because of all those coordination activities. My program does outreach and training work to get people on the same page, even just to share the same terminology for data quality concepts, and to encourage issue reporting. For many years, data quality problems have been left on the cutting-room floor, and people have given up on where to turn, because IT and data management don't know how to resolve things and there are no standing governance groups to have conversations with. So sometimes you have to stand up these basic services and get issue reporting working across different groups. I also developed a pilot project to go through the whole cycle; everyone in that process learned, and I use them as ambassadors for the process, so other teams can get going on their own data quality improvement projects. Really important through all of that was building the right relationships, and those relationships have to endure from one data quality project to the next, because many of the same people are involved across the enterprise while others are specialists on just a particular project. The data quality program manager has to be able to create and manage relationships across a wide variety of stakeholders, all those shapers of the data, and hopefully reuse those relationships on the next quality project.

Next, data quality as a process. Here, by the way, is the process: this is the data quality framework, and I'll talk to it in more detail shortly. As a high-level idea, there are three core components, the colored boxes, with feedback built in to ensure sustainability.
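To preview how those three components and their feedback fit together, here is a schematic sketch in code. Everything in it is a hypothetical illustration, my own naming rather than tooling from the framework, and each function stands in for a much richer human process:

```python
from dataclasses import dataclass

@dataclass
class Issue:
    description: str
    business_impact: int = 0      # filled in by impact analysis

@dataclass
class Requirement:
    dimension: str                # e.g. "completeness"
    threshold: float              # tolerance agreed with the business

def prioritize(issues: list[Issue]) -> list[Issue]:
    """Component 1: issue management -- log issues, assess impact, rank."""
    return sorted(issues, key=lambda i: i.business_impact, reverse=True)

def refine_requirements(issues: list[Issue],
                        current: list[Requirement]) -> list[Requirement]:
    """Component 2: let the evidence base of issues sharpen the requirements
    (a high-impact completeness issue might tighten a tolerance, say)."""
    return current

def evaluate(rows: list[dict], reqs: list[Requirement]) -> dict[str, bool]:
    """Component 3: standardized evaluation against the requirements."""
    results = {}
    for r in reqs:
        if r.dimension == "completeness":
            filled = sum(1 for row in rows if row.get("value") is not None)
            results[r.dimension] = filled / len(rows) >= r.threshold
    return results

# The feedback loop: evaluation results become health-and-status reporting
# for governance, which re-prioritizes issues and resources remediation.
issues = prioritize([Issue("duplicate crash records", business_impact=8)])
reqs = refine_requirements(issues, [Requirement("completeness", 0.95)])
print(evaluate([{"value": 1}, {"value": None}], reqs))   # {'completeness': False}
```

The point of the sketch is the wiring: prioritized issues feed requirements, requirements drive evaluation, and evaluation results flow back as health and status for governance.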
This is important stuff that I learned in my informatics work at Landcare Research about building sustainable systems: ensuring that the actual outputs of one process become inputs to another. Working through all the people who are involved, the ones who can build and improve those feedback loops, is just as important as working with those who can build the fundamental processes.

Each of these three components is complex. At the highest level, though, you want to enable prioritization of quality issues, and I'll explain that shortly. You want to actually drive quality improvement through good data quality requirements. Measuring without requirements is mostly busywork; understanding the requirements in the context of the business will narrow the number of measures you have to run, make them more relevant, and keep them in use to actually report the health and status of data sets. So these mechanisms are the outcomes of three complex layers of processing that happen across the data quality program.

These are based on international standards, I'll point that out up front: the data governance ones, which talk about three layers. You evaluate, which is why this is an assessment framework: you're evaluating priorities, value, risk, and so forth. You direct activity through requirements, measurement activity in particular. And you monitor, continuously, to get that trend analysis, so you can tell whether quality is getting better or worse, hopefully better, as you measure against the requirements the business specifies.

This was also informed by really important industry models. I used the Ten Steps framework by Danette McGilvray, published in its second edition last year. Her framework takes you through really important things: understanding the business context of a problem, and working through environment considerations like the data platform and how the data itself is managed and passed from one process to another, whether there's more sophisticated ELT or ETL processing, however you want to call it, moving data from primary operational systems to analytics platforms, data lakes, or data warehouses. All that lineage is really important for understanding where the data has been and how it's been transformed. Another great framework for enabling agencies to start where they are, and to recognize the roles they're already playing, is the Non-Invasive Data Governance framework by Rob Seiner. It's very helpful to listen to Rob a few times. He goes through what he calls the "everybody's a data steward" discussion, about the wider involvement of your specialists who work in analytics and business analysis, as well as those who crunch the numbers and do the measurement of quality. It's a simple but really important message: we tend to keep the data quality work in the background, or leave it to individual responsibility, when it should involve checking with your peers and other perspectives on the measurements themselves, like using consistent measures, standard measures. Very important. That's where the Conformed Dimensions of Data Quality come in, with Myers' work.
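To give a feel for what a conformed measure looks like in practice, here is completeness expressed as the simple ratio that most conformed definitions boil down to. This is my own illustration of the idea, not Myers' exact formula:

```python
def completeness(records: list[dict], field_name: str) -> float:
    """Share of records with a usable (non-null, non-empty) value for a field.

    Agreeing on one formula like this, instead of each team's own variant,
    is what makes measurements comparable across data sets and over time.
    """
    usable = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return usable / len(records) if records else 0.0

rows = [{"speed_limit": 50}, {"speed_limit": None}, {"speed_limit": 100}]
print(f"completeness(speed_limit) = {completeness(rows, 'speed_limit'):.0%}")  # 67%
```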
Myers took sixteen different sets of data quality dimension measures, brought all the definitions and terminology together, and came up with a list, I think twenty-eight, which is very helpful. You don't need them all, but you have quite a good choice among very well researched and widely utilized data quality measures. This resolves an issue we have witnessed: agencies are using lots of data quality measures, but they're not very comparable. They're not using the same terminology, the same formulas, or the same concepts. It's confusing, to be fair. These concepts have developed over 25 years, but everybody picks a flavor and sort of ignores the rest. It has been a dialogue within its own discipline for some time, and Myers has helped rectify that into a single set of useful conformed measures.

So I'm going to move now to what I promised: the three components in this model, and a bit on the feedback loops. At this point though, are there any questions or comments? Clear as mud? Cool.

Right, then I'll shift to these three components, remembering we're talking about prioritization of issues, improving requirements, and performing standardized evaluation and reporting. As I've mentioned, each of these is complex, so I wrote the sentences on this slide for your takeaway, and I'll say them briefly. Data quality issue management, impact analysis, and value assessment each have their own processes, and you can have as much or as little process as your organization can tolerate, based on your capability. What's important is that you have some fundamental way of tracking information about quality issues and working them through with the people who shape the data and can help resolve them. There's been significant research, again over 25 years, showing that 90% of data issues result from conflicting business processes or from outdated data structures that have simply been acquired and moved from one business to another without redesign: coping with what is, rather than reshaping and redesigning things to meet new needs. These are all environment factors on the data which need to be analyzed. Mostly you can get a lot of information out of your architecture artifacts, but really it comes down to plain systems analysis work. Basically, you get the reporting and the logging going, then root cause analysis; there's some good literature included in the framework detailing 15 typical root causes, and guidance on who to interview to actually get to the facts and knowledge of your particular data environment. Then you work through a business impact assessment, which means talking to business people, the people responsible for the mission of the organization. And lastly, you promote your issues, with remediation recommendations, to your governance groups, who have the authority to resource and make the necessary changes to the operating environment if those are in fact the cause of your data quality problem.

The second component, working through the requirements, is really important, and there's a whole paper in this section on asking the right questions about data quality requirements as well as system requirements.
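Before we get to requirements, here is a concrete rendering of the issue-management record just described. This is a hypothetical layout, not the framework's published template, but it shows the minimum the process needs to carry: who reported it, the root cause, the business impact, and where the issue sits on its way to governance:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    LOGGED = "logged"
    ROOT_CAUSE_FOUND = "root cause analyzed"
    IMPACT_ASSESSED = "business impact assessed"
    PROMOTED = "promoted to governance for remediation"

@dataclass
class DataQualityIssue:
    summary: str
    reported_by: str                        # analysts need somewhere to report
    dataset: str
    root_cause: Optional[str] = None        # e.g. one of the ~15 typical causes
    business_impact: Optional[str] = None   # outcome of talking to the business
    status: Status = Status.LOGGED

issue = DataQualityIssue(
    summary="Region code missing on a third of new records",
    reported_by="data scientist, analytics team",
    dataset="crash records",
)
issue.root_cause = "conflicting intake processes between two business units"
issue.status = Status.ROOT_CAUSE_FOUND
issue.business_impact = "regional safety reporting understates incidents"
issue.status = Status.IMPACT_ASSESSED
issue.status = Status.PROMOTED   # governance now decides resourcing and fixes
```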
When systems are built, requirements are gathered, but they're typically about the user interface or the reporting needs, and not much about data quality concerns such as onward sharing, resolution, or how to aggregate certain fields properly, solving it once for all. These kinds of things get solved ad hoc, much later, and in varying fashion. So the requirements-gathering aspect is quite important. As it says here: typical uses for the data, tolerances for completeness and precision in the data, and requirements for currency, for monitoring or decision making. All of these requirements can be gathered, and the paper is written for business analysts, who sit next to the business and can get the interviews working such that they get the kinds of answers relevant for onward use of the data.

For evaluation and reporting, I mentioned ISO 19157 before, the data quality standard for geospatial information. It lays out a framework for evaluating quality and then a specific set of reports that consistently record how you measured the quality, the context, the scope of the data used, the purpose, the reason you needed to measure the quality, and any connection to trends. Obviously you need a bigger picture, more than a couple of measures, to talk about how the quality currently stands, or as we put it, how well the data conforms to requirements, and which requirements, et cetera. This area, consistent measurement, consistent evaluation, and consistent reporting, is the biggest problem in general when getting at any trends in data quality, and libraries of tools exist to assist us. All the paper does is bring those forward so you can choose what's relevant for your particular organization.

Next, the feedback loops. Again, this is the model; my conversation box is halfway in the way of my slides, so I'll do my best here on the right side. The really important conversation, at the outer loop, is between what I call the business domain governance group, or data governance group, and the enterprise data management team. For the reasons I gave before, there's a separation of concerns there: the governance group should be providing the improvement plan for the data, including resourcing, and the data management team should be responding with a communications plan showing progress to plan, identifying the stewards, and keeping the stakeholders informed. It's a very simple feedback loop, but it's amazingly difficult to achieve this level of clean communication. This alone has mitigated an enormous number of quality issues in our crash system at Transport. We had all kinds of messaging going back and forth between parties that thought they could fix things, never really a clear plan, never really a clear sense of progress to plan, and quite a bit of confusion on the stakeholders' side as to what exactly was going on to fix the problems. A lot of noise, in other words, and not a lot of signal. So this helps quite a bit to clarify what is planned, what's been done, and who needs to know.

The inner loops in this framework are, I should say, far more interesting. Essentially, the prioritization of issues can help inform and refine your requirements as well. That's helping to develop an evidence base that's a bit more effective.
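Before going deeper into those loops, here is a toy tie-together of the requirements and reporting pieces: tolerances captured as a simple declarative table, and an evaluation result wrapped in a report that records context alongside the number. The field names echo the kinds of report elements 19157 asks for, but this is my sketch, not the standard's schema:

```python
from dataclasses import dataclass
from datetime import date

# Tolerances captured declaratively, as a business analyst might record them.
requirements = {
    "completeness": 0.98,    # share of usable values the business needs
    "currency_days": 30,     # data no older than 30 days for monitoring use
}

@dataclass
class QualityReport:
    dataset: str
    scope: str               # what data the evaluation covered
    purpose: str             # why the quality needed measuring
    measure: str             # which conformed measure, and its formula
    result: float
    requirement: float
    conforms: bool
    evaluated_on: date

def report(dataset: str, measure: str, result: float, requirement: float,
           scope: str, purpose: str) -> QualityReport:
    """Wrap a measurement in the context that makes it reportable."""
    return QualityReport(dataset, scope, purpose, measure, result,
                         requirement, result >= requirement, date.today())

r = report(
    dataset="crash records",
    measure="completeness = usable values / total values",
    result=0.94,                             # e.g. from a completeness run
    requirement=requirements["completeness"],
    scope="severity field, current monthly extract",
    purpose="monthly health-and-status monitoring",
)
print(r.conforms)   # False: 0.94 falls short of the agreed 0.98 tolerance
```

The conformance flag, not the raw number, is what flows up to governance as health and status.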
You develop your evidence base of issues by logging and doing your impact analysis; having those discussions at the business impact level helps a lot with prioritization. This in turn informs your requirements. Better requirements mean more relevant criteria and metrics, which assist in performing standardized evaluation effectively and efficiently. And the outcome from that, as I put it here, is comparable assessment: performing standardized quality measurement in turn enables more robust and relevant health and status. That's what we're looking for. At the governance level, they want to know the state of things: how healthy is this important data set you've asked us to identify as a key asset to manage? What am I looking at? I need to look at its health and status, and I need to articulate the value and risk of it being non-conformant: the value of it working for us, and the risk if it doesn't. These are the business decisions the governance group needs to have in focus, with the help of the business analysts; I note they're among the entities conversing here.

So the first loop is the business domain group talking with the enterprise data management team. In the secondary loops, the inner loops, it's quite important that you involve the newly arrived data analysts and data scientists. For many government agencies this is new stuff, right? Another whole team messing around with the data. Because they're creating new data products every day, those people are finding your data quality issues. They're the ones that need to be involved in reporting issues to someone, and they're not sure where to turn. They can also run these metrics; they already do it every day, running metrics and doing quality assessment on data to see if it's suitable for their projects. That knowledge is going on the cutting-room floor because they have nowhere to report their bumps and bruises with the data; any problems they have, they keep to themselves. So giving them mechanisms to report information, and involving them in the conversations about metrics and requirements, actually utilizes their knowledge and experience more effectively.

Likewise, business analysts usually have a relationship with the initial stages of a data resource being developed, as I mentioned, because they're building new systems; they're involved in the design and the collection of requirements. But they don't often get to see the round trip: what happens after the system's been built, how the data gets used in new products and services. So this involves them in the priority conversation too, involves them in how the data is meeting the business needs, and gives them the opportunity to describe and work through some of this health-and-status information, to make it more effectively understood by all those middle business process managers as well as the overall data owner. They're the interpreter between the various business services and the data management function.

So far, any questions? All good, basically. All quiet? Yeah. Okay, we're very close to the end now, right? Two more slides. I just want to point out that I used standards heavily to develop this process. Let me back up a second to the process, just to say: it's quite abstract, for a reason.
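Before the standards slide, one more concrete illustration: the health-and-status trend reporting I keep mentioning becomes simple arithmetic once measures are standardized, because the same formula and the same tolerance apply to every period. Made-up numbers throughout:

```python
# Toy trend analysis over standardized measurements (made-up numbers).
# Because the same formula and the same requirement apply to every period,
# the series is comparable and a direction can simply be read off it.
requirement = 0.98
monthly_completeness = {
    "2021-04": 0.91,
    "2021-05": 0.93,
    "2021-06": 0.95,
    "2021-07": 0.97,
}

values = list(monthly_completeness.values())
latest = values[-1]
direction = "improving" if latest > values[0] else "declining or flat"
status = "conformant" if latest >= requirement else "non-conformant"
print(f"latest {latest:.0%} vs requirement {requirement:.0%}: {status}, {direction}")
# -> latest 97% vs requirement 98%: non-conformant, improving
```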
All the standards point to the same three capabilities you need; how you develop them and how you deploy them depends on the capability of your own organization. But these are the outcomes of having good logging and good analysis of issues, so you can prioritize. For the requirements outcome, I should probably have put something like "improved": if you want improved data quality requirements, you really want current data quality requirements, and currency is maintained through knowledge, through relating the use of the data to the measurement of its quality. And likewise, being able to do any kind of standardized evaluation is critical for trend analysis, for understanding how the data is doing.

I've provided the standards here, in small print because there's a lot of information, but importantly, they influenced the design of the principles and components of this framework. I used a couple from governance, particularly the data governance standard, which came out in 2019; three from the data quality suite in ISO 8000; and two from the geospatial domain. One I've mentioned, ISO 19157, the data quality general requirements, recently revised and available now as a DIS, which means it's out for public comment. It will probably be completely revised by the end of 2022, ready for purchase, I suppose. And I used another one on the quality assurance of data supply. I used that because Transport, like many public sector agencies, doesn't actually produce its own data; it buys it. And agencies don't do well at explaining what they need in the purchase agreement in terms of quality, especially if it's ongoing. We have traffic loops, we have ongoing data collection which is subcontracted, with hundreds of thousands of dollars involved in those contracts, and surprisingly, apart from saying the data needs to reach us by a certain date, there's very little quality requirement built into the contracts. This standard helps you create some of those words, those phrases, that need to be in your supply contracts.

And also, the toolkit. Toolkit-wise, a lot of people ask me about this. Many architects like this less than my practitioners, I'll say, but a lot of people want tools, so I created a simple set of tools to come with the framework. The framework involves the diagram, explanatory notes for the diagram and rationale for the design, as well as essentially people- and process-oriented tools, not technology-based ones. We can always buy technology to do data quality measures and automate, but we really have to get the fundamentals right in terms of people and process. For people, it's record-keeping support: templates for how to do your data quality issue management, and a guide for data governance groups on how to have those effective conversations so that relevant policy gets developed and remediation plans get agreed and supported, meaning resourced, to repair important data quality issues. In the process area, there's the standardized, or rather consolidated, group of measures provided by Myers, and there's a whole set of measures as well for geospatial information through 19157. And finally, a guide to the relevant standards, which details exactly how the standards on the previous slide informed the framework.

There are some considerations I wanted to wrap up with, to say: what did I learn out of this?
In terms of principles, I think it was important to maintain a flexible approach: to realize that this is an exemplar, not the perfect framework, and to make it outcomes-oriented rather than a very specific, itemized process list. You can come up with your own process lists from the standards, as well as from what you're already doing. The really important thing is to realize where you already are, what your capability is, and that you can map your current capabilities into this exemplar process. The goals, the very important principles two and three, were really that it must do a lot to improve remediation activity: that's clearing up the communications, getting the roles right, making sure measures are associated with requirements, making sure the health-and-status information flows back up to the data governance group, and making sure the data management group is comfortable with its new relationship to these business data scientists and business intelligence people. There's a whole lot of improvement in communication and remediation that the framework addresses. Ensuring comparable assessment is also a very important principle, hence sticking to industry-defined standards for measures. The design goals are as already outlined, with a people-centered approach rather than a technology-centered one. Continuous improvement is a critical aspect, because the first time out, with a pilot, you'll learn the ropes, and then you'll realize you're a little bit short in this capability, or could have done better over there. So it's always good to view it in the context of building up better capability. And it's a shared responsibility: all these roles have an important part to play, and data management and data governance don't have to be at odds; they just have to realize where they're sharing responsibility for the outcomes of the data quality process.

I found it helped a lot to start with standards and with industry practice; you're starting with your peers as well. The challenge, I thought, was to get some data governance groups off the ground and get them to take responsibility for remediation plans, because this was something new to our organization; they had to find a way not only to make the time to meet, but to make the time and effort to look for resources, to actually invest in the data and in remediation plans. I'd say, for anybody taking on a data quality program: think about your own organization's culture and the governance activities that would be practical, so you can make real improvement, but look at it from the business process point of view rather than just the technical aspects of data creation, data storage, and so forth. All these people are shaping the data as your organization moves into analytics and other data products and data-sharing arrangements. And if possible, make data quality a program rather than a project; as I say, that was quite important, because I was able to retain all these stakeholder relationships and redeploy them without wearing everybody out on every problem. I carefully balanced the talent against the problems and tried to make it realistic for people to fit into their work scope.

So this will be published soon on the New Zealand Government data capability resource website; the address is data.gov.nz.
We'd love to have feedback on the framework, and the actual guides and supporting material will be available shortly. Data lead at stats.gov.nz will be the home of the formal documents; currently it's in soft launch, so it'll be formally published soon. And if you want to get in touch with me at any time, I'm happy to speak to this again in your own organizations, over chat, or over video. Here I am at Transport. Thank you.