Hello everyone, and welcome to our next EDW session, "How a Data Catalog Forms the Foundation for Data Governance," which will be presented by Kim Wanzerl, the chief data officer, and Amber O'Connell, the director of data governance, both with the state of Tennessee. All audience members are muted during these sessions, so please submit your questions in the Q&A on the right side of the screen, and our speakers will respond to as many questions as possible at the end of the talk. Please note that there's a linked form at the bottom of the page titled "EDW Conference Session Survey." This is where you can submit session feedback, and we encourage you to do so. So let's begin our presentation now. Thank you, and welcome, Kim and Amber.

Hi everyone. Thank you for the opportunity to present this to you today. Again, my name is Kim Wanzerl, and I'm the chief data officer for the state of Tennessee. With me today I have Amber.

Hi, I'm Amber O'Connell. I'm the director of data governance, also for the state of Tennessee.

So today we are going to talk about how a data catalog forms the foundation for data governance. A little bit about our business here. We are the state of Tennessee, the 36th largest state. As you can see here, there's a list of the 38 different agencies within the state of Tennessee. This is like having 38 different companies, or 38 different product lines. Tennessee welcomes you. Our governor is Bill Lee. A little bit more about Tennessee: what are we known for? The food: amazing Nashville hot chicken, smoked meat, a number of whiskey distilleries, of course country music, rolling hills, dirt roads, and as you can see, some very fine desserts you can find in Tennessee. So when things open up, y'all come visit us.

So what was Tennessee trying to solve when we introduced a data catalog to the state? We are looking for a couple of things with data governance. We want uniformity in the process we use to say that we are doing data governance.
We're all doing it in a very similar fashion, within the same framework, and when we say we're doing it, we are doing it completely. Like everyone else, we'd like to prevent audit findings when it comes to data quality or data governance issues. And we have a big push on sharing data across those 38 different agencies, so in preparation for being able to do that, we need to understand what data is key or critical and get it into a governance framework. And, like everyone else, we're after that faster, better, data-driven decision making. The real thing that the state employees of Tennessee are solving for is what Governor Bill Lee says: to see the needs of our neighbors around us, every single one of them, and commit to serving them. How better can we serve them than by making better decisions that impact them?

So late last year we implemented a data governance framework. Some of you may be familiar with it; we've spoken about it a couple of times in different forums. We call it the eight elements of data governance. What this framework is, is a methodology, some templates, and some recommendations for how to do data governance across the entire state. Chances are much of the state is already doing a lot of this informally, so this just formalizes that process a bit. And the framework is built for reuse and completeness, so when one agency says they are fully governing one set of data, and another agency says they are fully governing another set of data, we know that it is a complete process. So within this data governance framework, we're here to talk about how important the data catalog is. Under these eight different elements, really box two, box four, and box seven have usages for a data catalog. Amber?
So I want to talk a little bit about why the state of Tennessee came to the idea that we did need a data catalog, and how we got there. When I was promoted to the director of data governance role three years ago, I was the first to serve in that space. So it was a little bit of the Wild West, between convincing people that we did in fact need a data governance program and actually building the program. It became really apparent that it's really hard to govern data if you don't know what data you have. We didn't have an inventory; our data kind of looked like that picture in the slide. There were some places that had better controls and a lot of areas that didn't. We didn't know what we had, so we had duplicate data over and over again, we had new data on top of old data, and it was kind of a mess. We were giving different sets of people access each time we duplicated our data.

So the first bullet point really sums it up: we had a lack of visibility, availability, and accessibility of state data assets. What I wanted to do was inventory and catalog what data we had, so we can help people discover what data is available to them. That can also really help drive that culture change across the state to be more data fluent. No pressure, right?

And the second reason for a data catalog was that we didn't have enterprise standards for data management. As Kim alluded to earlier, the state is comprised of dozens of agencies that each operate as individual businesses, so everything to do with data is really across the board. A few of our systems have data dictionaries. Some of them are accessible, some are not. Even fewer divisions might have business terms defined, but that's definitely not the norm. And I'm not sure if you've ever tried to justify why a business needs to spend hours creating, agreeing on, and defining a business glossary, but it is a really, really hard sell.
I want to talk a little bit about the Department of Mental Health. They have several different divisions, and they have something called subacute. What does it mean when a patient is subacute? One division would say, oh, that's somebody who was in a bed for less than five days. Another division would say, oh, that's a patient that's been in a bed for less than 30 days. This is within the same department. So if you've got a department head asking, hey, how many subacute patients do we have, you would get a wildly different number depending on who you asked. And so then the head of that department just kind of throws it out the door: they stop asking, they stop believing in data, they stop trusting their own data. And then they make decisions that aren't based on data, which is the exact opposite of what we really want to do. So if we have standard definitions and agreed-upon terms, that will really help us improve our data quality and our trust in our data, like I talked about. Of course there are always tons and tons of reasons that organizations decide to get a data catalog; these were just some of ours.

So once we made that business case, yes, we need a data catalog, yes, this is going to help us do data better in the state, we went through the traditional procurement process, and we ended up with Alation. We went live with Alation, I think, mid-January, so we're about three months into our Alation journey right now. We've definitely got some things that we have already implemented, and then some things that are on our roadmap, so I'll go into some of those now.

So Kim mentioned the elements of data governance, and what you're looking at is box four, standards and definitions. The fourth element of data governance is implementing those standards and definitions.
They are the guidelines that inform how we design data solutions, interpret information, evaluate performance, and ensure quality for our data from source to consumer. The goals of this phase of data governance are that definitions are documented and required for all fields in the data set, and that safe sources of data are identified.

Data definitions literally just describe the meaning of a data element in a way that is readily understood by a data consumer. To my point about subacute: in one division the data definition was "a patient in a bed for less than five days." That's a data definition. It's readily understood; I said it, you understand it, great, let's move on.

Standards are documented specifications for a specific data element or sub-element. Standards are derived from data models, data schemas, naming rules, and business rules. Standards may include things like valid values and acceptable ranges (an age is probably somewhere between zero and 120), specific formatting rules (never, ever, ever do free-text fields), minimum data quality levels (you must have, say, 93% of your zip codes filled out), and determining required fields (first name and last name are usually required). You want to know what your safe sources of data are in case you've got conflicting sources, and then of course formulas for any derived data that you've created. If you do a report once a year, and one year you go back and ask, man, how did we get there, and you do it differently, that's a problem, and that's a quality issue. So standards and definitions really help us get there. They're usually housed in a data dictionary; the work is typically performed by a data steward and then reviewed by the data owner.
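The kinds of standards described here, valid ranges, formatting rules, required fields, and minimum quality levels, can be made concrete as an automated check. This is a minimal illustrative sketch, not the state's actual standards or tooling; all field names, rules, and thresholds are assumptions made up for the example.

```python
import re

# Hypothetical standards of the kinds described in the talk: required fields,
# acceptable ranges, format rules, and a minimum completeness threshold.
STANDARDS = {
    "required": ["first_name", "last_name"],
    "ranges": {"age": (0, 120)},
    "formats": {"zip_code": re.compile(r"^\d{5}(-\d{4})?$")},
    "min_completeness": {"zip_code": 0.93},  # e.g. 93% of zip codes filled out
}

def validate(records):
    """Check a list of dict records against the standards; return issue strings."""
    issues = []
    for i, rec in enumerate(records):
        for field in STANDARDS["required"]:
            if not rec.get(field):
                issues.append(f"row {i}: missing required field {field}")
        for field, (lo, hi) in STANDARDS["ranges"].items():
            if field in rec and not (lo <= rec[field] <= hi):
                issues.append(f"row {i}: {field}={rec[field]} outside [{lo}, {hi}]")
        for field, pattern in STANDARDS["formats"].items():
            if rec.get(field) and not pattern.match(str(rec[field])):
                issues.append(f"row {i}: {field}={rec[field]!r} fails format rule")
    # Minimum data quality levels, e.g. completeness of zip codes
    for field, minimum in STANDARDS["min_completeness"].items():
        filled = sum(1 for r in records if r.get(field))
        if records and filled / len(records) < minimum:
            issues.append(f"{field} completeness {filled / len(records):.0%} below {minimum:.0%}")
    return issues
```

A data steward could run a check like this against a data set and hand the resulting issue list to the data owner for review.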
You'll note that the templates and tools section is completely taken care of by Alation. Alation holds our business glossary, Alation holds our data dictionary. You can import a data dictionary if you have one, and you can export it back out if that's something you need. Alation literally is a metadata repository; that's what a data catalog is. And of course Alation really helps us build and understand our data, so we can know what our data models look like.

And Alation really helps with that pesky F name issue. A human sees F underscore name, or F dot name, or straight up just F name, and the human knows, okay, yeah, that means first name. But the computer does not know that; the computer does not know that F underscore name is the same as F dot name. However, Alation knows, and Alation can make that connection for us automatically. So while we still need standards, it reduces some of our reliance on them, and it really does help. Next slide, please.

So I wanted to show you actual screenshots from our Alation instance, and show that we can flag for data classification and regulation, which is super duper important working in government. We have something called public data, where data is free and open to the public; we'll put it online, we'll release it, no worries. But then we also might have really, really restricted data, like HIPAA data or PII, and that's something that needs to be controlled really, really well. If you don't know what data you have, and you don't have it classified, then you're really afraid to give out any data, in case you've given out the wrong data. Alation really helps. Not automatically, but it has a place where you can say, yes, this is public data, or yes, this is restricted data. Alation does know automatically what the last update was and who the top users are of certain data, and we have a space for the valid values and ranges that I talked about earlier. Next slide, please.
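The "pesky F name" matching above can be sketched in a few lines. This is a rough illustration of the general idea, not Alation's actual algorithm: normalize separators in physical column names and expand known abbreviations so variants land on one logical name. The abbreviation table is an assumption for the example.

```python
import re

# Assumed abbreviation lookup; a real catalog would maintain a much larger one.
ABBREVIATIONS = {"f": "first", "l": "last", "nm": "name"}

def normalize(column_name: str) -> str:
    """Map physical names like F_NAME, f.name, or fName to a logical name."""
    # Insert spaces at camelCase boundaries, then split on dots/underscores/dashes
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", column_name)
    parts = re.split(r"[._\-\s]+", spaced)
    words = [ABBREVIATIONS.get(p.lower(), p.lower()) for p in parts if p]
    return " ".join(words)

# All of these variants land on the same logical name, so a catalog can link
# them to a single glossary entry:
for raw in ("F_NAME", "f.name", "F-Name", "fName"):
    assert normalize(raw) == "first name"
```

Once the variants collapse to one logical name, the catalog can attach the single agreed definition to all of them at once.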
And then of course the glossary. So I said that Alation really helped us with our business glossaries. I wanted to show off our privacy glossary, because honestly I'm really proud of it. Our privacy officer was super duper excited to help fill out this privacy glossary. She's created all these terms, defined them, and put them in Alation, and now anybody who logs into Alation can go and see, okay, these are what the privacy regulations are, this is what they look like, this is the official definition. Of course you still need to do the work of defining and agreeing on terms, but it's super helpful to see what's already been done and then copy it. And then there's the peer pressure. What if Mental Health defined a term, and then the Department of Health looked at that and said, hey, that's what we do, let's just take that? And then once Health did it, maybe TennCare did it. Well, now you've got an enterprise standard that I didn't even have to talk to them about; they copied it themselves. They said, okay, great, now that's done. Box ticked, easy. So that's what's really, really great about Alation and glossaries.

Next I'm going to talk about audits. The final element of data governance is audits, and this is a little bit unique to our data governance framework, because I've never seen audit happen in any other data governance framework. It's super important that the process is self-auditable as well as prepared for external audit, and that the process behind the elements of data governance is fully documented in one place. I like to think about it like a big three-ring binder.
Everything that we've done: the organization we defined, the policies we defined, the procedures we've written, the definitions and standards we just talked about. You're pulling all of that together, and maybe you're putting it in a big three-ring binder. In real life you're probably not; it's probably online on your computer somewhere, but think about the binder.

The purpose of the internal audit is to set up a regularly scheduled check that every data governance process is being used, policies are being enforced, and the data quality metrics are being put toward impactful action. Because if you're measuring it and you're not doing anything with it, why bother measuring it? The goals of this phase include actually working through a self-audit process. You want to document any gaps in your data governance process; this is really good, because that way you know, okay, this is a problem, let's address it now. And then you want to report back on the results. Typically the data owner determines what needs to be audited and what that schedule should be, and they're the ones actually responsible for performing that audit and then reporting back. I find this is super helpful because it can remove that curtain: data owners feel more knowledgeable and feel like they have ownership over the data process. They feel more trust in it and are better able to enable sharing, either within or outside of their organization.

So Alation absolutely helps us with our audits. Next slide, please. It's really a big piece of that giant data governance binder I talked about. You can see here a little bit of our self-audit checklist, and Alation really helps with all of it: it's going to hold our policies and our procedures, our standards and our definitions, our data flow diagrams through the lineage. All of our data is classified, like I talked about, and our owners and stewards are identified and written down in Alation.
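The self-audit loop described above, check each item, document the gaps, report the results, can be sketched as a tiny structure. This is an illustrative example only; the checklist items are paraphrased from the talk and the shape of the state's actual checklist is an assumption.

```python
# Checklist items paraphrased from the self-audit slide; not the actual form.
CHECKLIST = [
    "policies and procedures documented",
    "standards and definitions recorded",
    "data flow diagrams / lineage captured",
    "all data classified",
    "owners and stewards identified",
    "access groups reviewed",
]

def run_self_audit(status: dict) -> dict:
    """Mark each item pass/fail and collect the gaps to report back on."""
    gaps = [item for item in CHECKLIST if not status.get(item, False)]
    return {
        "passed": len(CHECKLIST) - len(gaps),
        "total": len(CHECKLIST),
        "gaps": gaps,  # documented gaps become the follow-up action list
    }
```

The data owner would run this on the agreed schedule and report the `gaps` list back, so each gap becomes a tracked action item rather than a surprise in an external audit.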
And then I really wanted to point out who has access. Who has access to your data can sometimes be a tricky thing to answer, and it's a little embarrassing, but with Alation it's super duper easy. This is a screenshot of our groups in Alation, so you've got the different groups and the people who are in those groups. The standard Alation groups are at the top, and at the bottom you've got the custom groups. So you can see, okay, these are the DHS sources, and these are the people that have access to the DHS data. These are the TDOC sources, and these are the people that have access to the TDOC data. Super easy, super easy to screenshot. If it were in a binder, I would take the page out and hand it to my auditor and say, okay, done, box ticked.

And then the last element of data governance I wanted to talk about was policies. Policies are the second element of data governance, and this is something that's upcoming on our roadmap; we don't quite have our policies in Alation yet, but we want to. Policies are the foundation of any data governance program, and the goal of a data policy is to provide guidance on how we manage a data asset over its entire life cycle. Some inputs to consider for your policy are terms and definitions, if you already have them, data naming standards, business rules, quality standards, and of course information security concerns. You're going to have different levels of policy depending on the different types of data that you have. A data policy may also authorize and define the data governance program committees and any roles associated with data. If you remember, the whole framework started with element one, which was organization and people, and then policies is number two.
So element one is where you identify your people and what your organization structure looks like; then you write that down in your policy and make sure it's approved throughout your whole organization, because you can't really do much else unless you have those two things in place first.

There are five key focus areas for data policy development that you want to make sure you're addressing. Authoring: when, how, and by whom data may be created, changed, or deleted. Access: which people or systems are authorized to see and get the data. Usage: what are the authorized uses for the data, and how are they mapped back to the authorized users. Maintenance: how is the data maintained in the source systems and backed up for recovery. And then retention and storage: how long must the data be kept, in what format, and any defined lead times for retrieval.

The level of policy definition required will vary depending upon the business value of the data and the associated risk identified by the data owner. This is where I talked about the difference between public data, which has very little policy around it, and maybe HIPAA data or CJIS data, which has very good business value but also a high business risk associated with it. So you're going to make sure you have really good, well-defined policy around that stuff.

So what we're trying to do, what's on our roadmap, of course, is putting our data policies in Alation so that we can link them directly. If this is public data, we're going to link our public data policy. If this is CJIS data, we're going to link our CJIS data policy. Or different departments each have their own data policy: agriculture data is going to be linked to their policy, mental health data is going to be linked to their data policy. That's really what we're trying to get to, but I don't have screenshots because, like I said, we haven't done it yet. Okay, back to me.
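The policy-linking roadmap just described, classification first, then the owning department's policy as a fallback, amounts to a simple lookup. This is a hypothetical sketch of that mapping; the policy names, classifications, and fallback rule are assumptions for illustration, not the state's actual configuration.

```python
# Assumed classification-to-policy and agency-to-policy mappings.
POLICY_BY_CLASSIFICATION = {
    "public": "Public Data Policy",
    "cjis": "CJIS Data Policy",
    "hipaa": "HIPAA Data Policy",
}

DEFAULT_POLICY_BY_AGENCY = {
    "agriculture": "Agriculture Data Policy",
    "mental_health": "Mental Health Data Policy",
}

def policy_for(dataset: dict) -> str:
    """Resolve a data set's governing policy: classification first, then the
    owning agency's default, then an assumed enterprise-wide fallback."""
    classification = dataset.get("classification", "").lower()
    if classification in POLICY_BY_CLASSIFICATION:
        return POLICY_BY_CLASSIFICATION[classification]
    return DEFAULT_POLICY_BY_AGENCY.get(
        dataset.get("agency", ""), "Enterprise Data Policy"
    )
```

In a catalog, the resolved policy name would be a link on the data set's page, so anyone viewing the data can jump straight to the rules that govern it.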
So like Amber said, we're fairly new at this, although we feel like we've got a pretty solid plan. We do have a number of agencies lining up to participate; they've seen the value and they're ready to go. This chart on the left, which I'm not going to go through in full, shows the data initiative status since we've started doing some new things, new technologies and new techniques, with data management across the state. The rows are the agencies, or groupings of agencies, that have signed up. And then numbers one through five indicate the status, or the stage they are in: whether they're just reviewing these solutions, actually planning for them, or deploying or deployed. The first column is the data governance framework, that is, the eight elements of data governance. The second column is the Alation data catalog; these are the agencies that are either planning or in process. And then the third thing we also have available now across the state is a data lake. We purchased Snowflake, and we were very delighted to learn that you can plug Alation right into Snowflake and catalog it as well. One of the main reasons we want to be able to do that is that I have built data lakes in the past, and it doesn't take very long for them to turn into a data swamp if you don't know who the owners are, what the purpose of the data is, or exactly what you have in there. So we are going to have a lightweight data governance process running against everything that is in Snowflake, and it will all be documented in the Alation data catalog.

So that's it. What we covered today was the high-level eight elements of data governance that the state of Tennessee has adopted and is currently deploying. We talked about how the Alation data catalog plays a big part in that, and how the data lake connects with the data catalog as well.
Future items that will be coming into the catalog are going to be things like reference data. Within the next 12 months we will have a consolidation project to see if we can put some governance around that and catalog it as well. So I think, Jim, maybe we're ready for questions.

Well, I had a couple of questions. Kim, thanks for answering some of those as we were going. The first one that we haven't answered yet is from Bill, if you can see that in the little side chat, and then we also have one from Margie.

Okay. So Bill is asking: what is the funding model for this investment year over year? How many staff hours will you need? The funding model is an annual funding model, and it's based on the types and numbers of licenses that you purchase. We originally bought a one-year license, and we're preparing a three-year license coming up. As far as staff hours needed to deploy, the deployment seemed fairly easy. We did purchase the Alation Quick Start, or Jump Start, services, and that went pretty quick. But we are in the process of hiring a data librarian, so we think it's going to take about half an FTE to help onboard the agencies and onboard the data into the catalog. The other half of that person's time, they're going to own our open data portals, the public-facing data. Again, we're taking a look to see if we can catalog that as well. The way we're charging back to the agencies is: if you're a viewer, no charge, because we want to promote people to come in and use it. But if you are actually a cataloger, somebody that's curating the information in there, they'll just pay for their licensing.

Question from Margie: individual catalogs per agency, or a single catalog? A single catalog. Alation's got a very robust security model; you can basically pick and choose how much people can see. Like Amber showed, there are groups, so you'll be able to identify groupings of people who maybe can see everything,
even a test data set, or not. Amber, do you have anything to add to that?

No, not really. Yeah, it's got really robust control over who can see what data. We definitely have some agencies that want to keep some of their data private to just themselves, while keeping the rest of their data open to the broader enterprise. But we're trying to encourage as much data to be visible across the enterprise as possible. Amber, you want to take the next question, from Dominic?

Why did you choose Alation over the competition, and which other vendors did you consider? So this was pre-COVID that we did this little procurement process. I don't remember exactly what the criteria were, but we needed something that would work as automatically as possible, which is the word that people like to bandy about. Alation really, really is good at that. The two other top competitors we looked at were Collibra and Informatica; I don't remember the exact name of their tool, but those two rounded out our top three.

Question from John: how have you ensured adoption of the catalog? How do you make sure analysts, engineers, and developers are using it? Right now, we've put a change management plan around it, so we had a little internal marketing campaign going on, the "you're going to love this, you can't wait to see this" type of thing. We've done a lot of demos and examples and tried to help it resonate with each of the agencies. But the agencies in the state of Tennessee are starving for this. Most of them, at least the large ones, have something in the way of a catalog, but it's not easily searchable and it goes stagnant very quickly. So they're looking to get out from under having to maintain that, and they're also very much looking forward to seeing what everybody else has. What better way than through a catalog? So I guess that's yet to be seen, but so far, great uptake here at the state of Tennessee.
Question from Seth: do you have anything else interesting to share about your open public data initiatives? Yeah, we're getting ready to publish one, and once we publish it, we will have our first-ever kind of data management tn.gov website, which is going to help people maneuver through and get to open data. Some innovative things we'd like to do with that are to share with municipalities, share with the federal government, and potentially catalog that data as well.

Question from Brian: are you able to track viewing metrics for your glossary entries? Amber? Yes, yes you can. Alation's got some built-in reports, and if those don't meet your needs, you can get whoever your local Alation admin is to create your reports. Or honestly, Alation probably would as well. But the answer is yes. I'll let you take the next one, from a different Kim.

Are you using tools for lineage, and if so, what tool? We are not; we are using whatever lineage comes available in Alation. It is there; it's not as robust as some other specific data lineage tools, but considering it's all rolled into one, it meets our needs for now.

We have a lot of questions. This one asks about some of the automagical functions of Alation. I love this, because when you connect Alation to your data source, it just works. You see the data, you see the schema, you see the table names, you see the column names, and then you can press the search bar and you've got Google for your data. I don't know if you can tell, but I'm really excited about our Alation instance, because when we first connected it, we connected the Department of Agriculture. This was before we had even done any curation, any input. I was like, cool, this is great, now I'm going to search for county data. We had 763 tables that had county data in them, and Alation knew that immediately.
The search happened and there it was: my 763 tables with county data. So that's what I think of when I think of automagical.

Next question is from Andrew: any thoughts on how best to create, gather, review, and approve content included in the data catalog? Yeah, use the eight elements of data governance, because that's what that framework tells you. Behind that framework there is a guidebook and a couple of playbooks. Pretty much, you take that, you modify it to what you need to get done, and it steps you through the process. That's how we've chosen to do it at the state.

Question from Dawn: does your state have any data privacy regulations in the hopper, and how might you work with those? Yeah, so data privacy, as you might imagine, is huge for a government entity, because the last thing you want to do is release information about your citizens into the wrong hands. There are a lot of data privacy regulations and laws that we already work under, but we are looking at those and making sure we stay within those guidelines as we start offering up data sharing.

So I think we're out of time, but great questions, everyone. Thank you.

Yeah, we are out of time. There was another question that came in that we're not going to have time for, but you'll be able to come back into the session and see it if you do care to respond. That being said, thank you so much for this great presentation, both Amber and Kim, and thanks to our attendees for tuning in. Please complete your conference session survey on this page for this session. The next sessions will start in about 10 minutes. Please also remember to stop in at the sponsor booth before 1:30 p.m. Pacific time today. Thanks, everyone. Thanks.