Hello, everyone, and welcome to our next EDW session, called Automated Data Lineage, presented today by Mark Horstman, Manager of Data Governance at Alberta Motor Association. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right of the screen, and Mark will respond to as many as possible during the talk. Please note that there is a linked form at the bottom of the page titled EDW Conference Session Survey. This is where you can submit session feedback, and we encourage you to do so. Also, there is a small icon to the lower right of the screen which will enlarge this window with the speaker and slides. So with that housekeeping out of the way, let's begin our presentation. Thank you and welcome, Mark. Thank you very much, and welcome, everybody, to my talk on Automated Data Lineage. Given this wonderful platform that Enterprise Data World has provided us, I would like to keep this as interactive as possible. So please be active on the chat and ask as many questions as you like. We'll have a couple of wonderful art-filled slides that we'll pause on to address questions as we can. So without further ado, we'll get into it. Oh, and if I'm slightly more entertaining than normal, it's because I got my COVID vaccine the other day and I'm a little excited, but I can also feel the side effects a little bit. So, what we'll cover. We can't just talk about automated data lineage without talking about how you get there and how to appreciate it. So we'll talk about what we've done in the past, and this is largely from my experience at the Northern Alberta Institute of Technology. I just recently moved over to the Alberta Motor Association, and we're having lots of fun and doing wonderful things there as well.
But for right now, we're gonna talk about starting and creating a focus for data governance, how we arrived at lineage being a core component of our governance solution, how that impacts literacy itself, and how it can impact data strategy too. And then, which I'm sure everybody on this call will be thrilled about, we'll learn a little bit about what kind of ROI you can expect from an automated solution. We'll also get into the solution that we selected at NAIT a little bit, and we'll do a couple of plugs for our sponsors here. Then we'll have some case studies, specific examples where we used our lineage solution to save a significant amount of time, and then we'll talk about where we can go from there. On my slide decks, I like to include key resources and call-outs to things that I just love reading, mostly from folks that you may have seen here this week. I encourage you to follow those links and write down some ISBN numbers. All right. So when we start a data governance journey, what we really wanna do is understand the challenges that the business faces. When you come in and somebody says, stand up a data governance program, that doesn't really mean anything until you start to understand the reason that they brought you there. What are the issues that people are facing? How do you catalog those things, and how do you appreciate and understand them? You wanna write it all down: it's a catalog of issues faced. You also want to build a common data matrix. Now, this is an artifact from Bob Seiner's Non-Invasive Data Governance. I'm sure I've got the ISBN number on the next slide. What it really does is build that mapping for you: what are the domains of data that we deal with, and what are the roles around the organization that interact with that data? So we're talking about high-level domains of data like student or customer.
And we're talking about subdomains of data within those areas, like demographic information, which would be your tombstone data. Those subdomains each have individual critical data elements that would have their own definitions. Each role within the organization has a relationship to that data, whether they produce, use or define it. This is all right from Bob's book. I definitely encourage you to get it; like I said, the ISBN number is gonna be on an upcoming slide. Part of starting to do that work is understanding all the silos and the different business units within your organization and understanding their challenges. I like to say, and I've been saying this a lot lately, it's a lot of coffee. I've been joking with a lot of friends of mine that one of our measures for engagement in our data governance program should be how much coffee I've drunk by noon. So it's really about building those relationships, being empathetic towards the challenges that individual departments and business units face with their data, and learning how they go about their day-to-day decision making and the kinds of information and data that they use. All issues are interesting. Sometimes we start talking about data this, data that, and we know exactly what the bugs are and what the variance is for a field: this is why we're counting 2,522 students today, tomorrow we're gonna count 2,552, and the difference of 30 is because of something that happened in this app. Interesting. Sometimes the business perceives an issue where an issue doesn't actually exist, but that is just as interesting as an issue itself. If the business perceives a problem, the problem exists in their mind for a reason, and we as professionals need to be empathetic towards why they believe that. So all issues are interesting, even if you do not believe they are real. Technical issues are interesting. Process issues, very interesting.
Data quality, that's our bread and butter; I'm preaching to the choir here. And again, simple perception. When you start talking with people, there's one favorite saying that I hear all the time, and it just breaks my heart every time I hear it: oh, well, this is the way it's always been. Well, what does that mean? If you've been doing it wrong for 40 years, then it doesn't need to stay that way. Or if that's the way it's always been and it's great, let's document it. What's great about it? Let's understand what that is. And if it's an issue, then, whoof. So yeah, here we go, I've got Bob's ISBN number. Write this down very slowly. I love this book; it's on my bookshelf, and it's one of my favorite business books. The common data matrix itself, as I said, is that organization-wide matrix of who produces, who uses and who defines data across the organization. Bob says this thing, and I love it because it always sparks conversation whenever you talk to folks: everybody's a data steward, get over it. What that really means is that we're recognizing that everyone has a relationship to data, and everyone is accountable for that data regardless of whether they produce it, use it or define it. Even somebody just looking at data on a screen is accountable, if that data is protected by legislation, for making sure it's not shared inappropriately. When people are putting data into a system, if they're mashing some address into a field, they're accountable for making sure that it's correct. And broadly, the definition of data is really interesting: being accountable for what it means to be a customer. What is a customer anyway? Is a customer somebody who has bought something, or somebody who has yet to buy something? Do we have 10 definitions for customer? Why is that the case? Then we can start to think about those things as well. You can't really dive into these issues until you write them down, and the common data matrix is a great repository for that information.
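At its simplest, the common data matrix described here is a set of rows pairing a role with a domain and subdomain of data and one of three relationships: produces, uses or defines. The sketch below is a minimal illustration of that structure under that assumption, not Bob Seiner's actual template, and every role, domain and subdomain name in it is hypothetical. A lookup over the rows answers the kind of question the matrix is meant to answer, such as which roles produce customer address data and therefore belong in a training session about it.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    """The three relationships a role can have to data in the matrix."""
    PRODUCES = "produces"
    USES = "uses"
    DEFINES = "defines"


@dataclass(frozen=True)
class MatrixEntry:
    role: str                   # job role (hypothetical names below)
    domain: str                 # high-level domain, e.g. "Student" or "Customer"
    subdomain: str              # e.g. "Demographic" or "Address"
    relationship: Relationship


# Hypothetical rows; a real matrix is built by talking with each business unit.
MATRIX = [
    MatrixEntry("Registration Clerk", "Student", "Demographic", Relationship.PRODUCES),
    MatrixEntry("Registrar", "Student", "Demographic", Relationship.DEFINES),
    MatrixEntry("Institutional Analyst", "Student", "Demographic", Relationship.USES),
    MatrixEntry("Call Centre Agent", "Customer", "Address", Relationship.PRODUCES),
    MatrixEntry("Marketing Coordinator", "Customer", "Address", Relationship.USES),
]


def roles_for(matrix, domain, subdomain, relationship):
    """Return the sorted, de-duplicated roles with this relationship to a subdomain."""
    return sorted({
        e.role
        for e in matrix
        if e.domain == domain
        and e.subdomain == subdomain
        and e.relationship == relationship
    })


if __name__ == "__main__":
    # Who should attend a training session on customer address data quality?
    print(roles_for(MATRIX, "Customer", "Address", Relationship.PRODUCES))
```

Keeping the matrix as flat rows rather than a nested grid makes it easy to ask the same question from any direction: by role, by domain or by relationship.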
So I also say here that it becomes a foundation for communication, training, all sorts of things, and I'll give you a couple of examples. Let's say you've got issues with customer address data from a certain business unit. The common data matrix instantly tells you which job roles are responsible for producing that data. Then you can do a communication or training course for that group of folks and say, hey, this is why this data is important, this is why it's important that it goes into our enterprise systems correctly, and this is how your entering that data helps us as an organization. We really gotta tie all that together and make sure that everybody understands their critical role in the supply chain of data, and the common data matrix is a great document to help us understand and document that. It helps with communication too. One thing that a lot of enterprise systems are doing now is recognizing gender identity. So if we have gender identity in our systems for the folks and customers that we're communicating with, our users of data need to understand what that means. We need to say, hey, when you see X in the gender field, this is what you need to do. This is what you need to watch for. You need to look for preferred pronouns. You need to look for this, that or the other thing. So this document becomes key for that activity as well. All right. So this is our first question stop, and I'll talk a bit about starting data governance from scratch again. This was a talk that I gave at Enterprise Data World in April 2018, and it really dives into the things I just talked about. You design from scratch every single time. Data governance always fits the culture of the organization you are working with. You cannot just take somebody else's data governance plan. You can't just say, oh, I saw this, I liked it, that's what we're gonna do here. Here is a different place than there.
What's going to work here is going to be different based on the culture of your organization, and the entire governance program needs to respect and work within the cultural norms of your organization. That's absolutely critical. Then, when we start talking about filling out that common data matrix, one of the fun things for Hannah to draw was these data dragons. What I mean by that is those situations where folks have to track information in a spreadsheet or an Access database or some kind of weird flat file that's sitting on an old server collecting dust under somebody's desk, but it's got critical business information in it. A lot of those things start out as humble little dragonlings that are cute and adorable, and maybe serve a purpose or are wonderful for analytics. But then the big dragon gets born, and it can take over and be quite disastrous. So with that, I'll peek over at questions. Eric, do you wanna grab a cup of coffee? Sure thing. Yeah, so Andy Everett asks, how do you determine the business domains of an organization? That is an excellent question. David Marco, who talks at this event quite a bit, and I think he's doing a talk later today, actually, talks about what the domains of data in your organization are. My practice in the past has always been to discover them by talking with the folks in the business units: describe to them what I mean by a high-level domain of data, get them to list things, and then go away and aggregate it. So at the Northern Alberta Institute of Technology, we can talk about students being a domain of data. Well, what about classes, courses and things? Is a class a class if there are no students in it? Well, no. So is class a subdomain of student? Well, maybe. What about programs, the types of things that we offer to students? Is that a domain of data? Well, it could be, but again, a program isn't a program unless there are students and instructors in it.
So how does that all work together? That's something that you will discover as you talk to your business users. They will be aware of the things that are important to them and the highest level of what those things are. As you start to discover those things, that'll lead you down the path of creating a business glossary. One of the things that we did at NAIT: everybody talks about students, but what is a student? What does it mean to be a student? Are you a student if you're in this program? Are you a student if you're here for a two-hour class one day? Are you a student if you're taking a class in a remote location? What is a student? What is a learner? Are you a student if at the end of your program you don't get a degree? What's going on here? Is there a taxonomy to that sort of content? All right. Second question: Yasu Shibata asks, how do you make data governance a thing for executives who feel each department is regulated enough? Yeah, that can be a challenge, and from organization to organization it's going to be a bit different. Regulations are different between business units, and at the Alberta Motor Association there are lots of fun regulations that come into play. But I'll go back to speaking about NAIT specifically, because NAIT is a higher education institute and we take in international students. So being able to talk to executives about what GDPR means for us offering content to students is fascinating, and we can engage with them in a way that makes sense. We can engage with our executives using business terms and talking about risks and opportunities in a way that's meaningful. Sometimes when we're talking data governance, we get on our high horse: oh, we've got a business glossary over here in this wiki thing that we did; oh, go to SharePoint and learn about why we're counting full-time equivalent staff this way. That's not very engaging.
When we talk about, oh, this is binary, it's either yes or no, executives don't want to hear that. They want to hear the value statement that you bring with your governance solution. You want to explain to them the kinds of things they're going to save money on, what's going to be more efficient, or what risk is prevented as part of implementing a governance program. Any other questions? Yeah, do you want to take one more, and then we'll move on? Awesome. So Abhishek Agarwal asks, any examples of metrics you define to measure progress or success for such a program? Yeah, there are a couple of things I've done in the past, and this is always something that's evolving in my mind, so I don't have firm thoughts or opinions on it. You'll see here on Hannah's drawing I talk about measuring success, and I talked a bit about data screens: Kimball data screens, where we're panning for gold, the good stuff stays in and the crap data falls out, and if you zoom in, it says crap data under there, which is hilarious. I'm sure Hannah enjoyed drawing that as well. There are a number of things that you can do, and a number of resources that give you hints of things that you can measure. What we did at NAIT is we measured how close we were to finishing the common data matrix for each business unit. We had 20 or 22 common data matrix documents representing each business unit at NAIT. It took us, oh, I want to say nine months to do the first go-round on it and to consolidate it together, something close to that. The measure of success along the way is, hey, we're done this one, we're done this one, we're 80% done this one, we're 50% done this one, and then you can start to measure the success of getting those key documents done. From there, we moved on to documenting critical data elements, which form your data catalog and to some degree your business glossary, but mostly your data catalog, where we say, okay, classes are important to us.
So class code is a critical data element. Well, what is a class code? What does it mean? We've defined it. Okay, great, that's one. So then you have a measure of how many critical data elements we documented this month, and then next month, how many we documented that month, and we always have a goal of how many more we want to define. So that's something that you can look at. There are all sorts of things. Data quality is great for all sorts of measures, like, we caught this many rows with our screens before, and now we're catching this many rows with our screens. And I promise you this all relates to lineage. So I think we should probably move on, Eric. Sounds good. Great questions, folks, keep them coming; we will take another Q&A break. Great. So as we're kicking off and doing our data governance work, we need to define our focus. Defining your focus is really a concept from the Data Governance Institute. Normally I would include a URL to the Data Governance Institute here, but the last time I checked, that webpage was down. So I'm going to include a couple of other URLs that I find very useful: The Data Administration Newsletter and DATAVERSITY's website. There's always fantastic content on there, so I encourage you to look there. Defining our focus really gets back to why the business asked you to stand up data governance in the first place. What are the issues? And what is the focus of the data governance program, so that everybody can start to see its value soon? Are we struggling with data quality? Are we struggling with compliance? Are we struggling with data warehousing? Find what that focus is and really zero your program in on helping to solve issues in those spaces. So again, alignment to the business is critical.
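The progress measures described in the answer above, completion of each business unit's common data matrix and critical data elements defined per month against a goal, need nothing more sophisticated than counting. A minimal sketch, assuming a hypothetical log of the date each critical data element's definition was approved; the element names, dates, completion figures and monthly goal are all illustrative, not NAIT's numbers.

```python
from collections import Counter
from datetime import date

# Hypothetical log: critical data element -> date its definition was approved.
cde_defined_on = {
    "class_code": date(2019, 3, 4),
    "program_code": date(2019, 3, 18),
    "student_id": date(2019, 4, 2),
    "enrolment_status": date(2019, 4, 9),
    "fte_count": date(2019, 4, 23),
}

# Hypothetical completion of each business unit's common data matrix (0.0 to 1.0).
matrix_completion = {"Registrar": 1.0, "Finance": 0.8, "Alumni": 0.5}

MONTHLY_GOAL = 3  # hypothetical target for CDEs defined per month


def cdes_per_month(log):
    """Count CDE definitions per (year, month)."""
    return Counter((d.year, d.month) for d in log.values())


def progress_report(log, goal):
    """Yield (year, month, count, met_goal) in chronological order."""
    counts = cdes_per_month(log)
    for year, month in sorted(counts):
        n = counts[(year, month)]
        yield year, month, n, n >= goal


if __name__ == "__main__":
    for year, month, n, met in progress_report(cde_defined_on, MONTHLY_GOAL):
        print(f"{year}-{month:02d}: {n} CDEs defined, goal met: {met}")
    for unit, pct in sorted(matrix_completion.items()):
        print(f"{unit}: {pct:.0%} of common data matrix complete")
```

In practice the log would come from wherever definitions are approved, such as a data catalog or business glossary; the point is only that the measure is a simple count tracked against a target.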
So if you've got some hot activity going on at your organization, tying data governance to that business initiative is a really great way to meet people, meet the right people, and see success right away. At the Northern Alberta Institute of Technology, a financial sustainability program was kicked off just a couple of months after I got there. They were like, oh, we need to create some measures of financial sustainability. And boy, howdy, we were in there like a dirty shirt. What are you measuring? How does it relate to the business? What are the business rules and definitions of those measures? How are we architecting putting data in front of people's faces? And that gets into the lineage piece: what is the supply chain? I've heard this term several times this week, and I'm not sure yet whether I love it or think it's a little bit pretentious, but data provenance is what people have been saying. That's an interesting way to think about lineage: what was the cradle-to-grave situation for the data that's in front of my face right now? What happened to it? I guess you could intone that as either positive or negative: oh, what happened to that data? Or, hey, what happened to that data? It works either way. But there needs to be an appreciation organization-wide, not just from us data folks, of what it means to put data on a report that actually means something to the business and aligns to a strategic goal. What does it mean for our executives? Do they understand the supply chain of data, from when it went into a system when a student registered all the way to when that student showed up on a financial sustainability report? So that lineage component becomes absolutely critical. And I think I've got a Hannah slide coming up. I do. So mash some questions into chat, and we'll talk a bit about socializing data governance.
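The supply-chain question asked here, what happened to the data between a student registering and that student showing up on a financial sustainability report, is at heart a graph traversal: record an edge from each object to every object derived from it, then walk the edges backwards from the report. The sketch below illustrates that idea only; it is not how any particular lineage product works internally, and every object name in it is hypothetical. Walked forwards, the same graph also answers a question that comes up later in the talk: which warehouse tables feed no report at all.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream object, downstream object). A lineage tool
# would harvest these from database, ETL and report metadata; here they are typed in.
EDGES = [
    ("sis.registration_form", "sis.student_registration"),
    ("sis.student_registration", "etl.load_enrolment"),
    ("etl.load_enrolment", "dw.fact_enrolment"),
    ("finance.gl_export", "etl.load_revenue"),
    ("etl.load_revenue", "dw.fact_revenue"),
    ("dw.fact_enrolment", "report.financial_sustainability"),
    ("dw.fact_revenue", "report.financial_sustainability"),
    ("dw.fact_enrolment", "report.enrolment_dashboard"),
    ("sis.grades", "etl.load_legacy_grades"),
    ("etl.load_legacy_grades", "dw.legacy_grades"),  # feeds no report
]


def index_edges(edges, reverse=False):
    """Map each object to its direct neighbours: children by default, parents if reverse."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        if reverse:
            neighbours[dst].add(src)
        else:
            neighbours[src].add(dst)
    return neighbours


def reachable(edges, start, reverse=False):
    """Breadth-first search from `start`; return every object reachable, excluding start."""
    neighbours = index_edges(edges, reverse)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def trace_upstream(edges, target):
    """Everything `target` is derived from: its provenance back to the source systems."""
    return reachable(edges, target, reverse=True)


def unused_tables(edges, table_prefix="dw.", report_prefix="report."):
    """Warehouse tables from which no report can be reached downstream.

    Only objects that appear in at least one edge are considered.
    """
    nodes = {n for edge in edges for n in edge}
    return {
        t for t in nodes
        if t.startswith(table_prefix)
        and not any(d.startswith(report_prefix) for d in reachable(edges, t))
    }


if __name__ == "__main__":
    for obj in sorted(trace_upstream(EDGES, "report.financial_sustainability")):
        print("upstream:", obj)
    print("unused warehouse tables:", sorted(unused_tables(EDGES)))
```

Breadth-first search is used only because it is simple; any traversal that visits each reachable object once returns the same sets.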
So at the Northern Alberta Institute of Technology, I've talked about this common data matrix. When I talked about financial sustainability, that's where this fellowship council came in; actually, I called them the Measurement Superfriends at NAIT. We were the collection of folks working with business units across the organization to help understand, define and pick out which metrics were important. But when we put metrics in front of somebody, how do they know that they can trust them? So we created this approval stamp process. We had four different grades you could get: bronze, silver, gold or platinum. We had a rubric that scored each report out of 20. If you got a perfect 20 out of 20, you were platinum. If you were 15 to 19, you were gold. If you were 10 to 14, you were silver; yeah, that's what it was. And 1 to 9, you were bronze. That artifact was something we'd mash on every report. It'd be in the top right corner, and it'd be a very gaudy stamp. If you check out TDAN, I wrote an article called The Gamification of Data Governance, and you can actually just steal my art; it's posted in that article. So feel free to thieve away. If you just want to take my stamps and use them, that's great, because if I see them somewhere, I'll be like, yeah, that's awesome. What that rubric did is score things based on the kinds of activities that we wanted to see around our data and that relate to our data governance program. So we gave scores on: hey, is this data documented in the common data matrix? Has the business provided us definitions for all this content? We required that. Is the business in control over who gets access to this report or content? We required that as well. We also had automated data quality: is there an automated data quality component to this? Is this stored in or served out of our data warehouse?
Is this supported by a Kimball star schema? That was something we put points towards. And a fifth of the points went to data lineage: is the data lineage completely documented for what we're talking about? We also had a science fair process. I got this from a book, the PuMP book from Stacey Barr. PuMP is performance measurement practice, I think, or performance measurement and process. Anyway, what she says is that we should get everybody in a big room, put some stuff on the walls, get people thinking about the content that we're doing, have some coffee, and write critiques of the metrics that we're going to deliver, so we can include that in our requirements analysis. I went one step further because I thought it was hilarious. My kids were always bugging me at the time; they were quite a bit younger, and my youngest are 14 now. At the time, in 2019, they'd have been like 10 or 11, and they used to always complain about art class. Dad, when are we ever going to use Bristol board? This is so stupid. Dad, nobody ever needs to color anything, it's so silly. And I'm like, actually, we're going to do that at work, and we're going to make our executives do it. We're going to do a data science fair. They're going to put their metrics on Bristol board, they're going to have snacks, they're going to chat about their metrics, they're going to explain them to folks, and people are going to give feedback: yeah, this means something to me, or no, it doesn't, and this is where we want it to improve. We knew this was becoming successful when I heard a provost say in a hallway, I want a gold-stamped report. Then I knew we had something that was going to work. So with that, are there any burning questions from the audience? There aren't yet. I had one.
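The approval stamp rubric described above is simple enough to write down directly. This is a minimal sketch using the grade bands as stated in the talk (20 is platinum, 15 to 19 gold, 10 to 14 silver, 1 to 9 bronze) and the criteria listed, with lineage worth a fifth of the 20 points. The talk does not say how the remaining 16 points were split across the other criteria, so those weights are hypothetical, and the sketch does not enforce the criteria the talk describes as required.

```python
# Hypothetical weights: the talk says lineage was a fifth of the 20 points;
# the split of the remaining 16 points is an assumption for illustration.
CRITERIA_POINTS = {
    "in_common_data_matrix": 3,
    "business_definitions_provided": 3,
    "business_controls_access": 3,
    "automated_data_quality": 3,
    "served_from_data_warehouse": 2,
    "kimball_star_schema": 2,
    "lineage_fully_documented": 4,
}
assert sum(CRITERIA_POINTS.values()) == 20  # sanity check on the hypothetical weights


def score(report_checks):
    """Sum the points for every criterion the report satisfies."""
    return sum(
        points for name, points in CRITERIA_POINTS.items() if report_checks.get(name)
    )


def stamp(total):
    """Map a rubric score out of 20 to the approval stamp grade from the talk."""
    if not 0 <= total <= 20:
        raise ValueError(f"score must be between 0 and 20, got {total}")
    if total == 20:
        return "platinum"
    if total >= 15:
        return "gold"
    if total >= 10:
        return "silver"
    if total >= 1:
        return "bronze"
    return None  # a score of 0 is not covered by the bands described in the talk


if __name__ == "__main__":
    checks = {name: True for name in CRITERIA_POINTS}
    checks["automated_data_quality"] = False  # this report lacks automated DQ
    total = score(checks)
    print(total, stamp(total))
```

Keeping the weights in one table makes it easy for a steering committee to rebalance the points without touching the grading logic.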
So with this kind of gamification approach, and getting engagement from people like the provost, did you feel like you had to put any additional oversight into place to make sure that the game part wasn't overtaking the business objective? There was a little bit of that, where people were like, oh, why do I even care? And it's like, really? But that's where your data governance steering committee comes in. If your program's got a strong steering committee, they're going to have influence over the direction of your program anyway, and if your points are based on your program, it's going to be organic and just kind of work out. That's been my experience with it. We'll see how my experience changes over the years, but yeah, we had quite a bit of success with that. Okay. Well, I think we should keep rolling. Yeah, so, boy, howdy, I've got some great jokes for you. I'm going to stay on this slide so I don't spoil anything. I don't know if you can see behind me, and my lighting's kind of goofy, but I've got some wonderful business books back here. I've got Telling Your Data Story from Scott Taylor; I highly recommend it. I've got my copy of the DMBOK first edition, because I'm too lazy to buy the second edition, and I should just totally stop being lazy and buy the second edition. Bob's Non-Invasive Data Governance, which is an excellent book; I encourage everybody to read it. I've got one of Ladley's books. I've got Katherine O'Keefe and Daragh O Brien's Ethical Data and Information Management. But most importantly, I've got Fun With Jell-O, because everybody needs to have some fun sometimes. But really, the most important book as it relates to data lineage, and the most important business book that you could read, is The Little Red Hen. So I want you to think about lineage and your experience in your organization getting data onto a report. One summer day, the Little Red Hen found a grain of wheat. A grain of wheat, said the Little Red Hen to herself.
I will plant it. She asked the duck, will you help me plant this grain of wheat? Not I, said the duck. She asked the goose, will you help me plant this grain of wheat? Not I, said the goose. She asked the cat, will you help me plant this grain of wheat? Not I, said the cat. She asked the pig, will you help me plant this grain of wheat? Not I, said the pig. Then I will plant it myself, said the Little Red Hen, and she did. Soon the wheat grew tall, and the Little Red Hen knew it was time to reap it. Who will help me reap the wheat? she asked. Not I, said the duck. Not I, said the goose. Not I, said the cat. Not I, said the pig. Then I will reap it myself, said the Little Red Hen, and she did. She reaped the wheat, and it was ready to be taken to the mill to be made into flour. Who will help me carry the wheat to the mill? she asked. Not I, said the duck, goose, cat and pig. Then I will carry it myself, said the Little Red Hen, and she did. She carried the wheat to the mill, and the miller made it into flour. When she got home, she asked, who will help me make the flour into dough? Not I, said the duck. Not I, said the goose. Not I, said the cat. Not I, said the pig. Then I will make the dough myself, said the Little Red Hen, and she did. Soon the bread was ready to go into the oven. Who will help me bake the bread? Not I, not I, not I. Then I will bake it myself, said the Little Red Hen, and she did. After the loaf had been taken from the oven, it was set on the windowsill to cool. And now, said the Little Red Hen, who will help me eat the bread? I will, said the duck. I will, said the goose. I will, said the cat. I will, said the pig. And the Little Red Hen said, no, I will eat it myself. The end. So that very much relates to the supply chain of data. Oh, and I need to give a hat tip to Anthony Algmin, who first planted this joke in my head. He said, oh, it's the Little Red Hen.
It's the most important business book you'll ever read. So I got a copy, and I was like, oh boy, is this ever the perfect allegory to explain data lineage. So thank you very much, Anthony, for incepting that joke into my brain. So yes, this very much relates to data lineage, the supply chain of data, data provenance, if that's a word that we wanna use. What does it mean to implement data lineage? How do we show it? What does it mean? What is the size and scope of the environment? What is your architecture like? Who's available to help you do it? What does automation mean? What we did is spend quite a bit of time looking at the potential return on investment with the business. What does it mean to work with the stewards and business users? Will they take advantage of and appreciate lineage? And it turns out that they do. When we implemented lineage at NAIT, we used Octopai. Octopai are here, and I encourage you to check them out; they're a wonderful team. It took us mere weeks to get up and running with it, and the effort on our side was maybe three hours. We set up a VM, ran their client, set up a couple of accounts, and we were bringing in information from our SQL Server database, our Tableau reports, our SSIS environments and all sorts of things. They have a wonderful list of technologies that they work with as well, so I encourage you to check them out. Lots of other vendors have great lineage solutions too: erwin springs to mind, Collibra springs to mind. I'm wearing the Denodo shirt right now; I was joking with Eric earlier that I'm fully vendor compliant. So what does it mean to implement data lineage? Who's using it? Why are they using it? Why is it saving them time? We found that for local BI development, folks writing Tableau reports were checking lineage to see what reports were out there.
They were checking that same lineage to see how things were calculated on similar reports and replicating that. We were able to look at our entire architecture and go, hey, this isn't going anywhere: we've got all of these tables in our data warehouse that aren't being used in any reports, and they're deprecated. We could probably just delete some of this stuff. And so we did. There was also usage outside the BI team. NAIT is a fairly broad, diverse organization, so we've got pockets of folks who do their own reporting. Lineage opens up and takes away the black box that is our reporting architecture, and it really enables them to see how things are calculated, how things are done, where data is coming from and where it's going to, so they can really understand how the bread is baked and bake their own bread with much more success. One of the things we saw a lot of success with at NAIT was implementing a new integration between enterprise systems. We had an old graduates, or alumni, system, and an old feed that moved data from our student system to our alumni system. Nobody really knew how that data moved across; it was a 20-year-old integration, and everybody was just scared of it. But because we flipped on the automated lineage piece, and it just finds everything out and automates it, we didn't have to write anything down. It was able to pick apart exactly what was happening in that integration, we learned things, and we saved weeks and weeks of time. Something that would have taken at least a month and a half of analysis, we did in minutes. Another thing we did was develop a storyboard. In the context of higher ed, we're talking about the life cycle of a student, but we could just as easily be talking about the life cycle of a customer. How did they get recruited? When they were recruited as a customer, what turned them into a prospect?
How did we convert our prospects to actual customers? And how do we support our customers after the fact? In the context of higher ed, that's recruit, prospective student, applicant, student, alumni. What is their journey along the way? We can use this lineage tool not only to help us document and explain where the data comes from for those points, but also to see what has been done on individual reports that already describe those things, because it's not like we weren't reporting on applicants before we were looking at a storyboard. The other thing it was helpful for, not as much of an FTE savings, but still very helpful, was support for our COVID relaunch. We were taking daily check-ins, and we had a number of applications that dealt with information related to COVID declarations, COVID tests, whether you're self-isolating, all of those sorts of things. Well, we're bringing that information in through a web form; where's it going? How are we reporting it out to the departments and programs and academic units that need to know? How are we securing that information? What happens to it along the way before it gets in front of somebody's face? It was very easy to support that sort of thing when the documentation of where your data comes from, where it's going and what happens to it along the way is free and automated. So where that leads is this: remember that everything we do in data governance is related to people, process and technology. The automated lineage piece is just a tool, and that's why we had that wonderful introduction in our talk. You don't just buy a tool: you look at governance, you implement governance, you have a successful governance program, that program has processes attached to it, and then technology comes along and makes all of those things possible. And that's why I say anything worth doing. You already know what's worth doing because you're doing it.
If it's not worth doing, don't do it; but if it is, it's almost always worth automating. Make your life easier. One thing I've been saying a lot lately is that technology can be a force multiplier: it takes your data governance team of one, two, three, four, five or six people and triples their effectiveness. At NAIT, with lineage alone, we could show savings of five FTE across the organization, and the cost of buying, implementing and maintaining that tool is not five FTE. So you will see a benefit from doing just lineage, not even the whole gamut of things you can do. And everything we achieve is about raising overall data literacy. Without it, folks have no hope of understanding what it means to ask for a report. They might think they can ask for a report on Thursday and expect it to be done on Friday. But if you're exposing lineage and making people literate about the provenance of data, they can understand that that's not a reasonable request, and that maybe it doesn't make sense to ask how many students we had yesterday when we don't even know what a student is. That gets us into talking a bit about data strategy. I love everything Donna Burbank does, and I think she gave an excellent talk here this week, so check out her website, Global Data Strategy. She talks about breaking things into categories, starting with vision and strategy. She's got a whole list and a whole framework there, but the questions she asks are: is there a clear understanding of your organization's strategic goals and of the need for enterprise data governance? How does your organization rely on data now and in the future? What impact are data problems currently having on your organization? Do you have a data governance policy? Are there some teeth behind what the program is trying to achieve? And what are the overall benefits of better data governance?
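The five-FTE comparison can be framed as simple arithmetic. In this back-of-the-envelope Python sketch, only the five FTE comes from the talk. The loaded cost per FTE, the license, maintenance and implementation figures, and the three-year amortization period are all hypothetical placeholders to be replaced with your own numbers.

```python
from dataclasses import dataclass

@dataclass
class LineageToolCosts:
    # All figures are hypothetical, in one currency.
    license_per_year: float
    maintenance_per_year: float
    implementation_once: float
    amortization_years: int = 3  # assumption: spread the one-time cost over 3 years

    def annual_cost(self) -> float:
        """Recurring costs plus the annualized share of the one-time implementation."""
        return (self.license_per_year
                + self.maintenance_per_year
                + self.implementation_once / self.amortization_years)

def annual_net_benefit(fte_saved: float, loaded_cost_per_fte: float,
                       costs: LineageToolCosts) -> float:
    """Value of staff time freed up, minus the annualized cost of the tool."""
    return fte_saved * loaded_cost_per_fte - costs.annual_cost()


# Hypothetical inputs; only fte_saved=5 reflects the figure quoted in the talk.
costs = LineageToolCosts(license_per_year=60_000, maintenance_per_year=10_000,
                         implementation_once=30_000)
net = annual_net_benefit(fte_saved=5, loaded_cost_per_fte=90_000, costs=costs)
print(f"Annual net benefit: {net:,.0f}")  # Annual net benefit: 370,000
```

Keeping the one-time implementation cost separate from the recurring costs makes it easy to see how the result shifts under a different amortization period or a different loaded cost per FTE.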
Then we can think about organization and people. Who are the key stakeholders within and outside of our organization? Who are the primary data producers, consumers and modifiers? That covers the production, use and definition of data. Are individuals formally accountable for data ownership? I don't always like the word ownership, but somebody's gotta be accountable for the data in the system. Are employees trained in good data management practices, or do we just trust that they're doing their best, and is that good enough to meet our strategic goals? Are there any channels through which data shortcomings can be highlighted and investigated? Then we can think about process and workflow. Do business process design and operations management take data needs into account? Are there specific data management improvement processes in place? Are there issue and workflow management processes to address data problems? These are all things we can think about. We can think about data management measures: have key data been identified, defined and analyzed, and how much is left to do? That goes back to a question asked earlier about examples of metrics. These are the critical data elements we need to define; how many did we define this month? What's our goal for next month? Let's go, let's go, let's go. And what data models have you built within your organization? Do you have conceptual, logical and physical models? These are things we need to think about, be literate about and be strategic about as data professionals, rather than expecting literacy to grow magically out of the executives' brains one day, because it's not gonna work that way. We also have to look at culture and communications: is the overall value of good data management understood and championed by our senior leaders? Do all employees and third parties receive data awareness, improvement, education and training?
Do people understand the compliance and regulatory environment we're working in? And then tools and technology: where is our data stored? What metadata is captured? What specialist tools do we have? We really need to write this stuff down and be strategic about it. With that, this is the end of my slide deck, so I'm gonna ask for some questions. I hope there are lots of them; I haven't been able to watch the chat, so I'll move this over here.

Okay, well, Mark, we have one question in the hopper, so I'll read it slowly and give people a chance to get their questions in. Yasu Shibata asks: are there large challenges with lineage as data shifts from on-prem into cloud spaces?

There can be, and it really depends on what cloud environment you have. Octopai does a really great job of digging into things like Snowflake, and I think they support Hadoop, Spark and a few other things. So it really depends on your solution, your architecture and the tools you're looking at. With some cloud technologies, you don't have a lot of visibility into the actual data structures managed by an enterprise system or solution. Workday is a good example: you can get anything you want out of it, but you can't just get visibility into the actual data structures. You can't just log into the underlying database and do whatever the heck you want. In those contexts, what you're governing is not the data model, the structure or the architecture; you're governing the API. The means by which you get data into and out of that system is where governance comes into play, and it follows that that's where your lineage gets pulled from.

Okay, we have another one here. P. Bischoff asks: what lineage tool are you going to use in your current role?

Oh my gosh, well, we looked at a few and I'm excited about a lot, but I don't think I should say.
Okay, we haven't decided yet. There are a lot of wonderful vendors sponsoring Enterprise Data World: Collibra's here, erwin's here, Octopai's here. There are so many great vendors, and they're all wonderful people who want to build a relationship with you because they're hoping to snag some money from you. It's great to look at some of these demos, but what you really need to look at is what's a good fit for your organization and for what you need to get out of a data governance program or tool.

Okay, we do have another question that came in. Dominic Ebeling is asking: which systems are typically challenging for automated lineage, and how do you tackle them?

Well, it really depends on which vendor you work with to solve automated lineage. It's not as hard as you'd think. When I say we got up and running with Octopai in a matter of weeks, with just hours of effort from my side of the house, that's exactly what it was. It's like: mash a button, download the metadata, they do their magic thing, and then we have wonderful charts. I don't work there anymore, so I rightly don't have access to give you a live demo, because I totally would. But there are very few gotchas you'll run into, depending on your technology stack. If you're doing something crazy and custom, maybe you'll run into something, but even in some of those cases, if you can automate or even create a flat file that says, hey, this is the name of a field, this is where it went to and this is what happened to it, and work with your vendor to suck that into their solution, then they can make it work. The tools and technology out there for lineage are more amazing than you would imagine.

Okay, let's take one more; we have a little bit more time. Tony Sallie says they are struggling with how to incorporate API structures that use a common message model when moving data from one storage location to another.
Any suggestions on whether or how to represent that in lineage?

Yeah, and Octopai actually did a really good job of that for us too. I referenced that a bit when we were talking about integrating two enterprise systems, where the middle stopping point just happened to be our warehouse solution. That's why it automatically got picked up by Octopai. But being able to grab the metadata around the transit of data between system A and system B is valuable: it describes what data is moving, how it's moving, what happens to it along the way and how it goes into the other system. If you can automate that, you don't have to write it down, and if it's automated and refreshed all the time, you don't have to do any work to maintain it. I hope that answers your question.

All right, we'll do one more because we have that much more time. P. Bischoff asks: automated lineage is one thing, but how do you marry data definitions that have business meaning to that lineage?

Yep, and a lot of tools do different things in that regard. Octopai has an automated business glossary that we played with at NAIT, but we didn't actually launch that content, for a variety of reasons. If you look at a lot of other vendors, though, there's a direct tie between the lineage piece and the business definition piece. We saw a lot of great things from our vendor partners as we were going through our journey of selecting a tool. The tie you can see is: this is customer; the concept of customer is these fields; these fields show up on these reports; and customer also has a definition, here it is over here. That's where knowledge graphs come in, and we've seen a lot of discussion about them this week, for explaining lineage alongside business definitions and business glossary terms. And man, knowledge graphs make me so excited right now, because you're marrying everything together in a way that's readable for whoever wants to look at it.
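To illustrate the tie described here (a business term, its definition, the fields that represent it and the reports those fields reach), here is a toy Python sketch that stores everything as (subject, predicate, object) triples and walks them from a term to its reports. The predicate names, sample fields and definition are hypothetical; real catalog tools and graph stores (RDF triple stores, property graphs) have their own models and query languages.

```python
from collections import defaultdict

def build_graph(glossary, term_fields, lineage_edges):
    """Combine glossary definitions, term-to-field mappings and lineage edges
    into one list of (subject, predicate, object) triples."""
    triples = []
    for term, definition in glossary.items():
        triples.append((term, "defined_as", definition))
        for field in term_fields.get(term, []):
            triples.append((term, "represented_by", field))
    for src, dst in lineage_edges:
        triples.append((src, "flows_to", dst))
    return triples

def describe_term(triples, term, report_prefix="report:"):
    """Return a term's definition, its fields and every report reachable from them."""
    definition, fields = None, []
    flows = defaultdict(set)
    for s, p, o in triples:
        if p == "flows_to":
            flows[s].add(o)
        elif s == term and p == "defined_as":
            definition = o
        elif s == term and p == "represented_by":
            fields.append(o)
    # Walk flows_to edges downstream from the term's fields, collecting reports.
    seen, stack, reports = set(fields), list(fields), set()
    while stack:
        node = stack.pop()
        if node.startswith(report_prefix):
            reports.add(node)
        for nxt in flows[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {"term": term, "definition": definition,
            "fields": fields, "reports": sorted(reports)}


# Hypothetical sample data.
glossary = {"Customer": "A person or organization with an active membership."}
term_fields = {"Customer": ["crm.member.member_id"]}
lineage = [
    ("crm.member.member_id", "dw.dim_customer.customer_key"),
    ("dw.dim_customer.customer_key", "report:Renewals Dashboard"),
    ("dw.dim_customer.customer_key", "report:Churn Analysis"),
]
triples = build_graph(glossary, term_fields, lineage)
summary = describe_term(triples, "Customer")
print(summary["reports"])  # ['report:Churn Analysis', 'report:Renewals Dashboard']
```

Policies such as GDPR or CCPA could be attached the same way, for example with a hypothetical `governed_by` predicate. That uniform shape is what lets one graph answer both the technical question (which fields feed which reports) and the business one (what this term means and which rules apply to it).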
I would be comfortable throwing a knowledge graph in front of an executive, because they're gonna look at it and see customer. Hey, I care about those. Oh, customers are managed by these policies? Yeah, they're gonna care about that. They're gonna care which customers are covered by GDPR or CCPA or whatever; they're gonna care about that landscape. They're not gonna care that there's a VARCHAR2(100) field over here; they're not gonna look at that. So I think knowledge graph representations of these things are the wave of the future for data management and data governance tools.

Great, Mark, we're gonna leave it there. We are out of time. Thank you so much for this great presentation. Thanks to our audience for tuning in and for the great questions. Please do complete your conference session survey on the page for this session. Between sessions, you're welcome to continue networking with other attendees within the Spot.me app, and the next sessions will start in about 10 minutes. So thanks again, everybody.