Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will discuss metadata management: technical architecture and business techniques, sponsored today by erwin. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. We very much encourage you to chat with us and with each other throughout the webinar; to do so, just click the chat icon in the bottom middle of the screen to activate that feature. If you'd like to continue the conversation after the webinar or follow Donna further, you may do so at community.dataversity.net. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar. Now let me turn the webinar over to Sean from erwin for a word from our sponsor. Sean, hello and welcome.

Hello. Thank you, everybody, for your time today. It's an honor to be here. My name is Sean Roberts. I'm the Vice President of Solution Strategy for erwin. I think I have control. I do. Okay, so let me tell you a little bit about erwin, our journey, and where we are today. As most of you know, erwin is a very well-established brand and the de facto standard in data modeling. Today erwin is used at over 3,500 customers and by more than 50,000 users around the world. In 2016 the erwin assets were spun off from Computer Associates as a standalone, privately held company, and erwin, Inc. was established. So we like to say we're basically now a 30-year-old startup.
Since then we have invested heavily in the solution and made several acquisitions to broaden our portfolio into the leading data governance platform. We've integrated these acquisitions and solutions, along with significant R&D, with a single goal: we want to be the only software platform with integrated capabilities for enterprise modeling, data cataloging, and data literacy. The erwin EDGE creates an enterprise data governance experience that facilitates collaboration between IT and business users to discover, understand, govern, and socialize your data. As I said, today we're trusted by more than 3,500 global customers, most of which fall within the Fortune 500. We have 250 employees and seven offices around the globe. We are proud of the value that we deliver to our customers and the satisfaction they show in our industry-leading Net Promoter Score (NPS).

So let me tell you a little bit more about our mission. With the growth and explosion of data, everybody has the same challenge: poor data management creates chaos, and as a result we all face the same common dilemma. What data do we have, and where is it? So how do you solve that? We have a suite of tools that allow you to harvest, scan, collect, organize, and analyze your data. Once that's complete, you have end-to-end visibility of the mappings to see how your data is being federated throughout your organization. This allows you to see the complete data journey down to the system, table, and column. We also enable you to link your business terms and definitions to your data elements and dictionary, and to structure and govern your data. Data is never at rest, right? It's constantly moving, evolving, and changing.
So when you have DBAs and data modelers constantly making updates, your data sources are going through rapid states of change. We have a set of tools that allow you to version the data and audit those changes, as well as the ETL processes that move and transform that data through the complete data lifecycle. Having the ability to version and audit the data through the lifecycle is the foundation for true governance. We also provide the tools to visualize the data and the relationships among the data, so you can visualize and manage your data landscape. You can see where your data is within your organization, how it's evolving and changing, how it's being used, and who is using it. That enables you to institute a true and robust data governance model with best practices and establish a single source of truth for enterprise socialization of data assets.

This next slide is very busy, but it's intended to be busy. It's what we consider representative of typical large organizations. It represents the types of tools they have in use today: ERP systems, high-performance database systems, operational systems, various ETL and ELT tools, as well as BI reporting solutions. If you're going to have a proper governance program, you have to be able to catalog your data sources and understand their meanings and definitions, and you also need a robust set of connectors that can reach into these tools, automatically catalog them, and document how the data is being processed so you can understand the usage patterns of your data. For erwin, this is really our sweet spot and what we do extremely well. We have the industry's broadest set of connectors, over 125, and we'll continue to grow that set. These connectors can tap into all the tools and technologies that you see on this screen.
This is a representation of a fairly typical data warehouse architecture. It probably looks very familiar to almost everybody, and it's what we see in most mature, large organizations for how they store, process, and organize data. You have multiple CRMs and third-party inputs that come into your organization and land in staging or raw databases, which are typically moved to an operational data store or even a data lake. This all then gets pulled into a data warehouse structure, followed by your analytical service layer, and then on into your BI layer for visibility. erwin can automatically catalog and document all of the data structures and data sources you see here and display the data journey for every asset, with impact analysis and full end-to-end lineage. We can also link your business glossary terms, policies, and rules to your data elements so that you can see your data structure at a glance and know instantly where all your impactful data is, such as PII or HIPAA data, where it's stored, and who has access to it.

We're seeing three primary market trends in data today: automation; artificial intelligence and machine learning; and data lineage and impact. erwin has answers to all of these trends. For automation, we automatically scan, harvest, and document all of your data sources, and as I said, we regularly develop and add new data connectors. For AI and ML, today we automatically link business terms to data assets and automatically document your lineage, and our AI roadmap includes more than eight additional use cases in the next six to twelve months, so we're heavily invested in that. And then, like I said earlier, our true sweet spot now is enterprise lineage and impact.
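
As a rough illustration of the upstream and downstream impact analysis the speaker describes, here is a minimal Python sketch. It is not erwin's implementation or API; it simply treats column-level lineage as a directed graph and walks it in either direction. Every system, table, and column name below is hypothetical and invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (source column, target column).
# All names are made up for illustration; a real catalog would harvest them.
LINEAGE_EDGES = [
    ("crm.customer.email", "staging.customer_raw.email"),
    ("staging.customer_raw.email", "ods.customer.email"),
    ("ods.customer.email", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "bi.churn_report.contact_email"),
    ("erp.orders.amount", "dw.fact_sales.amount"),
    ("dw.fact_sales.amount", "bi.sales_dashboard.total_sales"),
]


def build_graph(edges):
    """Return (downstream, upstream) adjacency maps for the lineage edges."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, graph):
    """Breadth-first walk: every column reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


downstream, upstream = build_graph(LINEAGE_EDGES)

# Downstream impact: what could break if this column changes?
print(sorted(reachable("ods.customer.email", downstream)))

# Upstream lineage: where does this report figure come from?
print(sorted(reachable("bi.sales_dashboard.total_sales", upstream)))
```

The same traversal answers both questions; only the direction of the adjacency map changes. A real catalog would populate the edges by scanning ETL jobs and database schemas rather than from a hand-written list.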
Today we can show both the upstream and downstream impacts at a source or a table, all the way down to the column level: who's using it and where your data is being federated out.

That brings us to our metadata management vision. From an erwin standpoint, we broke that down into five areas. Data intelligence is where we continue to enhance our data intelligence suite. It has a data catalog serving as a single source of truth for all our customers. Our enhancements include AI in all things as a guiding technology principle, and we're ensuring we help customers automate data operations and innovate their data intelligence connections, assumptions, and next best actions.

Security is another one. Our key customers are large, regulated enterprises that store data in the cloud, behind firewalls, in unstructured environments, and in other locations they can't yet envision. Data security is paramount but often at odds with metadata management in terms of accessibility. Our security infrastructure roadmap provides automated tools to meet security requirements and automatically access and document data regardless of its location, which is another solid sweet spot for us.

We also have real-time monitoring. The erwin Data Catalog includes patented technology that automatically refreshes and versions data on a scheduled basis. We continue to extend this capability and will include real-time monitoring support for customers where on-demand data intelligence is required.

And then semantic modeling: semantic models show the intersection of business requirements and the business and technical infrastructure needed to satisfy them. Our semantic modeling capability enables customers to analyze, specify, and communicate an organization-wide view of data's role within the business from
multiple perspectives.

And then finally, active metadata. We have a very robust platform. Open APIs are important to our strategy for active metadata, and we have already activated erwin DM metadata through APIs and connectivity to a wide variety of tools, including repositories, database management systems, data integration tools, and other suites. The data intelligence suite is fully open via our API platform. We strive to have a visionary impact on the metadata solution market with our data intelligence platform. Taking an active metadata approach and being data driven allows us to offer speedy insights, and our connectors allow for tracking data in motion as well as data at rest. Finally, we have a significant focus on data quality as well.

As a data governance company, erwin provides data modeling and the breaking down of organizational and technical silos for collaboration across key data and business architectures. This is a key part of our data intelligence suite and our erwin EDGE. Our data catalog and data literacy suites enable you, like I said, to automate data operations and provide business users with better visibility, and together these products make up what we call the enterprise data governance experience, or EDGE. This provides data-driven insights, agile innovation, regulatory compliance, and business transformation, with data governance as the hub and driving principle. The enterprise and its critical architectures are connected to enable discovery, understanding, governance, and socialization of data assets and supporting information. We see this as a way to reduce risk and realize desired operational results. At erwin we believe that data is everyone's business and everyone's responsibility, not just a single person's. So the data journey is very important to us: who touches it and how we enable
those that touch it, use it, and consume it, ensuring that we're meeting the needs of data consumers across the enterprise value chain, whether they're IT, business, or data operations people. This is the key to unlocking the power and value of your data and powering digital transformation.

So what do we mean by digital transformation? In our experience, customers are looking to become data driven in four key areas. First, improving digital experiences: bringing data together from a variety of touch points across the enterprise to better understand and optimize the way the business interacts with and supports its customers. Second, enhancing digital operations: most organizations are spending a tremendous amount of time on manual data prep and need to dramatically reduce time to insight for analytics projects. Third, driving digital innovation: regardless of industry, organizations are looking for compelling competitive advantages and leveraging data to deliver new products and disrupt with new business models. Examples of those are Uber and Amazon, and this is part of virtually every technology charter. And finally, building a digital ecosystem: no one can do it alone in today's market, so companies need to build platforms and partnerships that allow them to accelerate scale and growth.

Sean, this is amazing. We're right over the 10 minutes, so we do want to make sure we move on to Donna. Anything to wrap up?

Yep. Sorry, I thought I was going fast. I apologize. All I have next is a set of screenshots on the slides. They give you an idea of data stewardship and curation, integrated data profiling and data quality scoring, an integrated business glossary, on-demand dynamic data lineage, on-demand impact analysis for both upstream and downstream, and on-demand mind mapping. That's all I have. Thank you, everyone. Again, I appreciate the opportunity to present, and I'll turn it
over to you now.

Thank you, Sean. This is fantastic, and there are questions coming in. Just so you all know, Sean will be joining us in the Q&A section at the end of the presentation today with Donna, so we will be sure to get to those questions. Now I do want to turn it over to Donna. You all know her, you love her, she's the speaker of this monthly webinar series. I'm not going to go into too much about her right now so we can move on to the presentation. With that, I will give the floor to Donna to get today's webinar started. Donna, hello and welcome.

Thank you. Always a pleasure to chat. As Shannon mentioned in the beginning, and as the sponsor touched on, today's topic is metadata, something we all know and love. As you know, metadata is as much a technical challenge as it is a business challenge, so today we'll talk about the intersection of those. And as the previous speaker mentioned, there are a lot of people involved in metadata, so we'll also talk about how to get those roles aligned with things like data governance. You all know me and hopefully love me, I don't know. Thank you, Shannon. One thing that's probably worth noting, in full disclosure: I am ex-erwin, and one of those books is on data modeling with erwin, so if you're interested, buy the book. It's a solid product, and I see some familiar faces among the attendees from that time, so thanks for joining.

So again, today is metadata. As you most likely know, this is a monthly event. We often have some of the same familiar faces, which is great, and often you come up and say hi at the conferences, which is even better. If you missed any of the previous webinars, Shannon will send out a reminder in the emails. These are all on demand, so you can catch them again and again. And please join us next month, when we'll be talking about data quality with my partner in crime from the UK, or Wales, Nigel Turner, so I
will talk more about that when we close out.

So without further ado, today we talk about metadata, and the fact that metadata really is the context around data. Metadata is one of the most obvious things that we gave the most difficult word to. "Metadata" just sounds nerdy, so I think we did ourselves a disservice in the industry there. Because it is created by both business and IT, it can be complex. As Sean mentioned, there are a lot of different types of information and a lot of different types of user. But there are some practical ways to get started, and hopefully we can demystify that in the presentation today.

So again, metadata is the who, what, where, why, when, and how of data. I put this together because I thought it was a helpful way to think about it, because again, metadata is sort of a nerdy word, and this demystifies it a little bit.

Who: who created the data, who's the steward, who's using it? Often just finding that out can be the most helpful piece of information. Yes, I can read the definition, but if I have more questions, who do I go to?

What: that's what we think of traditionally, or at least I have in the past, with metadata. What's the definition of this? What's the business rule? What's the data type, the structure, the format? Is there an abbreviation for a common element? That's super valuable, but it's not the only thing.

Where: this is where we talk about things like lineage. Where is the data stored? Where does it come from? Where is it used? What's the impact if we make a change? Where is this data used in terms of regional, security, or privacy rules? Is the data used in Germany different from the data used in the UK versus the US? Each country has its own specific regional laws, so you need to be aware of that, even more so in today's day and age.

Why is always
a good one to think about. With many of my customers now, we sort of joke that we do the Marie Kondo of data: why are we storing this data? What is the purpose? What is the highest priority? No one can do everything, so starting with priority is almost the best way to start. There's also risk. I tweeted this a few weeks ago; one of my customers had the best line, so I stole it. He said some data is like nuclear waste: it's just too risky to keep. Yes, it might be great to have somebody's credit score and their Social Security number, or their Social Insurance Number in Canada, and so on, but unless we really need it, do we want to put ourselves at that exposure? A lot of companies I'm working with are thinking closely about that why. Or storage: back to Marie Kondo, this stuff is expensive to store if we don't really need it. Yes, a data lake can be great, but just storing data for data's sake isn't always the wisest thing. Metadata can definitely help with that, as can things like governance, to get to that why.

When is super helpful. When was this data created? When was it last updated? And then, when we think of things like retention, how long should it be stored? What are the rules around that? If there aren't rules as part of our governance, is there just practical common sense? Do we need to keep every customer transaction for the last hundred years? Maybe not. If not, when do we purge and delete it, and how? There are so many great options now with the cloud in terms of cold storage, options we didn't have before, or some records need to be on paper, et cetera.

And then the how is kind of like the what: how is the data formatted, how is it stored in the database, and things like that. I find this helpful as a holistic way to look at the data. When you're looking at metadata, am I capturing everything? Am I so focused on the what that I've
forgotten the why? It's also a nice way to explain it to people. If people say, "Meta-what?", it explains it in a nice way.

There are several ways to look at metadata, as you may know. There's the business view as well as the technical view, and it's all synergy and happiness when those two can be linked together. There are a lot of great tools in the market that can help you do that, and that's super valuable: yes, here's the definition of, I don't know, customer identifier, and here are the 17 places in the databases where it is linked. Especially with things like GDPR and the California privacy laws, it's super important to be thinking about where my data is stored across the organization, which gets back to the why: do I need it stored in this many places, or should I master it, et cetera.

So business metadata is business definitions, things like: what is a customer? We could wax poetic about that all day. In fact, in one of these webinars a few months ago people had a really fun time in the chat with their different definitions of common things. What is a customer? What is a product? Things that in normal life seem so basic, until you start doing things like data modeling and metadata and you see all of the subtleties and beauty of it. And then there's technical metadata, which of course is things like the data type, the definition, the standards, the domains, whether it's nullable, what the keys are, et cetera. There's a wealth of information on both sides, and you want to understand who wants to see which. What you don't want to do is show the business people all the keys and the nullability rules; that's a good way to alienate them. I would argue that everyone should see the business metadata, especially the people building the technology, but think of the audiences as well.

Metadata is part of a larger enterprise landscape. This is one way one
could build this framework, but if you've joined my webinars you see it each time, because it's something we always refer back to, since everything here is so interrelated. For metadata, I would love to see it almost as a circle around this entire diagram. Metadata helps with understanding the business and the business drivers. It helps with governance, traceability, and lineage. It helps with some of your basic data asset planning and inventory: do I even know what I have, what the data looks like, and how it's structured? So it really is key to everything, but it is its own practice in its own right. I think sometimes, because it's so ubiquitous and so embedded in everything else, it's the unsung hero among all of these. You can't do good master data management without metadata, and you can't have great quality unless you understand the metadata. But when it's working well, everything goes smoothly and no one thinks about it. I guess that's a problem of IT in general: when things are working well, they forget about you. Sometimes it's good to be forgotten, but metadata is so ubiquitous that it's worth mentioning as part of this larger framework.

And it's harder than ever, for a lot of reasons. As the sponsor touched on, not only are there more technologies, more database platforms, and more systems we need to manage, but more people are looking at the data. Personally, I think that's the bigger reason: as more people look at the data, they want to know what it means.

So this is a shameless plug for a survey I did with DATAVERSITY a few years ago that is still very relevant. We looked at emerging trends in metadata management. If anyone had any doubt, and you probably don't if you're on this webinar or listening to it after the fact, metadata is hot. We wanted to get the metric, right? We're data
people; we wanted a metric around this. Over 80 percent said metadata is not only as important but probably more important than it was in the past. Again, no surprise to me, and probably not to many people on the call. As you look at the data, how can you not look at the metadata?

In fact, I've used this line before, but I'll use it again because it was so fitting. When I was in a technical role, we were going to the business sponsor and trying to get buy-in for a metadata repository with lineage and a glossary, trying to show that if you want this business intelligence report, you need to know where the data came from and what it means. The sponsor, who was from finance, looked at us almost shocked and said, "You mean you're not doing this already? That scares me." When you think of someone from finance, they can't get away with that kind of thing: "Well, we kind of have a lot of money in the bank, not exactly sure how much, but we don't have time for things like that. We're agile. We just pay our bills fast and don't pay attention to how much they are and where they came from." Other industries just can't get away with that, for many reasons. With data, we sort of have. Now that more people are looking at the data, the very obvious questions come up: how is this calculated, where did it come from, what does it mean, who's using it, who owns it, who stewards it? All of that is metadata.

On that point of who uses metadata, again from the survey, there are some nice metrics. You'll see that there's a wide range of people who use metadata. It isn't just data architects, data modelers, or DBAs. You'll see business users at the high end up there. BI reporting isn't a surprise, and there are data scientists, of course. I did find it interesting, and maybe obvious, that business users were actually the largest audience for metadata in the survey, because they're the ones using the data. They want to
know how that field was calculated, how the data was sourced, and what it means. So this is data to prove that fact. And again, just a little more about that: 80 percent of the users are from the business. One of the quotes from that survey that I liked and that we called out was that metadata is the concept that helps both IT and the business understand the data they're working with; without it, they're at risk of making decisions based on the wrong data. So the business guy in the corner asks, how is this total sales figure calculated? He really wants to know. His bonus might be based on that, or his commission, or his job if he's the CEO. We need to know: is this total revenue from all regions? Does it have commissions taken out? Is it retail sales, wholesale sales, et cetera? There are so many ways to calculate total sales that you really want to be clear on that.

I find that when you're trying to, quote, sell metadata, often the business gets it more than IT does. I think IT often sees it as a burden: "Oh, I know this stuff. Gosh, do I have to tell other people? I'm too busy." And maybe that's true, but as more people look to IT for it, you can't live without it. So often, selling into the business is a great way to get sponsorship for these things.

I thought this was helpful too; it was sort of behind the scenes in the survey. I was curious about this "other" category, which is a fairly good percentage. You'll see about 20 percent listed "other", and I thought some of the answers were very interesting: clients, external data providers, the general public. A couple of our clients are government organizations doing open data initiatives, and of course you can't use open data unless you know what that open data means. So metadata is completely obvious to them. We didn't have to sell it at all. They said, of
course, how could you publish data without the directory of what that data means? You'll also see terms there like regulators and audit. With things like data governance, there's often the carrot and the stick: maybe I don't want to publish the metadata, but I'll certainly get in trouble if I don't. I thought it was interesting to see things like researchers, scientists, and students. As people are doing research, they need to know the context behind the data. As anyone who does research knows, a number used in the wrong context could have a completely different meaning. When you did this piece of research, what were the parameters around the data that was collected? That is critical.

I also like some of these extras because one of my pet peeves is that we generalize into "there are technical people and there are business people", which really makes so little sense. Your business users might be actuaries, who were data scientists before data scientists were around. Or your non-technical people might be research scientists. Well, they're certainly technical; they're just not database administrators. And business people are becoming more technical. As for people from, quote, IT: if you're a data architect, I would argue that sometimes the data architect knows more about the business than anyone else, because they know the data.

I've told this story before, but that never stops me from telling it again. I did some work at a water company a couple of years ago. I was there for about a week, did my first draft of a data model, and presented it to the business users at the logical level. One of the guys said, "Oh, this is great. It's such a clear way to explain exactly what we do. How many years have you been here?" I'd been there a week. It wasn't that I'm particularly smart, though I'd like to think so; it's really just those
artifacts of data modeling that really suss out the core logic of the business, which is metadata that everyone can understand. And I like that last quote: everybody is using metadata, they just don't know it yet. Maybe that's because of that funny word, metadata. "Oh, you mean the glossary? Yeah, I look at that every day for these acronyms. I'm a new employee and I don't know what this term means. I didn't know that was metadata; I'm just using it." So I wouldn't worry about what we call it, as long as people are doing it.

So who uses metadata? I don't want to overdo this point, but I thought this was helpful: everyone could be looking at the same information, but in a different way. Maybe the developer wants to know, if I change this field, what's going to be affected? I wish more folks did this. I keep my client data very close to the chest, but in the past two months, the number of major, almost auditable breach issues caused by metadata at several major customers I'm working with makes you remind yourself: wow, this is still happening in this day and age. In one of them, a developer changed a field without governance and without doing impact analysis, and shut down the sales system. So these things still happen; this was last year. Some of them came down to, what is the definition of sales, or of customer? Twice in the past year we had people sending out mailings to the wrong list of people, so people who aren't customers get renewal notices, or people who are customers get a sales notice. I still get those from my bank every week, and have for the past several years; they don't seem to know I'm in their system. So these things still happen. Metadata may be foundational, but there's still a lot of work to be done, because these things have such an impact. It's such a simple thing to
document some of this stuff and to do those checks, or it can be, and the downstream effect can be massive.

And again, that last one I mentioned, the example of how we get up to speed on our company's business terminology: often that is an under-thought-of quick win. The first thing I do when I go to a client is create my own glossary, partly because I'm a big old nerd, but also because it helps me. I remember when I was first in IT, doing consulting, the most painful thing was hearing an acronym. It's like, oh my gosh, is this something like XML that I'm supposed to know, and if I don't, I look stupid? Or is this the QBR division of the company, which I shouldn't be expected to know because it's something internal to them? That's why things like acronyms, glossaries, and definitions matter. Everybody has that experience when they go to a new company. Everyone loves acronyms, everyone loves terms, and everyone uses terms differently. So create something that's easy: it could be a SharePoint site, a wiki, or a website. That's some metadata you can get visibility for with people.

Who creates metadata? Well, probably the same people, right? Or is that somebody else's job? Wouldn't it be great if someone else wrote all those great definitions? That's the hard part. Everyone knows it's a good idea, but "I'm too busy" and "well, I know this stuff." If I clean my dishes in the company kitchen, that helps other people, but I'm busy, so I don't, until I realize someone else hasn't cleaned their dishes, and then it's not very nice. That's the problem: you know your data, so you're less motivated to document it, but once you realize that other people haven't documented their data, you see how helpful it would be. We'll talk a little later about this idea of collaboration tools. I think that's gone a long way. It makes it seem more like a community of information sharing and less like a job that you have to
do, a negative task. I think more people see the value: I share a little bit, you share a little bit, and that collective sharing becomes a conversation where, oh, that's how you define total sales, good to know, I was doing it differently, let me change my report, or can I borrow your query, and things like that. And a lot of these modern data catalogs really have that different approach, which I think is great. Data governance was talked about in the introduction, and you almost can't talk about metadata without governance. They're sort of overlapping efforts. So what's metadata and what's governance? Governance is really the people, process, roles, and organizational structure. Things like metadata can help with governance. A big part of governance is having naming standards, data lineage, data models, metadata. And back to who creates this, I think this chart is a nice way to look at it. Huge, big caveat: I build data governance models for a living, and the first thing I would say is that they should be customized for your company. This is just an example. Please don't take this and say, well, Donna said on the slide that the data steward is the one who creates the business rules. Maybe it isn't in your organization, or maybe you use a different term. These are fairly common, enough that you can't go too wrong looking at them, but it's just indicative, right, an idea. But there are different levels of metadata, and different people who have different ownership, and they should have ownership or stewardship or an official responsibility, where the buck stops here. So for example, the business data owner: I generally have that as a higher-level manager. There's a KPI, and I'll keep using total sales, so I'll stick with it. We're talking to the board about total sales this quarter. I am going to be accountable for this calculation, and I'm not going to change it to make
my numbers look better. I'm going to be held to that. I'm going to make sure all the salespeople are using that same KPI. That should be a business person. Yes, the BI analyst might have created that report, but he or she is not the one who's responsible for the calculation of that KPI. If those numbers are wrong, the buck doesn't stop at the person who built the report, the buck stops at the business owner. So that's KPIs and metrics, and also things like regulatory guidelines and policy. You could say, yes, that could be legal, or it could be an external agency, but if we're saying that we need to be GDPR compliant, or there are certain banking regulations we need to follow, or I'm a hospital and there's this HIPAA rule, it's the business that really needs to be accountable for that if something goes wrong, and that needs to make sure people understand what that means. HIPAA, that's a data thing, but there's a lot more to HIPAA than just getting your data rules right, and that really is a business thing. Similarly, the steward: I generally see that as one level down. That might be your definitions of the terms, or your business rules, or these acronyms. Maybe it's the owner, maybe it's a steward. Whether those are the same person kind of depends on how big your company is. But if you think of the business data owner as your VP or your director, yes, they would want to know the KPI they're being held to, but they probably don't want to sit down and do a glossary with you. And I've seen too many projects fail where people get the wrong level of person doing the wrong thing, where people really do ask a senior, C-level staff person, could you please look at all these data definitions, or the lineage of these reports? And, funny, they never want to come back again. You want to get the right level of metadata for the right person. The data architect, I think that's a special role, in that they sort of sit between the business and tech. As I mentioned, they can speak to
business, but they can also speak tech. They build things like data models or naming standards, or they are often responsible for these metadata tools that can do things like data lineage. They do the semantic modeling and things like that. I put in this role of a system data steward, the guy with the funny hair. I like that icon, so I put him in there. This doesn't always make sense, and you could validly push back and say you shouldn't do your governance based on systems, you should do it based on the business. And that's true in a perfect world where everything's ideal and we have a perfectly modeled enterprise where everything's clean. But so many times your business is running on an ERP system or your CRM system or your point-of-sale system at the front, and they're so embedded into your business rules, and they work a certain way. So it helps to have a person who knows that system and is responsible for the data of that system, and part of the way they're responsible for it is to make sure it aligns with business rules. So many times when we come in and try to fix a problem, the problem might be, we bought a CRM system and it doesn't work. Well, it's probably not a problem with the CRM system. It may be, but those CRM systems have certain rules embedded in them, and often they don't match your data model. I have one client that's a reference of ours, and I'm particularly proud of them. They're a small nonprofit, and they had this problem with their HR system. They bought the HR system and it didn't work. They hadn't really looked at it until after the fact. Once they got more mature with their data, they realized the problem wasn't the HR system, it was how they had implemented it with the data. They learned their lesson. Now they have an enterprise data model, and when they have any vendor come in, they say, could you please show your data model and how it compares to our rules, and, if it doesn't match, at least how your system can be customized? It surprised the heck out of the vendors.
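The "show us your data model and how it compares to ours" request can be pictured as a simple comparison. What follows is only a minimal sketch, not how this client or any modeling tool actually does it: the dictionary layout, the function name `compare_models`, and the HR entities and data types are all hypothetical, chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified "data model": entity -> {attribute: data type}.
# Real modeling tools store far richer metadata (relationships, keys, rules).
Model = Dict[str, Dict[str, str]]


def compare_models(enterprise: Model, vendor: Model) -> List[str]:
    """List where the vendor model fails to match the enterprise model."""
    findings = []
    for entity, attrs in sorted(enterprise.items()):
        if entity not in vendor:
            findings.append(f"missing entity: {entity}")
            continue
        for attr, dtype in sorted(attrs.items()):
            vendor_type = vendor[entity].get(attr)
            if vendor_type is None:
                findings.append(f"missing attribute: {entity}.{attr}")
            elif vendor_type != dtype:
                findings.append(
                    f"type mismatch: {entity}.{attr} "
                    f"(enterprise {dtype}, vendor {vendor_type})"
                )
    return findings


# Made-up example models for an HR system evaluation.
enterprise_model = {
    "employee": {"employee_id": "INTEGER", "hire_date": "DATE"},
    "department": {"department_id": "INTEGER"},
}
vendor_model = {
    "employee": {"employee_id": "VARCHAR(10)"},
}

for finding in compare_models(enterprise_model, vendor_model):
    print(finding)
```

Each finding is a conversation starter with the vendor (can the system be customized here, or do we document the difference?) rather than an automatic pass or fail.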
And there was a little bit of, you know, "don't worry your pretty little head, I don't think you mean something like a data model," until they pulled it out and said, yes, here's the one from the data modeling tool, on the wall, and we want to know how this relationship is stored in your system. And it worked. Most of the time the vendors were actually able to make that work, but the client had to kind of push back. So having that data model and understanding your business rules is huge. And that system data steward should be the one who's able to customize that system, or at least understand it. Sometimes you can't fix the system, but at least you know where there may be a difference, and they understand that. And then there's the DBA, or the kind of newfangled way to look at that, the data engineer. That could be a whole webinar in and of itself. That's a super important role: the one who implements the physical metadata, who does the naming standards and the data types and all of that within the system. So what governance can do is not only make these roles part of people's day job; there is an actual role and responsibility. Sometimes that's even put in your HR job responsibilities. If nothing else, as part of the data governance organization, you are held accountable for that, and that's so much a part of it. Everyone loves to read metadata. It's when it comes down to creating it that governance really pops in. And this gets back to avoiding that dreaded "I just know." So much of business metadata is just in your head, and I hear that all the time: why would I document something like part number or customer? That seems so obvious. But generally, if you just think about it, there's some extra step. Well, customer, that's only when they buy a product, the other ones are prospects. Or, that's not a member, that's a different definition: a customer is someone who's a member who bought some products from us, et cetera, et cetera. There's so much subtlety. So just write it down and
take it out of people's heads and then put it in tools. As I mentioned already, a lot of these more modern repository tools, or catalogs, or whatever you want to call them, really have that way of doing that sharing and making it more of a conversation and having open editing. But choose your tool very wisely. There are good tools and bad tools, and there are good tools in the wrong use case, and I've seen both. There are different types of tools out there, and there are tools that can do both of these, but there's a different modality of your metadata. Just think about it before you implement. There could be different types of data in your same organization, so you could have both. If you're talking about things like master data, I would think you would want that very highly vetted: yes, these are our master data fields, this is the data type, please don't change the length of the product code. That's the customer I mentioned earlier that brought down their sales system because somebody did that. You do not touch this, this is our master data, there are rules, this is for your information only. Or things like data warehousing, your enterprise data statistics, or your KPIs that you're sending to the board. You can't really mess with those. Those are highly audited, please don't go in and change them. That's very different from doing some data science or some self-service analytics, where you want that idea of, hey, I'm using this calculation for net promoter score. Did you think about asking people what they think when they're walking out of the store, rather than doing a survey? Hey, that's a great idea, maybe we could, et cetera, et cetera. You want that kind of collaboration, and you often get to a great answer by having that back and forth. And even with the encyclopedia approach, we should have some sort of feedback to say, guys, you have this as a standard, but by the way, this group isn't using it, or that doesn't work for us, or something. But think of that carefully before you
implement, because I've seen companies that were sort of wowed by a great interface with this kind of Wikipedia approach, and then they weren't able to do things like GDPR compliance and lineage because it wasn't strict enough. I've also seen people who bought the one that was too strict, and then people didn't like it because they weren't able to collaborate. So neither one is good or bad, you just want to make sure it fits your use case, or the right use case in your particular project. And that's this point of just finding that right balance: are you really trying to enforce standards, or are you trying to do collaboration, and what's the right balance between them? And when you do look at these tools, I always recommend, and we've got some templates for this at our organization, and you can create them yourself easily enough, asking: before I look at the tool, what are my requirements? Who are the users? Is it a business tool or a technical tool? Because it's so easy to get wowed by some of these great user interfaces, and they just look really cool, and then you realize, wait, that's not really what I wanted. So do that before you look. It's like going to look at a kitten before you realize that you don't need a pet. That was a stupid idea, but you know what I mean: you see it, it looks cute. But you want to actually have your requirements before you look for those tools. What these tools can do is great. When we get now to the technical side, it's really top down and bottom up. What a lot of these tools can do, and it was alluded to in the introduction from the sponsor, is that most of these metadata tools have some sort of scanner or crawler or searcher, they all have their own word for it. Basically, if it's in a structured data source of some sort, if it's in a relational database or an XML file, or you've done an ETL rule
with some lineage, it can extract that, because basically there's something behind it that it can read. And they're worth their price, because they have the logic to know how to read a COBOL copybook and a DB2 database and an AWS bucket, et cetera, et cetera. I'm not going to belittle the effort to do that. But once they've done that, those can be fairly easy to use. I don't want to oversimplify it, but it isn't too far from: you press the button and you get a nice schema of the database. And so you can often get the wow factor for metadata just by buying some of these tools to really get an inventory of what you have. That's often the first step. But somebody created the structure of that database, right? So that's often where this top-down side comes in. Some of the tools, and the sponsor's is a data modeling tool, are actually great for: I'm going to design the database, I'm going to create those technical standards. And the beauty of that is that it's active metadata. Again, I don't want to overpromise, but you sort of press the button and DDL is formed, because you've done all the work in the picture, in a model. And then when you make changes, you make the changes from the model. So there are really amazing things you can do, and if you're not looking at that and you're doing a lot of this manually, it's worth a look. Business metadata is sort of a manual thing, because it involves people. Some of that can be automated, but mostly it is a human effort. Things like glossaries have that people side. A lot of the technical metadata should be automated. If you're doing a manual mapping in a spreadsheet for all of your ETL rules and you're using an ETL tool, there's probably an easier way, so it's probably worth looking at this. And if you're not using one of these tools for the creation of your metadata, you might look at that, because it just makes it simpler. You can also sort of do the testing before you implement, right?
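The "press the button and DDL is formed" idea can be sketched in a few lines. This is a toy illustration only, not how any particular data modeling tool works: the in-memory model format, the `generate_ddl` function, and the table and column names are all hypothetical, and real tools handle keys, constraints, and database-specific dialects that this sketch ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: table name -> ordered list of
# (column name, data type, nullable) tuples.
Model = Dict[str, List[Tuple[str, str, bool]]]


def generate_ddl(model: Model) -> str:
    """Render one CREATE TABLE statement per table in the model."""
    statements = []
    for table, columns in model.items():
        lines = []
        for name, dtype, nullable in columns:
            null_clause = "" if nullable else " NOT NULL"
            lines.append(f"    {name} {dtype}{null_clause}")
        body = ",\n".join(lines)
        statements.append(f"CREATE TABLE {table} (\n{body}\n);")
    return "\n\n".join(statements)


# Made-up model; changing it here and regenerating is the "active" part.
sales_model = {
    "customer": [
        ("customer_id", "INTEGER", False),
        ("customer_name", "VARCHAR(100)", False),
    ],
    "sale": [
        ("sale_id", "INTEGER", False),
        ("customer_id", "INTEGER", False),
        ("total_amount", "DECIMAL(12, 2)", True),
    ],
}

print(generate_ddl(sales_model))
```

The design point is that the model is the single place where changes are made, and the DDL is always derived from it rather than edited by hand.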
The beauty of some of these data models is that I can do the impact analysis, I can argue about the tables and columns with the team, and really get that worked out before I'm actually making this live in production or even in the test environment. This is a very common way to look at this, that almost classic data lineage. Here is my total sales again, to overdo that analogy. I have total sales this quarter. Where did that come from? What was the data warehouse field? What dimensional model, what are the facts and dimensions here? Where did it come from in the staging area, and what's the source? With most of these tools on the market, again, you have to have some of it automated in the back so it has something to read, something stored in an ETL tool or a database, or it's going to be lost. But so much of this can be automated pretty nicely, and there are a lot of nice visual tools where both the business user and the technical user can drill down: either just see a high-level lineage like the one I showed, or, with a lot of them, actually drill down and see the column. So if you're doing things like GDPR or any lineage, or just want to know that your data is right, I would look at some of these tools, because they do a lot of stuff. And okay, I'm going to just slightly rant here. So often when I show this type of thing, people think, oh, that's very old school, that's data warehousing, which no one does anymore. But everybody still does. They might do it in the cloud, and they might do something with, like, a Snowflake or, well, I shouldn't mention product names, but there are more modern, faster ways to do some data warehousing. The core business need of understanding total sales by month doesn't go away, and you still need this sort of lineage. And so I do find it interesting. This is a classic, I will admit. We used this kind of diagram in the 90s, and yes, I'm old. But this is a
classic, but it still happens. This is something, full disclosure, from AWS Glue, right? It's the same idea. I might be on Amazon S3 and using Redshift, and I still need lineage. They might call it a crawler and not a scanner, and they have a data catalog, not a metadata repository, and there are differences between them. But again, if there's some structure there and there's some way that this data was moved, you can get it. And so think of your sources, do your inventory, and look at what your data catalogs and your metadata repositories can support. You don't want to get the wrong tool that doesn't support your sources. But there are a lot of great tools out there that can automate a lot of this. Again, if you're doing mappings by spreadsheet, please don't. There are probably better ways to use your valuable brain. And Sean mentioned in the beginning some machine learning and metadata discovery. I get nervous when I say AI or machine learning, because that is such an overused buzzword right now, but whatever we call it, I do think there are some ways to automate a lot of the stuff we did manually. I remember way back in my early days having to map out the SSN, the social security number, and all the different flavors that might be a social security number. That's something that a machine can do really well. I think even machines get bored with that, per the cartoon there. But machines can find these patterns of, oh, I have number number number, dash, number number, dash, number number number number, that looks like a social security number in the US. Maybe that's something I could automate with pattern matching. That is great. That is something that we didn't have years and years ago, so look for a tool that supports that. But there are also times where you do want to have specific mapping rules. I want to know that I want my naming standard to be, I
don't know, underscore CO for company, or does that mean Colorado? There are certain things where you want your own specific mapping rules. And again, just like with the question of whether you want the encyclopedia or Wikipedia type of repository, think about it. So many vendors, and this is my own personal opinion here, are going so much for machine learning rules that sometimes they skimp on the cases where you really want to be able to do that mapping. I guess my analogy is that a Google search is great where you just want to say, tell me everything about, I don't know, golden retrievers, because I think I'm going to buy one, and you get all this great stuff about golden retrievers. But sometimes, and Zappos is a great example, I want to say that I want to buy a women's tennis shoe in size eight in blue, and then you can get very specific. And I think sometimes people are kind of wooed by the Google search, and then when you start to look at it you realize, oh, I can't get that specific. So this is the same sort of thing: I want to be able to have that rules-based approach, and just make sure you can do both if you need to do both. Just quickly, when you look at architectural options for metadata management, there are a lot, and it can be overwhelming. Just like anything, there's no one-size-fits-all approach. You may need a full-on metadata repository, or a metadata catalog, which is the new word for that. There's meta-metadata, and it's hard not to get that in there, so they have a metamodel: what kind of metadata can they store out of the box? One of the reasons these guys have the price tag that they do is because they've given that thought. People are in there, and I've written some of these in the past, and there are nerds out there like me who are thinking, how do I do the common store of a column in a database, and is that different from the COBOL copybook, and how do I match and merge all that? And they can
do that, and they often have these scanners where they can scan that whole list, whether it's AWS in the cloud or an old COBOL copybook, or Oracle in the cloud versus on-prem, et cetera, et cetera, and store that in the common metamodel so you can then publish it out. Or it can take from these tool-specific repositories: if you're using an ETL tool, you can get the source-to-target mappings, and if you're using a data modeling tool or BI tool, it can source from that. But having worked with a lot of these tools on all sides, I've worked for data modeling vendors and I've worked for metadata repository vendors, often it's good enough, and I'm hearing the thunder and lightning from the vendors not letting me say this, but maybe just the data modeling tool is enough. You don't always need to buy a full-on metadata repository. If you're just looking for your tables and columns for some relational databases, maybe an HTML report from that is enough. Maybe your ETL tool has enough visualization for your source-to-target mapping for now. Maybe if you need a business glossary, just put it out in SharePoint and get people using it before you buy anything else. Because these things do have a price tag, and they're awesome, I love to use them, but especially as you're starting, you don't always need to do that. At the same time, don't start a massive global data governance program with Excel spreadsheets, or Google Sheets or whatever, because that's probably not going to scale. And then the other thing that's often forgotten, and I think in our DATAVERSITY-type presentations we get so inward-thinking with metadata in terms of inside your organization, is that a lot of my customers are doing this externally. You saw in the users of metadata that we talked about earlier: it could be government organizations, you could be doing an open data initiative, you could be doing B2B data sharing. So are there metadata
registries, are there standards you should be comparing against, to really do standard XML or metadata exchange across organizations? This is a dense slide that we could spend a whole webinar on. I know we want to open up for questions, so we won't, but it's worth thinking of your metadata architecture, not just your data architecture, if that makes sense. What is our ecosystem to manage the metadata that manages our data? That's when we start sounding crazy with our meta-meta levels, but it is important to think about. So again, just a quick summary. Metadata is such a simple thing to understand, with a funny name: who, what, where, why, when. A lot of people from both business and IT create it and consume it. Data governance is that nice way to get those roles working together and have both the carrot and the stick. And then technical metadata and how you architect it is just as important, because it really helps that human side. The better the tool you have for these people to use, the easier it makes it, but think about that wisely. Try to automate as much as you can, and then allow humans to do that valuable part of really creating that human metadata. So without further ado, I will pass it over to Shannon. Just a plug for next month: if you want to talk about data quality, please do. And then we can open it up for questions. Donna, thank you so much. A great presentation, as always, and thanks to both of you for the great presentations today. To answer the most commonly asked question, just a reminder: I will send a follow-up email by end of day Monday for this webinar with links to the slides, links to the recording, and anything else requested throughout. Sean, the first question coming in here is for you: do you have a connector to Incorta? Oh, you're on mute, Sean, if you're speaking. I'm on now. Am I off mute now? Yeah, you're good. Sorry, can you repeat that? A connection to what? Incorta. You
know what, I don't know. I can take that as an action. I haven't personally worked with that one. We have over 125 connectors, though, and growing, so I'll look at that and see what the team comes back with. I can follow up on that. Awesome. And another question that came in while you were talking, and Donna, I know you can speak to this as well, is: how do you distinguish between a data catalog and a data inventory? Yeah, I actually responded to that one in the chat, and Donna, I think it'd be great for you to respond as well. My definition of the distinguishing factor was: think of a data inventory as really a list of the systems your data is stored in or derived from, kind of technical metadata, if you will. A data catalog is more of a business view, and represents what data is available, how it's being used, who has access to it, et cetera. Yeah, I think that's a great explanation, and I think it's a bit of an evolution. The inventory is just that. If you think of it in the business sense, an inventory is a list of your stuff. A catalog is more like the product catalog that you might see on the web from Amazon, right? It has some context, it's an easy way to search, it's more business friendly, and it has some business context around it. Your inventory is the list of stuff, and your catalog is really a way to present that in a better way. From a roles and responsibilities perspective, who should be responsible for managing metadata and ensuring that metadata is consistent, complete, and accurate? Donna, do you want to start us off on that? Yeah, I think we talked about it a little bit in the presentation, but it's a variety of roles, and that's where governance comes in, to really create this data stewardship and ownership. So from the business side, it's probably your business
stakeholders. The people actually creating and using the data would be responsible for the business definitions. From the technical side, it's your data engineer or your DBA, or your business intelligence architect, who's responsible for creating the technical data structures and lineage. And then your data architect is often the person who creates the models, or might help support the glossary. Sometimes, if your group is big enough, you have a metadata architect who might just be responsible for things like the repository. Sometimes the data architect can wear many hats. Or sometimes the business can: some of these tools are getting easy enough that the business people or the data governance lead can manage the glossary and things like that. It really depends, but governance is a great way to flesh that out. Sean, anything you want to add there? No, actually, for me it's a very difficult question to give a definitive answer to, right? I've seen organizations be extremely successful in having someone on the IT side own part of that, as well as the business side. I've seen it done by committee successfully. So I think it depends a little bit on the culture of the company. The one thing I have seen that seems to be pretty consistent, though, is that if there are pieces around governance, like CCPA, then whoever owns those CCPA terms tends to be a group that's focused on that compliance and drives that. So we typically see multiple business owners as owners of the glossary and the terms and the standards, whether that's in the business or on the IT side or even on the legal side. What are the good practices to capture metadata? I joined a new organization recently and there's no metadata captured. I'll jump in quickly, and I'm sure Sean has input. So yeah, from the technical point of view, there's so much that can
be automated, and I think that's a way you can look like a hero really quickly. Either a data modeling tool or a data catalog or glossary can often literally just scan these things and create that inventory or catalog, depending on which tool you choose, and that's such a nice way to just start with even knowing what I have. And then a business glossary is something people have to populate, but picking something that everyone wants to know, getting the right stakeholders, and starting to populate it quickly is often a nice way to get people understanding what metadata is. Often the business metadata has the bigger impact because people get it a little better. It's more manual, but it's easy to spin up quickly. Sean, did you have other points on that? Yeah, so on that one I'm obviously going to be biased. I think the best way to do it, whether it's erwin or another solution, is to actually go after a solution that can connect to all your sources and catalog them and auto-document them for you. I think that's the first step pretty much any organization needs to take: understand what data you have and where it is, and then you can start building on it from there. I love it. I think we have time to slip in another question here. What are the good practices, excuse me, I've already read that one. I have multiple copies of data in various areas without much of a business team. What can we tell our business team members to get them to accept or delete redundant data? What's a good process to reduce redundancy without affecting existing processes? That's a tough one. You want to be careful about that. I think picking something small to begin with helps, so that people start to understand the impact, because out of the gate it just seems so abstract to go to a business person who's busy trying to get sales driven
and say, we have redundant data fields. "Great, go fix it." But try to show what the impact is, and just pick one small thing that's going to be visceral: we have three customer records for the same person, we sent them two campaigns, and now they canceled their product because we looked stupid. Something small like that. And then, I think you mentioned process in the question: really do a holistic look at that. Look at the business processes that are affected, what the data is, what the sources are, and do your impact analysis before you fix it, because you can break things by trying to fix them without understanding that impact. So really understand the impact of it, get the buy-in from the right people, do it through governance, and then fix it in small pieces, because it's a big thing to fix, but you want to get that buy-in first. Sean, anything to add to that in the last minute here? Yeah, I agree. I think the biggest piece is the impact analysis, and I think that's where the lineage comes in, right? It's one thing to have duplicate data, but where is that data being used? Just because it's duplicated doesn't mean that those two data elements aren't used in different areas for different reasons. So just because it looks duplicated from a data management standpoint, you still need to understand the impact of it, and I think that's where the lineage comes in, to understand how that data gets federated out through your organization. I love it. Thank you both so much. I'm afraid that is all the time we have, and thanks to all the attendees for being so engaged in everything we do. I just love the chat and the questions that come in, as you all know. It's so fantastic. And just a reminder, again, I will send a follow-up email by end of day Monday with links
to slides and links to the recording. There was also a question about past webinars, so I will also include a link to all the past webinars from Donna. You can check all those out on demand at your convenience. And thanks to erwin for sponsoring today. I hope you all have a great day. Thanks, everybody. Thank you.