Hello and welcome. My name is Shannon Kemp. I'm the Chief Digital Manager of DataVersity. We would like to thank you for attending Database Now Online, the first occurrence of this online conference produced by DataVersity. We're very excited to kick off the event and have a great lineup of sessions for you today. And of course, a special thanks to all our sponsors today who helped make it happen. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the event. For questions, we will have a short Q&A at the end of each presentation today, and we will be collecting questions via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DBNOW. If you'd like to chat with us and with each other, we certainly encourage you to do so. Just click the chat icon in the top right-hand corner for that feature. For this event, we will send a follow-up email next Monday to all registrants containing your unique login to access the recordings and the slides from today's presentations. Now let me introduce to you our second speaker for today, Donna Burbank, who will be discussing the latest in the database and metadata relationship. To give you a brief background, Donna is a recognized industry expert in information management with over 20 years of experience in data management, metadata management, and enterprise architecture. She is currently managing director of Global Strategy Limited, an international data management consulting company. Her background is multifaceted across consulting, product development, product management, brand strategy, marketing, and business leadership. And with that, I will give the floor to Donna to get this session started. Hello and welcome.
Thank you. It's always a pleasure to talk with DataVersity and do these webinars.
And we are happy to be a sponsor of the event. So thank you very much. And as Shannon mentioned, today's hashtag is hashtag DBNOW. If folks want to join us online, you can follow me on Twitter at Donna Burbank if you are a Twitter kind of person. To dive right in, because I know there are a lot of you online and we've got a tight schedule today, let me cover the agenda of what we'll be talking about, which is a bit of a different take from the last session. We'll be talking about metadata management. And one of the things that DataVersity always tells me is that every single session they have always has at least one or more questions about metadata. And we'll talk about how metadata is actually a growing trend in the industry. And we'll talk a bit about some of those emerging trends in metadata management. And then the real reason we're doing this is the business value and what that means for organizations, regardless of technology, and how that fits into a wider enterprise data management infrastructure with things like governance and master data. And one of the points of this conference is that metadata isn't just for relational databases anymore. So what does that mean with some of these new emerging technologies? And then, because of these new emerging technologies, what has changed in how we manage metadata? And what are some of the new technical innovations we can maybe look to in order to manage this ever-growing volume of metadata in the different sources? So without further ado, to jump in, I'm going to reference throughout this presentation a paper we did last year with DataVersity on emerging trends in metadata management. And there's a link at the back. You can download the report for free yourself. But it wasn't a surprise to me, and probably not to many of you, that metadata is hotter than ever.
So as part of this survey, over 80% said that it is now as important, if not more important, than in the past. And we'll go through what some of those trends are. It wasn't a surprise to us, probably not a surprise to you, but it's nice to have that as a documented fact that this actually is a growing trend. What is metadata, right? So you've probably all heard, and I have to say at least once in this presentation, although I for the record hate it, that the definition of metadata is data about data. To me that doesn't say much. But one simple way, if you want to keep the simple definition, is that it's really data in context: the business and technical context around data. Or one of the ways I like to look at it, sort of a Zachman framework of sorts for metadata, is that really it's the who, what, where, why, when and how of data. So who created it? Who's the data steward, if we're thinking of governance? Who owns the data? The what is a big part of what a lot of us think about with metadata. What's the business definition? What are the business rules? What's the data structure? When we get into the where: where is it stored? What's the provenance? What's the lineage? Where did this data come from? Et cetera, et cetera. The why is, I think, something that's often overlooked, and I always focus on it: why are we storing this anyway? What's its usage and purpose? A lot of us who are in the business of creating definitions know that that's often one of the trickiest parts of metadata, that what one group uses a certain code for might be different from how other companies, or other parts of the organization, use it. So often the why is part of the biggest challenge. When was it created? When was it updated? What's the recency of this data? It might be a great piece of information, but it's 30 years old. Is it still relevant? Maybe.
But at least we should know where that's coming from, especially when we think of things like open data. The when is a huge thing. Part of it is also the when of how long we should store it. What's the storage rule? When does it need to be purged and updated? Do we need to keep it for 10 years or do we need to delete it every year? Some of those rules actually are very important when it comes to metadata. And then the how. I think a lot of us think of the how with metadata. How is it formatted? Just the nitty-gritty of, is this a character field or a numeric? And as we know, when we integrate data, that does become important. How many data sources store the same data? Are there 16 different versions of customer ID number? Probably, unfortunately. And do we worry about that? And how do we worry about that? So I find this is a helpful graphic to think of when we're thinking of metadata. And I think a piece of this will resonate with somebody in the organization: okay, now I understand what you're talking about when you're talking about metadata. It's that context, the who, what, where, why, when and how of data.
When we're talking to different audiences: metadata, I'm a fan, I love to say it, it's just a funny term. But I don't like the term because it just seems overly complex and technical, when one of the biggest users of metadata is business users. And again, from the survey, over 80% of the users are from the business. And I often find, as a bit of color commentary, that sometimes it is the business that gets metadata more than IT does, right? So sometimes we'll say, oh, I don't have time, or I know what this field means, or I'm too busy coding, I can't possibly document it. I think those are excuses sometimes given by IT.
My favorite line from the business came when we were trying to justify a big metadata project so that we could actually see the definition of data and the lineage and where it came from. And the business unit looked at us in shock and said, you don't do this already? We couldn't get away with that in finance. You know: I'm not sure where the money comes from, I just sort of store it in bags in the back room, right? So I think as data management matures, and as data is becoming more of a business driver and more people are looking at it, the excuse of not having metadata just isn't an option. It's sort of like not having a data trail for your finance department. You get audited, you have to. So I think data is seeing that same need, which is a good thing. It actually is giving some visibility to metadata, which is great. And poor metadata management can be expensive. Metadata leads to better data quality. One of the statistics is that the US economy loses as much as $3.1 trillion due to poor data quality, right? And it's something as simple as you can't deliver the mail to your users because the address is wrong, right? That might be the boring, banal piece of data management, but that stuff is important. And to get that stuff right, that's often your metadata. Are we all using the same type of information for the address one field? Or are we putting different formats in there? Is the residential address different from a mailing address? Answering some of these basic metadata-type questions actually has a huge business value. One of the more high-level ones that you may have heard of, and we love to mention it because it is out there and it has a big price tag, is the example of the Mars rover, or rather the Mars Climate Orbiter, back in late 1999, where they actually lost $125 million. They basically lost the orbiter. And the reason was a pure metadata issue.
So not to get overly technical, and I'm certainly not a rocket scientist, but when they were sending the data for the thrusters, it was sent in English pound-second units instead of metric units, right? So a very basic data type issue: are we talking English units or metric units? And because they didn't have that metadata documented, the system actually went off course. So a pretty big price tag for a very simple error. And I'm sure there was somebody out there that just said, I just know. I mean, duh, we always use metric. Or we always send it in English. They just knew. But to document that is very important. So not only was $125 million lost, which is a fairly big price tag by anybody's standards; when you think of it, there's also the brand and reputational damage. It really didn't look good for NASA, who I'm sure are brilliant individuals. And this was a pretty silly mistake. And the other one that gets to me is the lost opportunities for research, what you could have found out about that Mars atmosphere. So when you think of a business, it's the same thing. If we could have better usage of our data, think of all the insights we could have, instead of just trying to manage the data and mix it together. So the better our data quality and the better our metadata is, that's where you start to get the business value. And to be fair to NASA, they've actually got some great open data sets that I love to look at because I'm a nerd. And they actually document the metadata pretty well. So I don't want to totally bash NASA, because they actually have some great data and do have some good metadata. So perhaps they've learned their lesson, as we all do. And as I mentioned, avoid that "I just know." We'll talk technology, don't worry, this is a database conference. But sometimes the most important, valuable metadata is the personal metadata. Just the "I just know."
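To make the units point concrete, here is a minimal sketch of a value that carries its unit as metadata instead of relying on "I just know." The class name, the unit labels, and the sample number are illustrative assumptions, not anything taken from NASA's systems; the only outside fact used is the standard conversion from pound-force seconds to newton seconds.

```python
from dataclasses import dataclass

# Standard conversion: 1 pound-force second = 4.4482216152605 newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse reading that travels with its unit (hypothetical type)."""

    value: float
    unit: str  # assumed labels: "N*s" (metric) or "lbf*s" (English)

    def to_newton_seconds(self) -> float:
        """Return the value in metric units, refusing to guess an unknown unit."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")


# Illustrative reading: the same number means very different things per unit.
reading = Impulse(value=100.0, unit="lbf*s")
metric_value = reading.to_newton_seconds()
```

A receiving system that only accepts `Impulse` objects cannot silently treat an English-unit number as metric, which is the kind of check documented metadata makes possible.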
We'll especially get into that when we look at some of the trends. Some of the biggest trends in metadata are around the legacy systems. Think of the COBOL application that this gentleman wrote. He's like, I just knew. When I wrote it, I just knew. Part number is what used to be called component number. Why do I write that down? I just know. But probably what you just know is valuable to document, and how long does that take? So that's why we have things like business glossaries and metadata repositories and/or data models, or all of the above, to store some of this human metadata, which can be some of the most critical business rules around your data. Because metadata is really part of a larger enterprise landscape. So metadata in and of itself, you know, I'm a metadata nerd. I actually find it sort of fun to rationalize data types and things like that, to an extent. I think we're in the business because we like to organize things, many of us. But done in a vacuum, that gets pretty boring by anybody's standards. The reason you manage metadata is to get some of these business insights. So a lot of what we do in our practice is really start with that business strategy. What are we looking to do? And how can metadata support that? How can data support that? And how does that link in with the data strategy? And then, coming from the bottom up, what's the type of information we have to manage? And as we'll talk about later in this presentation, it's not all in relational databases anymore. It's big data and unstructured data and documents and all of that, which is part of the reason metadata becomes important. If I'm trying to get that single view of customer, and customer is in 17 different systems and different formats, that's where metadata comes into play. And the reason you do that, and we'll talk about that a bit from the survey results, is to get value.
And part of that value comes through things like data warehouse and BI or big data analytics, getting that master data, 360 view of customer type of information. You can't do that without good metadata. And then governance is the people, process and policy around information. And that can't be done without metadata either, because a lot of those rules and definitions and context, and what is the single definition of customer, that's all metadata. So metadata could be the circle around all of this, because it really links that technology with the business, which is one of the values of it. That picture is backed up by some of the results we got from the recent survey. One of the questions was how are you using metadata today, and how will you be using it in the future. And I found this interesting because some of the classic ones remain. Things like data governance, master data management, and data quality are big drivers for metadata now, and they will be in the future. The little red dots on the diagram are the top five, just to make it easy for you. So that wasn't a surprise. A lot of the projects I work on with metadata are in that category: governance, quality, master data. Among the other changes we saw, and I didn't see this as a huge surprise, is that people are evolving toward a little less BI and data warehouse, although I still see a lot of that out there. It's not going away. And a bit more big data and data science, which I think is not a surprise to any of you either. And some of the questions in the last presentation were a lot about big data and data lakes and that sort of thing, and where the data fits. Also a little less software development, which may be because some of that software development goes into data science and analytics now. That is the trend.
The big three are still the big three, but people are looking at some of the new technologies and things like big data and data science. So that led us to the next question: well, what types of technologies are people using? The use case is great, but how is the technology changing? So what types of metadata are people using? Again, the dots are the top five, and some of the things are classic. Business glossaries will stay around, and that leads back to that first statistic we had, that the main users of metadata are the business people. What do we mean by this term or this metric or this field in a database? That's huge, that's very important. You can mix the data, you can have automated data transformation, you can even do some AI and a lot of matching, but at some point there is a human element to what this data means, and that just is the reality, because data is a business construct. You're using it for business value, so how people are using it will never be automated, because that's actually a person's definition of it, and that's just part of how the business runs. So business glossaries stay. Data warehouses: even though it's a little bit of a disconnect, even though folks predict they'll be using less data warehousing as a use case in the future, they still exist, and they're still a big part of the business. So a lot of metadata work is being done for data warehousing now and in the future. Some of the things that were growing: not necessarily a surprise, data quality, which was a big driver, and then big data platforms. I know big data includes a lot of things, some of which are covered by those other categories, things like social media and media files, which you also see growing when you look at this graph.
What I also found interesting is that in the first graph the big ones are towards the bottom: your traditional data models, relational, warehouse, ETL, BI. When you look at the one for the future, there's a lot more equity across a lot of different sources. It was almost hard to read the graph, whether it's IoT or media or social media being the highest. There are a lot of equally important things, which makes our job a little more complicated. So that is one of the main themes of this: metadata isn't just for relational databases anymore. I don't think that's a surprise to any of us. Not that those are going away by any means. It's still a huge part of people's business, but I think it's being augmented by a lot of other systems. So bear with me. It is part of my East Coast upbringing and part of my genetic structure. I know I speak quickly. I apologize. I have tried to cure it and I've failed. It's augmented by caffeine, I have to say. But these next slides I will go through quickly, and by design. There is a full metadata course we have with DataVersity, which I'll refer to at the end, if you want a more detailed discussion of these, and you can download the slides to read through them. But I don't want to stress anybody out by expecting you to read through them now. It really is just high-level definitions, and these are the types of metadata that we have in our wheelhouse now. And maybe for some of you, it might be an introduction to what we are even talking about with, I don't know, Internet of Things metadata. Right. I think that's a valid question for a lot of us that might have started with relational and are now expanding into other things. So join me on my whirlwind tour of metadata in this new world. So, relational database: that shouldn't be a surprise to anybody. And I put it there because it remains, and it's still very valid, and there's still a lot of it.
So you might have your technical metadata, which is your structures: employee ID is an integer, it can't be nullable, that sort of thing. That's augmented by business metadata, which could be in a data model or a glossary or a data dictionary or a metadata repository, or all of the above. But it's important to ask, what do we mean by employee? So one part of it is the structure, and the other part is the definitions around it, which, again, might seem obvious. If you read it, an employee is someone who currently works for the organization. That's obvious. But it's also someone who might have been employed in the past six months. Hmm, that's something I might not have thought of offhand, that some of the people in my employee database have recently been terminated or have left for other jobs but are still in there, and that probably makes a difference in how you communicate with them. So again, that's the business context that can never be automated. That's just how your business works and that's how your definition is. And then there's the actual data. There's actually a person out there called John Smith that might be an employee and a customer. And it's often important to remember that there are people behind this, there are actual instances of data out there, especially when we're getting into PII and personal information. It's often good to step back and think that there's a person behind this data.
So that's relational databases. Related to that, no pun intended, are data models. They're often related to databases, and they store not only the technical structure and the relationships between data, but they are also a good source of business rules around data. So you can actually put in some of that: what do we mean by customer? It's a person or an organization, not just people. Plus some of the technical structure. I tend to like them because they're graphical, and a lot of us think in pictures.
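The split between technical and business metadata described for the employee example can be sketched in a few lines. This is only an illustration: the table, the column names, and the glossary wording are assumptions made up for the sketch, with SQLite's `PRAGMA table_info` standing in for whatever catalog a real database exposes.

```python
import sqlite3

# Hypothetical table, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER NOT NULL, full_name TEXT)"
)


def technical_metadata(connection, table):
    """Read column names, types, and nullability from the database catalog."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, default_value, pk).
    return {
        name: {"type": col_type, "nullable": not notnull}
        for _cid, name, col_type, notnull, _default, _pk in rows
    }


# Business metadata cannot be read from the catalog; a person has to write it.
# The definition below paraphrases the talk's example and is not authoritative.
BUSINESS_GLOSSARY = {
    "employee": (
        "A person who currently works for the organization, or who was "
        "employed by it within the past six months."
    ),
}

employee_columns = technical_metadata(conn, "employee")
```

The structure comes back automatically from the database, while the six-month rule only exists because someone wrote it into the glossary.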
So that can be very helpful to some of these business users who are interested in metadata, because often there is that overlap: yes, they know the definition of a customer, but some of the structure is important to them as well, and how it might relate to other things. A customer can have more than one account, if some of you are familiar with those kinds of rules. Those are business rules, so the business user can often see that in a data model fairly intuitively.
Next on the whirlwind tour are things like ERP systems, Salesforce, PeopleSoft, that sort of thing. There is very important metadata around those, and if anyone has ever tried to integrate them with, for instance, a data warehouse, it can be a challenge. There are tools out there with which you can see the business metadata. For example, if you just reverse engineered some of this, you'd get very technical table names, and some of them are even German technical terms. They're hard to understand, so getting that business metadata of what this table means is important. The other big thing is back to the business rules. These systems have built their business rules for a certain way of working, the way they designed it. Yours might be slightly different. So if you're trying to integrate how you use customer data with how they use customer data, you need to understand those differences, if there are any, and be able to integrate them. So that's why understanding the metadata is pretty key.
NoSQL, a big push in the industry. They are great for a lot of different things: web applications, online gaming, online shopping carts. If you're trying to manage a session and all of the activities for a certain user around that session, they're really great, really scalable. They're not the best when it comes to metadata and documenting that. If you think of key-value pairs, it really is literally just that. It's the key and the value.
And the structure of a lot of this, and the usage, is often in the application code. So they're not as well suited to things like a metadata repository or a data model around them to get that metadata. And that isn't necessarily a weakness, but consider, when you're using them, what they're good for and what they're not. There are lots of types of NoSQL. I think a lot of us probably cringe at that term because it's just so broad. There are so many different things. It's like saying colors that are not green. There are a lot of different colors that aren't green out there. But we'll use it because it's out there.
The previous speaker also mentioned document databases. They're a great way to store a lot of different types of unstructured data in a flexible way, especially when you think of things like social media or multimedia. For example, here's a case where we're talking about different types of things in a museum that all relate to China. There are books, there are artifacts, but a book and an artifact have different attributes. One might have a medium, whether it's ceramic or clay, and the other one might have a title of the book. But they all have certain commonalities. Document metadata lends itself better to having data models or structure around it that you can document the metadata for. Here's an example blatantly stolen from the MongoDB website. They actually have some good examples of how they can model, and a lot of the data modeling tools can support that now and have some metadata around that.
Big data platform: big data is another one that makes me cringe because it means a lot of things. Is it the data on the big data platform or the big data platform itself? In this case, we're actually talking about the big data platform itself. Really, it's a file structure. It's almost like the file structure on your laptop. The metadata around that is going to be the tree structure of, say, your HDFS directories.
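As a rough sketch of what file-structure metadata looks like, the snippet below walks a directory tree and summarizes it. It uses the local file system through Python's standard `os.walk` as a stand-in, in the spirit of the laptop comparison; it does not talk to HDFS, and the function name and the shape of the summary are assumptions made for illustration.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Collect simple structural metadata about every file under root."""
    files_by_extension = Counter()
    total_bytes = 0
    directories = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        directories += 1  # counts the root directory as well
        for name in filenames:
            extension = os.path.splitext(name)[1].lower() or "(none)"
            files_by_extension[extension] += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return {
        "directories": directories,
        "files_by_extension": dict(files_by_extension),
        "total_bytes": total_bytes,
    }
```

Calling `summarize_tree(".")` on a project folder gives a count of directories, a breakdown of files by extension, and the total size, which is the tree-and-format level of metadata a platform can report without knowing anything about what the files mean.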
You can get some format statistics around that. You can do some tagging and get some business metadata. But when we're thinking of more traditional metadata, like relational, that's more when you're building a Hive structure, where we're creating a relational-type structure. Or, and we'll get to some of these others in our whirlwind tour, if we're storing media files there, there's some metadata about those files. So that's why it's hard to say how do I get the metadata around big data, because big data covers a lot of different things. You can get metadata around the file structure. You can get metadata around a relational, table-like structure you create with something like Hive. Or you can tag the files, or you can have metadata about the files that are stored on the platform. So again, hand wave, but there are a lot of different types of metadata there.
We're jumping all over because that is our reality. I'm not trying to be overly confusing, but when we're talking about where customer data is, it's probably everywhere. So many companies still have the good old mainframe, and to give it some credit, it's probably been running for years and years and it's chugging away, which is why we're still using it. But a lot of those people have retired. So often in the mainframe you might have something like a COBOL copybook. Just looking at it, you might not be a COBOL programmer, many of you probably are not, but you can kind of figure out what it is. It's okay: I have a student table with a first name and last name and date of birth, and there's some data typing there. So the beauty of being able to see that in some sort of metadata format is you can see that structure. I went through it quickly, but in that metadata survey what I found interesting is that one of the sources that's growing actually is legacy metadata.
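To show how a copybook can be turned into readable metadata, here is a minimal sketch. The copybook text is a hypothetical layout invented to match the student example in the talk, and the parser only understands the two simplest picture clauses, `X(n)` and `9(n)`; real copybooks have many more features that this does not attempt to handle.

```python
import re

# Hypothetical copybook, invented for illustration only.
COPYBOOK = """\
       01  STUDENT-RECORD.
           05  FIRST-NAME      PIC X(20).
           05  LAST-NAME       PIC X(20).
           05  DATE-OF-BIRTH   PIC 9(8).
"""

# Level number, field name, and an optional PIC clause, ending in a period.
FIELD = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)(?:\s+PIC\s+(\S+?))?\.\s*$")
# Only the simple forms X(n) (alphanumeric) and 9(n) (numeric) are recognized.
PICTURE = re.compile(r"([X9])\((\d+)\)")


def describe_copybook(text):
    """Return one metadata dictionary per field declared in the copybook."""
    fields = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        level, name, picture = match.groups()
        entry = {"level": int(level), "name": name}
        simple = PICTURE.fullmatch(picture) if picture else None
        if simple:
            kind = "alphanumeric" if simple.group(1) == "X" else "numeric"
            entry["type"] = kind
            entry["length"] = int(simple.group(2))
        fields.append(entry)
    return fields


student_fields = describe_copybook(COPYBOOK)
```

The result is a plain list of field names, types, and lengths that someone who has never written COBOL can read, which is the point about making legacy structure visible to the next person.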
And part of it is that "I just know": somebody who coded it knew what this meant, but that person is long retired, sitting on a beach giggling at us trying to figure it out. If you have the metadata around it, the next person coming along will be able to understand it, whether folks are still using this system and need to understand it, or need to migrate it to something else and need to understand the metadata behind it, which is why I included it. So yeah, jumping right from big data to COBOL, we're all over the map, which is often the reality of our world.
So, graph databases. A little plug, which I think Shannon will allow me: we're actually doing a detailed webinar next week on graph databases and their relationship with metadata. So I'll touch on it quickly, but they're great, and there are a lot of interesting uses for them when we're trying to do things like social media connections and that sort of thing. Really the metadata there is in those connections and the relationships. A lot of folks think of the metadata as the database itself, the nodes and the relationships between them, and the connections between the different data points are the value, how things relate together.
XML still a lot of it out there and here's an example of so you have in several ways itself documenting so if you look at an XML file one of the beauties of that is I can see that the data in that is okay there's a name and there's an address and there's a city and as a country so in that sense it's sort of self documenting with the data metadata in it but there's also the con concept of an XML schema where you can sort of create some of these standards or the canonical model that the previous speaker was talking about to really understand that we're all using the same format especially when you're sharing information with another company or we actually have an order here how are we all sharing the same information about this order kind of a hesitate to say more modern but when a lot of people are using now with the idea of JSON which is sort of similar or if you have a JavaScript version of it so but the same idea that in some senses it is self documenting you can look at the data here and see that there's a brand and there's a price you can create some tags on it if you wish but there's also the idea of a JSON schema but if you do we'll kind of want to have a canonical format for these so that we're not all having different formats when we share the information so often used in dead exchange Internet of Things metadata so yeah I can have my stove talk to me on my cell phone if I wish right or you can see some of the kind of the metadata that's sent okay so I have some values I'm sending a hundred and hundred and forty but is that the temperature of my stove or the heart rate on my run right so having those tags and if you think back to the NASA example you know I'm having a machine send me values isn't metric is it in English is it what is this number that's being sent when was it being sent how are we storing that date value so even with something like IoT information it might just be simple numbers but knowing the context of when it was what the format it is 
even what we're looking at is it a temperature or a heart rate that kind of thing document metadata so you know this is of the document data stores that we talked about but this is literally a document right so I'm sure everyone this called because we are all data people when we use a word document we put all of the metadata in there right so we have comments and tags and categories and date and timestamps and who the authors are so some of its system default so you know Microsoft Word does for you who created it and when but you can also do things like metadata tagging and descriptions of the document and title of the document and that kind of thing to help other people who came later sort of understand it or if you want to do the document search that comes in image metadata is why I find interesting you know I kind of talk to a lot of database folks but I was at a party talking to a friend who is a photographer professional photographer and she was talking about the metadata and I jumped on her said you should metadata that's our word what do you mean and she said well how else was you that it was critical to her business for many reasons one if you want to post your files online and you want to get those files found she was trying to get her stock photos purchased over somebody else's it's in the metadata how do I tag that in a way that other people can find it so clearly I am not a professional photographer this is a picture I took at the Data Diversity Conference a couple years back in San Diego not a bad picture but yeah if you're a photographer the front is a little dark et cetera but if you wanted to see how I got that awesome photo from the photo itself there is actually embedded metadata just like the document the word document automatically populates who did it when it was updated well your camera does the same thing so I took it from my iPhone what kind of my lens settings did I use a flash you know because you're wondering how I got such a great photo 
I know. Well, now you know my secret. That's the embedded metadata from the technology of how that picture was taken. But then I can add on to that what the title was, and keywords. So if I want to post this and have everyone use it as a stock photo, I might say it's San Diego Bay, with the location San Diego, so if someone googles San Diego photos they will find mine, because it's gorgeous. And think of when you Google photos: there's also metadata for things like copyright and licensing. Do I want people to use it under open, Creative Commons kind of licensing or not? So metadata is huge in this space, not only for having people find it but for legal ramifications as well. It's not just us pure data folks that get metadata; it's across the board. And if you're storing some of these images on your big data platform, it's good to know what kind of metadata can be found.
Similarly, social media has its own metadata. You could nerd out for a long time on all the metadata from Twitter, and some of you may have, but you can get a lot of great information, again some user defined and some system defined. So I actually did tweet this great picture, and you can see all of the things about it. You can see who the author was, which was me. You can see what the context was: I did a hashtag, so I tagged this for EDW. You can see the number of retweets. This is fake news; I actually put in that a hundred people retweeted me. I think nobody did, but that's okay, I know you love me. So that's the user-created metadata. But there's also that embedded metadata that is either the creep factor or the interesting stuff: where was that taken, on what device, there's an ID for each tweet, what language was used. There's a lot of embedded metadata. So if you're trying to do the statistics of who was tweeting from San Diego that day, or who was related to Donna, who read this tweet, all that type of information can be found from social media. So if you're trying to do some social media analysis,
there's a lot of metadata there as well.
Open data. Here we go, here we give NASA a call-out; I picked on them earlier. So here's an example of some open data. In one of my previous webinars with DataVersity, I think it was January, on BI and the importance of metadata, I talked about open data, which I use a lot in my analysis. There is some amazing metadata, but I took some data from a source that didn't have this. So I did all these statistics, and it literally had things like "field one has grown in the past 10 years over field two," but it didn't say what field one and field two were. It was sort of a horrible example of having no metadata. This example is actually pretty good. They have who published it, when it was published, what the security and usage restrictions are, how often it is refreshed, almost all of the who, what, where, why, and when that we were talking about, and a feedback loop, so if there's an issue you can let them know, as well as the data itself. But what I find interesting is the volume: the data is a little piece, an Excel document, and the metadata is a lot more in terms of volume on the page. That shows the importance of metadata. The numbers in the Excel document are interesting, but only when you have all of the metadata around them. So there's tons of great open data. If you have not nerded out yet, go big; there are some really great sources out there, especially if you're doing some data science and that kind of thing.
Business process. Okay, business process seems really old school for an emerging trends talk, but it is certainly not, and I do this in every project I do, even if it's on a whiteboard and not going into that much detail. If we're talking about the business importance of data and you don't understand how that's related to key business processes, you only see a piece of the picture. So I often will build a little bit of the process model and then with that do the old-fashioned CRUD matrix, so what's
created, updated, read, and deleted, to really understand how that's being used. So where is the customer data being used, where is it being updated, that sort of thing. It's very important to data integration. I've seen some online comments about whether it's, purely technically, called metadata or not, but I think it is, because it's the context around your data: who's using it, how it's being updated, all of that. So hopefully that was a helpful whirlwind tour.
I could do a whole three-hour session on all the different types of metadata and what the formats are, and probably bore all of you, but suffice it to say, when we're trying to manage that, you can see why it becomes a challenge. It was hard enough when everything was stored in a relational database, and I don't think that was ever purely true, because there's always unstructured data somewhere. But the scale and volume of the types of information we are trying to store has grown, and so has the question of how to get that in one place. There are lots of different ways to manage that, and they're evolving, and there's no one size fits all, so they can be used together or you can use just one.
One of the original ways, or one of the common ways, is to have this idea of a central enterprise-wide metadata repository, which is sort of like your data warehouse for metadata. Its great point is that it gives you that single view of the truth. There is some work to do: you need to scan the information in, there's some sort of reuse and matching logic, and you have to have interfaces to these sources. But the value is there to do some of the lineage and all of that. A lot of the tools themselves have their own metadata: things like modeling tools, business glossaries, BI tools. So depending on your usage, if you really are just using a single tool, it has its own data dictionary. I've seen some tools now, and kudos to them, that actually have their own data quality statistics and profiling and reports on
that. That's fine; again, there's no one size fits all. There are metadata exchanges and registries. If I am sharing information with other organizations, I want to have a common JSON or XML schema, and you can have that through a common metadata registry.
Some of the value of doing that: this is almost your classic, and I know there are mixed results in terms of current and future usage, but whether it's growing or shrinking, it's still there, this idea of the data warehouse, or whether it turns into a data lake or whatever. It's this idea of lineage: I have source systems, and the data may or may not go to a staging area, which goes to a warehouse, and then I report on it. There are a lot of pieces along the way. It could be in a database, and it's called customer on one and CUST on another and table one on another. There are different formats. It may or may not be documented in the data model or transformed in an ETL tool, et cetera, et cetera. There may be a business glossary that contains all those. So they all have a piece of the puzzle. Maybe the definition of the metric was in the BI tool, maybe that's in the glossary, right? So trying to get that full lineage of how this data was created, what it means, and how it was transformed is often a big value to different customers. Getting that can be a challenge, but there are a lot of new ways to do that, and customers are doing it, and especially with some of the regulations this does become important as well.
So here are some of the ways, and this certainly is not all-inclusive, that we're able to do some of this metadata discovery and lineage and matching, and I'll just talk about each one. The first one is sort of the classic way, and the way I grew up on. I've been doing this for over 20 years, and this is still a very valid way, and each one of these has its place. So this could be creating some of those matching rules, right, so that I know that there's this
business thing called customer, but I know in the databases it could be CUST or customer or C1. In some of the modeling tools or metadata tools, you can create these matching patterns, so that I know the official naming standard should be CUST on all platforms, et cetera. You can also do some logic: okay, I know it has the same column name, but it also has to have the same data type or the same definition. So you can create these different logic rules based on the structure of the data.
But you can also look at the data values and see the pattern, and some of these artificial intelligence constructs can actually help with some of this pattern matching. You can look at the content of the value and say, okay, I see a bunch of things in a format like name at domain.com, I bet those are emails, and I can classify them as emails. Or I see a consistent pattern of number, number, dash, number, number, dash, et cetera, and that's probably a Social Security number in the US. So you can do a lot by inferring metadata from the data itself, which is pretty cool and pretty powerful and does save a lot of effort. You don't have to say that anything that has a Social Security number needs to be named "social security." It could be called "Joseph" or "my favorite column," but I can see by the data that, whatever you name it, I still know that's a Social Security number. You can't fool me, right? So that has its place as well.
There's some cool stuff, and also a quick example. When you think of facial recognition, you know, the creep factor of Facebook knowing it's you and auto-tagging you because they can look at your face, you can actually do that with data. Think of unstructured documents that are often very complicated to manage. Some of the vendors are actually looking at some of those same recognition patterns down at the byte level of your documents and saying this looks like a
services contract, and then you can give it a pattern to match, which is pretty cool. And then there's the good old-fashioned, or maybe new-fashioned, depending on how you look at it, tagging, right? I'm sure we've all seen tags on the internet, here's a picture of my favorite furry kitten, or tagging on Facebook, and we had some examples of that in the earlier presentation. But things like AWS can do tags as well, and I'll show some examples of each as we go through.
So this is that idea of the reuse rules, which still has its place. Sometimes you want a very prescriptive way of saying: I know that this is the same column in the same database if it has the same name and the same data type and the same nullability. If it doesn't match those criteria, it's not the same thing. That can be helpful in the lineage, to know that I have this idea of a customer thing, and where is it the same, and how does it get transformed. You can do the matching that way. You can also do it with some rationalization. In the example on the right, I have this table called customer and it's on Oracle, but then if I match it with that same table in the data model, that's actually where they store the business definition, so maybe we can merge those two into the same metadata construct. Again, this can get a little complicated, but it also has a lot of value. Once you get this right, you do have that full lineage and you can avoid some redundancies. So it's pretty classic, but it's still valid today, it's still valuable.
This is the AI and pattern matching that a lot of tools do now. When you're looking at this, you need to think of your own use case: which of these, and is it a combination of them, is actually going to help with my lineage or the way I classify information? It might be that you are trying to find something like PII, and I want to see where email is stored. Well, I can
infer that if it has a .com or .edu or is in a certain format, I can classify that as an email address. I might be able to say these things look like a physical address, that kind of thing. The tools have gotten pretty robust, and it is pretty neat that they can do a lot of this pattern matching and then classify and do some lineage based on the data patterns themselves.
This is the one I was talking about. I saw a neat demo just the other day, actually, of a tool that was doing this. It looks at the byte-level pattern. When you think of unstructured data, I scan in something looking like a check, like the thing on the top, and then it's smart enough, based on the learning you've given it, to say, oh, this looks like another check. You can do that with things like the Social Security number I mentioned earlier, where it's seeing the pattern. You can do it with things like "this is a service contract" or "this looks like a termination letter." When you think of the volumes of documents that don't necessarily have structured metadata like a database does, this can come in very handy, and it's just another tool in the toolkit. Again, this may not be of value to you, or it may be a combination of all of them, but just as data is changing drastically, so is metadata, so there are some new tools in our toolkit to manage it.
When we think of tagging, this is almost your classic. The definition is, well, a non-hierarchical keyword that you can assign to a piece of information. It's like when you tag a photo: what are all the pictures of Donna on Facebook? Please don't look. I really have nothing incriminating; I'm a data person, pretty boring. But that is literally what it is: I put a tag on top of this. So here's an example on one of the more modern database platforms, something like an Amazon S3
bucket. You can create these user-defined metadata tags. It's really just a key-value pair, which you can assign at either the file level or the bucket level, and you can assign it when you upload the object. What's neat about it is that it follows that data and lives with it. So if I want to tag something as a secret piece of information and I move that somewhere different, it is tagged onto that piece of data, or where I have certain retention rules. I might be uploading data to S3, but I probably want to keep the fact that it's PII or something like that. So these tags can be a neat way to do that, not just for things like documents and files; you can actually use it for your data in things like an AWS-type bucket.
So in summary, metadata is more important than ever. Some folks might say that because we have these new technologies you don't need it. I say that because we have these technologies you need it even more, right? One of the reasons we need it is for some of these wider enterprise data management initiatives that we're trying to do, things like governance and quality and master data, and to really get that right you need not only the technical stuff but the business meaning around it as well. And there are some of these new use cases that we just need new tools for, because the concept of a database is evolving. What do we mean by a database anymore, right? It's not just everything that's in a structured format. Is it my conglomeration of media files that I have out in an AWS bucket? Is it my data lake that has a bunch of stuff I put out there? Is it my data warehouse? Is it all of them? And because we have all these options, we need to have these new metadata options, which include everything from the good old-fashioned matching rules to tagging to image pattern recognition. So there's some cool stuff out there. If you are interested in any help, that's what we do for a living; there's our quick sales pitch as a sponsor. There's my contact
information. If I'm not able to answer all the questions today, feel free to reach out or find me on Twitter, as we did mention.
I'm sorry to interrupt. We have five minutes left for Q&A, so I want to make sure we get to some questions here; we have a ton coming in. I know you've got a lot of good stuff here, and I'll certainly get this information out to everybody. Just a reminder: I'll send a follow-up email to everybody on Monday with a unique login to access the slides and the recording. So sorry to interrupt. This is such a great presentation, and there are so many great questions coming in, I just want to make sure we start getting to some of them. Do you have an example of metadata describing graph databases?
So, metadata, and I will give a plug for next week, where we'll have a whole webinar on graph databases. The easiest way to put it is probably that there's metadata around the object itself, and there's also metadata around the relationships. With graph, a lot of it is those relationships between the nodes. Think of online social media, who's related to whom; it's a great example of a graph. What's interesting is not me as a person, but the fact that I'm linked with Shannon, and that makes me more important, right? So that's where graphs can come in. That's my quick answer to that one.
That's funny. So, real-world terms have multiple definitions dependent on context. What are best practices to address this in data dictionaries or business term glossaries?
Classic example, right, of business metadata. Different groups have different terms for different things, and that can create stress, but there are some very practical ways to handle it. Is it the same thing used by two different people, and you have a different word for it? Well, if so, can we agree on a certain word? If not, often I will keep the same definition with a preferred word and say "also known as." I mean, it's more important that we have
the information; you can battle the rest of the day over whether we call it a client or a customer, right? Sometimes there really is a different definition. I see this a lot on business metrics: we're calculating total sales differently. Well, you might just be calculating total sales differently. There's one metadata tool on the market that basically links the terms to the report, so the term says "this is how total sales is used on this report," and at least you know that, and you link it. Or it could be that this is finance total sales versus accounting total sales. Not the best example, but you see what I mean. So sometimes it's a different name for a different thing, sometimes you just rename it, and sometimes you just need to have that battle of trying to get to a common term. The main thing is to know that when we're using it differently, we're using it differently. I would avoid the territorial battles and at a minimum say that sales uses the word customer differently from the way the renewal team uses customer, and at least say that they're used in a different way.
And the cloud and certain tools, including data prep and BI tools such as Tableau, present challenges in terms of gathering technical metadata. How are these sorts of things being handled to produce lineage and relationships between tech and business metadata?
I didn't hear the beginning of the question. What was it?
Oh, yeah. So the cloud and certain tools, including data prep and BI tools, present challenges in terms of gathering technical metadata. How are these sorts of things being handled to produce lineage and relationships between tech and business metadata?
I will give a call-out: last month we did a good one on self-service data analysis and wrangling and munging and modeling, and we talked a lot about that one. But in general, some of the tools out there have their own metadata, so that gets back to the slide where I had that it could either be
an individual tool or a metadata repository. So within Tableau, it sort of depends on where people are working: are they doing their own self-service analysis, or are they doing it on the core database? A lot of the metadata tools out there now have scanners or interfaces, so you can say some of the structure was in Oracle or a spreadsheet, some of the business definitions might be in Tableau, and they can rationalize it at the metadata repository level. I don't want to say it depends, but it depends on who the users are and what they're seeing. If it's all in one tool, that makes it easier. A lot of the tools are a lot better about passing metadata back and forth to each other, and you can get that common metadata. So if I can go back to that slide while she reads the next question, let me go find it. It was this one: sometimes there is a tool-specific repository, but for those lines in between, check your vendor. A lot of them, especially for things like Tableau that are so important, will have an interface that'll populate either your tool or a common metadata repository.
Donna, thanks as always for this great presentation. If you haven't noticed, we do a lot with Donna. She's so great and well versed in everything that she does, so thank you for this, and thanks to our attendees for being engaged in our event so far. But I'm afraid that's all the time we have slated for this session. Thanks to our sponsors and thanks to all of the attendees who have joined so far. We have a 10-minute break now, where we encourage you to network with each other while we get our next speaker set up. The session will begin at 1 p.m. Eastern, right at the top of the hour, where we will hear Karen Lopez talk about surviving as a data architect in a polyglot database world. Donna, thank you so much. Really appreciate it as always, and it's just fabulous.
Thank you. Always a pleasure working with you guys. Thanks.
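As a footnote to the session: the value-based pattern matching described in the talk, where a column is classified as email or US Social Security number from its contents rather than its name, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any vendor tool mentioned in the session works; the function name `classify_column`, the two regular expressions, and the 0.8 match threshold are all hypothetical choices made for the example.

```python
import re

# Hypothetical, deliberately simple patterns; real tools use far richer rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values, threshold=0.8):
    """Return the label whose pattern matches at least `threshold` of the
    non-empty values, or None if no pattern matches often enough."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    best_label, best_ratio = None, 0.0
    for label, pattern in PATTERNS.items():
        ratio = sum(1 for v in cleaned if pattern.match(v)) / len(cleaned)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio
    return best_label if best_ratio >= threshold else None


# The column name is never consulted: only the values are inspected.
columns = {
    "joseph": ["123-45-6789", "987-65-4321", "000-12-3456"],
    "my_favorite_column": ["a@example.com", "b@example.edu", ""],
    "notes": ["call back Monday", "n/a"],
}
inferred = {name: classify_column(vals) for name, vals in columns.items()}
```

The threshold is there so that a few stray values do not flip the classification; where to set it, and what to do with columns that come back unclassified, would depend on the use case (for example, a PII scan might prefer a lower threshold and a human review step).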