And here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the monthly webinar series Data Architecture Strategies with Donna Burbank. Today Donna will discuss Master Data Management: Aligning Data, Process, and Governance, sponsored today by DataStax. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. And we very much encourage you to chat with us and with each other throughout the webinar; to do so, just click the chat icon in the bottom middle of the screen to activate that feature. And if you'd like to continue the conversation after the webinar or follow Donna further, you may do so at community.dataversity.net. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn it over to David from DataStax for a word from our sponsor. David, hello and welcome.

Well, hello there, Shannon. Hello, everybody. Thank you guys for letting me come in and chat. I am David Jones-Gilardi, a developer advocate at DataStax. And what I'm really going to get into is how our advanced data workloads have gained a lot in complexity these days and how you can simplify them with a NoSQL platform. And the NoSQL platform that I am referring to today is Apache Cassandra. So for those of you who are not familiar with Apache Cassandra, there are some main items that really come up. You see them there in bold on the screen: 100% uptime, zero lock-in, and global scale.
So from the 100% uptime standpoint, some of you might raise an eyebrow and be like, dude, how exactly can you claim that? And here's where this really comes from. Cassandra is a distributed database, right? So it's made up of multiple nodes or instances. It's not a single instance. And it has a masterless, peer-to-peer architecture. What that translates into is that any node can do what any other node does. And I can actually lose a majority of my nodes, data centers, all sorts of things, and the database is still up and available to serve requests. So that's what we mean by 100% uptime. It also scales linearly. So if you want to, say, double your throughput or double your capacity, you just double your nodes. And these can be databases ranging in size from a handful of nodes to 100,000 nodes, right? We've seen this proven out time and time again over the last decade or so that Cassandra has been around. Now from the zero lock-in standpoint, really what this comes down to is that Cassandra is deployment agnostic. It just doesn't care where you put it. That could be on-prem, that could be in a cloud provider, AWS, GCP, Azure, whatever, right? And so you're not locked into any particular vendor with Cassandra. And then from the global scale standpoint, separate from the linear scalability piece that I was talking about a moment ago: if you want a single database that, say, spans the globe, where you have data centers in multiple geographic locations to put your data close to where your users are, Cassandra allows you to do that. And you can do that as a single database, right? So you could have one global database that can scale with you. Now if you look at data diversity today, right, and the amount of things that we have to contend with, there is actually a significant amount of complexity that exists. So we'll just step through these items here.
So if you take legacy data integration, let's say you have a mainframe system, or you have some project that got grandfathered in. An example that I actually like: I used to work in defense, and there was a particular orbital mechanics platform that was implemented in Pascal. It was a very solid platform, very efficient. Nobody really wanted to change it. And so it just stayed that way, and we had to deal with integrating with it, right? Or it could be real-time streaming events, whether that be, say, IoT streams, event log data, all sorts of things like that coming in. Then, I know for every organization I've worked at over the last 20-something years, the disparate siloed data thing is just a reality, whether that's through mergers and acquisitions, or different teams with different requirements and different databases that are spread around an organization. And each one of those silos has value on its own, but they have a lot more value if you can actually bring that data together. Then from the data security and sovereignty standpoint, anybody who has worked, especially, say, in the EMEA market with GDPR, right? This is another complexity that we have to deal with on a regular basis, because the security requirements may be different in different regions of the world, again just adding to that complexity. And then from the scalability standpoint, people in the retail world, for something like Black Friday, have to deal with unpredictable scale all the time. I mean, this can literally make or break a business, right? So that's yet another piece. And then lastly there is the hybrid multi-cloud piece. I touched on this a slide ago: we now have systems where the data is on-prem and on different cloud providers, and all of this comes together to really increase the complexity that we have to deal with. That also then translates into more workload complexity.
And the reason why that is, is because over the last so many decades, our data management has really evolved. If you look at how we started with traditional relational databases, a lot of times we were working with tabular formats. And then over time that has evolved, especially after the NoSQL revolution, where more capability started to come into the world to go past some of the limits that we had with relational databases. Take a document store, for example. Can I store a document in a relational database? Totally, right? What happens, though, if the document is big enough and I blow out the number of characters I have in my VARCHAR field or something and I have to split it up? Can I search through that document in a relational database? Totally. But am I going to get the same kind of efficient search that I might get if I have a pure document store? And there were all sorts of technologies that came out in the NoSQL revolution. Graph databases are another example of this. Well, what that translates into is that with these new tech stacks and these new technologies came different ways of actually getting to the data, and that's where you get these mixed workloads. So then how do you solve for mixed workloads? It really comes down to two things. How complex is your query, and how fast do you need it? Complexity and speed. So if we break this down just a little bit, now this is the part where I start talking about the DataStax Enterprise piece. So if you're not aware of it already, at DataStax we're the company behind Apache Cassandra. And Apache Cassandra itself is open source. The DataStax Enterprise piece is where we start to push into the commercial realm and expand past what you can do with just basic Apache Cassandra. And what that includes are more workloads than just CQL and tabular and JSON and such. This is where we see search, analytics, and graph.
So now this starts to provide a single system, a single operational data layer that can actually perform all of these different workloads, whether that is from a simple query to a complex one, or from things that are at the machine-fast level to offline-fast. And what this really does is open up a whole new set of use cases or opportunities. This is not an exhaustive list by any stretch of the imagination, but I'm just going to start with fraud. Most people can relate to fraud detection. Imagine for a second, I'm in Orlando and I'm making a credit card purchase at some store or something along those lines, and then all of a sudden, at that same time, there's somebody out in Texas who's attempting to make a many-thousand-dollar purchase for a bike on that same credit card. This actually happened to me. Not only do you need that real-time anomaly detection, but there needs to be some kind of follow-up. You have to find out, okay, can I find the fraudster? Can I find the fraudster's network? Were there other illicit activities going on at that time that are related to this particular activity? Something like that. And so this is where having this mixed workload capability in one database, so not just CQL, but things like graph, analytics, search, and advanced security, as I mentioned in the case of GDPR, being able to flip those switches, really comes into play to give you a more holistic view of all of your data. And so DataStax Enterprise really simplifies the data complexity. It takes all those mixed workloads and brings them together into a single platform built on Apache Cassandra that can scale with you, can handle the real-time intelligence, and is something you can deploy across multiple data centers and multiple platforms, again, whether you're on premises or in multiple cloud providers, in a single database. So for the time that we have, obviously, I can't do a deep dive on any one of these particular areas.
So if you're interested in Apache Cassandra, or things like graph, search, and analytics capabilities, so on and so forth, if you go to academy.datastax.com, everything there is free, right? There is literally a month's worth of video courses that you can take, exercises, all sorts of stuff that you can go into at academy.datastax.com. And also, for those who might be interested in DataStax Enterprise itself and want to check it out: if you go to datastax.com/downloads, you not only can download DataStax Enterprise, but for those of you who are using Docker or Kubernetes, if you download DS Desktop, the DataStax Desktop, you can use that to click-button auto-provision DataStax Enterprise with all of the workloads I mentioned. So you've got, you know, CQL, search, analytics, and graph, and examples and notebooks and stuff that you can actually load there on your laptop and experiment with. And with that, I appreciate the time today. Thank you very much, and I will pass it back over to Shannon.

David, thank you so much for this great presentation and for the DataStax sponsorship. And if you have questions for David about DataStax, you can submit them in the bottom right-hand corner of your screen. David will be joining Donna in the Q&A at the end of the presentation today. And now let me introduce to you our speaker for the series, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations around the globe enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. And with that, I will turn it over to Donna to get her session started. Hello and welcome.

Thanks, Shannon. Always a pleasure to do these, and thanks to everyone who joined.
And as was mentioned in the beginning, today's topic is master data management and how master data aligns with not only data, but perhaps more importantly, business process and data governance. So many of you join these on a monthly basis, and we appreciate that. If this is your first time joining us at DATAVERSITY, the good news is that everything is on demand. And that's often a question that Shannon gets: will this be available after the presentation? So yes. And as well, the slides; there's an email sent out, and you can catch any of these on the DATAVERSITY website. And we hope you can join us next month, when we'll talk about data governance. So I will talk about data governance on this call, but I will be a little lighter than I might normally be, because we have a whole dedicated session next month on that. Data governance is integral to master data management. They will not succeed without each other, but we will have a whole session on that next month. So just keep that in mind if you feel that I was a bit light on governance in this session; it's near and dear to my heart. So the topic today is master data management, or MDM, as we like to abbreviate it. The easy way of thinking of what MDM is, if this is new to you and that's why you're joining this: it's your business-critical data assets, and getting not only an accurate but a comprehensive view of them. So things like the classic customers, products, vendors, et cetera. And as I already touched on, yes, this can be complex, and there's tremendous value for analytics and warehousing and real-time operations, but there's a bit of an art and science to how you get that: not only the architectural piece, but the business piece, which is arguably as important, if not more. So again, if you've joined these webinars before, you may be familiar with this framework, because you need to look holistically, especially with something like master data, which is so core to everything.
So this is the framework we use in our practice at Global Data Strategy. And again, it should be familiar, as it aligns closely with the DMBOK, the Data Management Body of Knowledge from DAMA. But everything should start with a business strategy, especially master data. When you think of what master data is, it really should be how you describe your business: your customers, your products, et cetera. And then how do you manage that? Master data requires not only architecture and modeling, which you'll see there to the right, and quality and metadata, but data governance, which is the people and the business process around making that right. So we'll touch on a lot of those. Although the topic is the one in the box there, master data, it is absolutely linked with everything else. So again, you may be joining this webinar because master data is new to you. And thank you; this is hopefully going to be helpful for you. Because we're data architects on this call, and especially the data modelers on the call, we love our definitions. So let's start with, I chose Gartner's definition here, what is master data and then what is master data management. So master data, similar to what I just mentioned, the way they define it is the consistent and uniform set of identifiers that describe the core entities of the enterprise: customers, prospects, citizens, sites, charts of accounts. I had to do my English major there, that one was quite the problem. But anyway, that should make sense, right? And that's something that viscerally most people understand when you describe it, because it's stuff they're using every single day. Master data management, obviously, is the management of master data, obvious definition. But you'll see there that it is technology enabled, but it isn't a technology exercise. Yes, you need technology often; not always, but you generally need technology to manage master data.
But just as important is the stewardship, the semantic consistency, the accountability, and the business process around that. So when you talk about master data, I feel master data, like a lot of things, is easiest to explain by giving examples of it. Whenever I do data modeling, that's the easiest way to describe a model: give an example of one. And then that helps define the entity. Specifically, when I'm talking to executives and they're trying to understand it, they want to see some examples. And I thought it'd be interesting to go through ones we've seen day to day in our practice that maybe aren't the typical ones, right? Because the other part of these is that generally there is a business impact. And I am either a very interesting person or a difficult person to live with, because we all come home from work and there's that story from work, right? And obviously I'm working with Fortune 100 companies. I can't necessarily talk about the name of the company and what happened, right? Because there's client confidentiality. So I still have my little stories. There's the cheese slice incident, right, that we can all understand because we've talked about it. So how is a slice of cheese master data? And why was it a million-dollar cheese slice? Well, this was a restaurant we worked with, and cheese was a big part of their business. When you're trying to describe a business, you basically describe a business by its master data. What do they have? Of course, customers and sites, et cetera, but they have recipes, and they have ingredients in those recipes, and they have menus, all of which have cheese involved. They have a supply chain that orders the ingredients in those recipes. So long story short, cheese is a big part of their master data.
And again, because they had a very customized menu on their point-of-sale system, customers were able to add different ingredients that maybe weren't on the traditional menu. Well, this particular fancy cheese slice was much more expensive, but it had not been linked with the supply chain and everything else across the business, and it wasn't priced accordingly. So a lot of people, because this was a very unique menu item, had ordered it, and it actually lost the company over a million dollars across the year because that wasn't caught. So something as simple as a slice of cheese can have a big impact. And yes, that's ingredient master, that was menu master, that was supply chain component master. And the biggest issue that they had was really getting that business process together. So yeah, cheese is master data. Another similar example was a $2 million baby bottle. And you've got to wonder how that can all add up to $2 million. Well, this particular organization sold some of their products, one of which was a baby bottle, online with Amazon. Master data is very core to that. We often think of master data within an organization, but it's also beyond our organization. The more we want to have shared data passed across, for example to have your data on Amazon, the more you need that in a certain format. Well, this particular company, because their master data was not organized, was fined by Amazon every time they had to make a change in the data. Something as simple as, you know, baby bottle, size: large; color: blue and green; price: X, wasn't formatted accordingly. So there were fines that built up over time that actually came to more than $2 million, just because simple things like the size and cost and price and color of a product weren't quite right.
So again, these things are all so simple yet so complex, because they're so common and they're touched by a lot of people. And often, and we'll talk more about that, it's the people and the process, not just the data, that make these things break down. Probably my favorite one is the dead fish over on the right. What I love about my job is that there are always interesting things. I think I've talked about this one before, and I can use this company name because they actually spoke with us, I think it was last year, on DATAVERSITY, and that's all available on demand. It was the Environment Agency. And what's so interesting about master data, when you describe the Environment Agency, is that they're basically tracking living organisms across the UK, whether that's a cow or a fish or, you know, a biological component, and whether that's in water, at sea, or on land. And really that's what we were trying to understand through master data: what is a living organism? And then someone sort of snarkily said, well, but the fish are dead, so wouldn't that be a dead organism? And of course, I had to jump in and say, well, that was a living organism with a status of dead. There was a very Monty Python feel to this. But it was fun. But again, that's a very different example: yes, a living organism. In that case, the master data at the core of what the Environment Agency is tracking is environmental organisms, whether that's a fish or a cow or a biological component. So that was a very different example. Maybe one of the more common ones: another company we worked with was a major hospital. And when you think of a hospital, what does the hospital have? Patients and providers and doctors. And a big part of the doctor data was not only getting the name and the address, right, but core to this was credentialing. And I'm sure you're glad that they do this. There are two Dr. Smiths; which one is credentialed to actually do my heart surgery? We don't want to get that wrong.
That would be terrible. And that was tied to a lot of things. A funny story there: the president of the hospital actually tried to get in with his badge. And after they'd implemented this master data system, again, when you think of people and process, the very well-meaning office admin didn't let him into the building because he wasn't credentialed to do surgery, and it was the surgery department. He was quite upset: but I'm the president. She said, but you're not credentialed. She was actually right. But it was a funny story that everyone passed around. They were maybe a bit too strict on that, but I would rather have it that way than the opposite. But when you think of the criticality of master data, yes, you can lose a million dollars on a cheese slice, or you can lose a lot more than money if someone dies. So often these have a big impact from very simple things like linking a credential to a doctor. Another story, again, when we think of customers, which is almost the most common master data that a lot of people talk about, so we have to have that in here. David was talking earlier about analytics and graph databases and all the exciting things you can do with data nowadays, and you can. And this example actually came from one of our financial services customers, where they managed insurance for very high-net-worth individuals, folks like us that have three different mansions and their fine art and several different corporations they own. And of course, they wanted to get that type of customer right and know as much as they could about them. So they had done all this great analytics across the web, scraping data and seeing if there were lawsuits and seeing what companies they own. When they had all that, which was exciting analytics, the problem was that when they tried to match that with their own customers, their master data wasn't good enough.
So they couldn't tell which Michael Jones was the high-net-worth customer and which Michael Jones had just gone bankrupt and was not a high-worth customer, and that actually prompted a whole master data management effort, because even though they could do all of the sexy analytics, they didn't have their core data right. So I think that was a particularly great example. And they were, as you can imagine, a very large, well-established company. And even they, with stakes that high, didn't have a great sense of their master data, because master data is hard. There are just so many moving pieces, and that's the complexity in getting that right. Another one that's common across a lot of different regions, no pun intended, is region or location or market, depending on what you call it. Location is another tough one. It could be a job site in the construction industry. It could be a store location. And in terms of the Environment Agency that spoke on DATAVERSITY, it was catchments: how do you group these different living organisms, and how do we look at the different environmental areas, et cetera, et cetera. It could be a market, the sort of market we're selling our products in, and those can often be complicated as well. So all of this is complicated. And if it makes you feel any better about the status of perhaps your own master data, I have seen, and again names protected for the innocent, major Fortune 100 companies who have, and you probably have one of these in your organization, master data in the [insert a name here] spreadsheet. There was a major retail company that had the Mary spreadsheet, which had all of their store locations in it. So imagine if Mary had left, or the spreadsheet had been deleted, or anything had happened: that Mary spreadsheet contained one of their most valuable assets, which was all of their different stores across the globe. Probably a more extreme example was a utility that did a lot of acquisitions.
And in one of the cases, with some of these smaller utilities in remote areas, their quote "customer list" was literally a physical notebook, written down with paper and pencil. And so think of trying to integrate that. It's something to think about, and I'll use some examples of that. It's so nice to think of a nice automated system where master data is merged through fuzzy logic and you can do all this analysis, but really think of the real-world business process, because that utility had all of that. They had a really advanced MDM system, but they still had a piece of paper that they needed to integrate that had very valuable customer data. So no matter who you are, I'm sure that exists in every organization. There's the Mary spreadsheet, or the literal notebook of pieces of paper, or the, you know, old Access database underneath someone's desk, and they're there because these are valuable. And I know I'm spending a long time on this one slide, but hopefully these stories help it resonate that, you know, there are a lot of different types. Also think of the human factor. There was one large corporation that had a lot of trouble with their customer master data. And when they looked at the reasons, it wasn't the technology; it was that the sales folks were putting in their own telephone number, because they weren't going to let anyone else know the telephone number of their customer, right? Think of the motivation behind that, right? They get commission off that customer; they're not going to share that one. And so that was more of a governance issue than a technology issue. So I know we spent some time on that; hopefully it was helpful, maybe entertaining and prestigious, but hopefully it gave some examples of maybe master data you hadn't thought of, and maybe areas to think about: are there gotchas in master data that aren't technology? I just kind of want to stress that.
Another piece of master data, and we could have a whole webinar on just that, is the age-old question: what is master data and what is reference data? And I'll just say one person's master data could be another person's reference data. I mentioned location. And that's a classic one that often is master data: the sites of your construction areas, your store locations, your catchments for an environmental area, et cetera, et cetera. Or it could be your hard-and-fast standard reference data. I have a list of states and there are state codes. So go out and find the standard list of US states; they don't change very often, so let's not overthink it. But sometimes you have to. We're working with a media company right now, a market research company, that is looking at country. And yes, there are standard ISO country codes, but there's also some political sensitivity around who defines what a country is. You know, some areas of the world are disputed. So you can't always just take the list. Yes, there may be a list, but again, think of the usage for your organization. So it really does depend. There's no one final answer on what is reference data and what is master data. An example that we'll touch on later was flavor, right? So this was a market research company, I'll actually be talking with them this afternoon, and they may be on this call, where flavor for them is a major master data item, because they're tracking flavors and trends across the globe. For a lot of organizations, flavor is just an attribute on a table. It happens to be the flavor of something on a menu, right? And again, it varies so much; you need to understand your organization. A good handle on that is to just describe your company to somebody and pick out the nouns, and those are probably your master data. I sell products to customers in different sites around the globe. Those were three, right? Product, customer, site.
So you could analyze this too much, and I would just caution you not to do that, right? I couldn't help it as I was thinking of this. If you follow me on Twitter, you'll know my job can be so bizarre sometimes when you think of the things we argue about when it comes to data. And this was a true story. I realized after I got off the call that I had just spent an hour discussing what a mushroom was with a customer. And for this particular company, it actually made a lot of sense. Was it a flavor? Was it a product? Was it an ingredient? Was it a pharmaceutical? Was it, you know, a cartoon character that's trending, right? So there actually was a business reason for us to be discussing mushrooms for an hour. But you can imagine that to someone walking into the room and wondering why data architects are quirky, that might have seemed odd. So yes, a lot of us on the call can discuss whether it's master data or reference data, probably for hours, but please don't when you're talking to that business guy in the lower right, who's just trying to get something done. And maybe it doesn't matter whether you call it master data or reference data. The most important thing is how you are managing it. Is it correct? Is it accurate? Do you have a data steward for it? Et cetera. So yes, it's important to think about, but don't be that weird architect where people say, you know, God, those people have been arguing about what a product is for the past seven days in the conference room. Yes, that's important, but don't get that reputation. So again, back to this framework: always go with business strategy. You know, what is the business reason for arguing this? I was doing a big project at a water company, and we architects can get passionate. I was arguing with someone and I said, but why does this matter? Why are you arguing? He had to step back and he said, you know what? I don't know. There's no reason. Let's go on. You can get so into the semantics.
He kind of froze: I don't know. You're right. There's no reason for me to argue about this. So again, whether it's master data or whether it's reference data, think of the why, as with anything. So another example. We talked about customer, and that is the classic example. We're trying to understand our customer, to get that classic 360-degree view of our customer through our data. So as another example, we could be a sporting goods company. And we may have what looks like our ideal customer, Stefan Kraus, 31 years old. He lives in Pontresina, Switzerland. He's a ski instructor. He works in St. Moritz. This is great; he'd be our perfect customer. We want to know everything about him: what he's purchased, how he wants to be communicated with, does he want a text message or a letter, et cetera. But when you look at it, he purchased about 500 euro of gear in 2015. He looks good on paper, but, you know, he probably gets a lot of free gear because he is a ski instructor. So maybe he isn't our best. But if you're trying to get a good sense of your master data and you're looking at the data, there's another Stefan Kraus, and he's 62. And if you just looked at him on the surface, you wouldn't think of him for a sporting goods company. You know, he's a banker in Zurich. He likes to watch football, European football, on television. He wants everything in a letter, and his secretary opens it with a letter opener. You wouldn't think he's our outdoor guy. You know, I probably wouldn't put him on the brochure. No offense; he's a nice guy. But when you look at it, he actually spent, you know, 3,500 euro. A lot more, because he just buys the fanciest equipment there is when he goes hiking once a year in Rome, and, you know, he spent a lot of money. So you want to be able to understand that. So to be clear, and I want to make this clarification: this 360 view isn't master data. That's your analytics around master data.
The master data is: who's Stefan Kraus? Is he Stefan Kraus the banker who spends a lot of money with us? Or is he Stefan Kraus the ski instructor who's 31 and doesn't spend a thing, but uses a lot of our equipment? Maybe we want him as a spokesperson. But maybe not. He's not our best customer. How do you do that? There's a bit of putting all the pieces together. That is the hard stuff. So a little bit more on definitions, which helped me when I was first learning about master data. Master data is one of those things: it's so simple yet so complex. And if you're joining master data for the first time, it's probably a lot of pieces of things you already know put together, like relational databases, like data quality, like matching rules, et cetera. But it's good to get your definitions right. So transaction data, that's more your actions: someone bought something. Your master data are your core nouns. They're more static: your customer, your product, your location, et cetera. So if we look at this example, which is the retail transactions for the sporting goods company, you'll see that the transaction data is the fact that Stefan Kraus bought some telemark ski boots in 2017, and so did Donna. Those are the transactions. The master data are going to be the things that remain: that we have a product, and it's a telemark ski boot. But hey, the code is different. Is that a typo, or is it because one's in the U.S. in Boulder and one's in Switzerland, and maybe they have a different product code, et cetera? So there are certain things that are your nouns and certain things that are your verbs, your actions. One is master data and one is transaction data. They're very different and you just don't want to mix those two up. So again, the transaction data here is going to be the actual purchases. The master data are the things that remain.
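To make the noun-versus-verb distinction concrete, here is a minimal Python sketch. It is an illustration added to the transcript, not something shown in the webinar. The class names, the IDs (`C001`, `P100`, and so on), the products other than the telemark ski boots, and the split of the banker's 3,500 euro into two purchases are all hypothetical; only the two Stefans, the boots, and the 500 and 3,500 euro totals come from the talk. It shows why totals keyed on a name blur two customers, while totals keyed on a mastered customer ID keep them apart.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Master data: a core 'noun' that stays put between purchases."""
    customer_id: str  # hypothetical mastered ID
    name: str
    age: int
    occupation: str


@dataclass(frozen=True)
class Product:
    """Master data: another core 'noun'."""
    product_id: str  # hypothetical product code
    name: str


@dataclass(frozen=True)
class Purchase:
    """Transaction data: a 'verb' that points at master data by ID."""
    customer_id: str
    product_id: str
    year: int
    amount_eur: float


customers = {
    "C001": Customer("C001", "Stefan Kraus", 31, "ski instructor"),
    "C002": Customer("C002", "Stefan Kraus", 62, "banker"),
}
products = {
    "P100": Product("P100", "Telemark ski boots"),
    "P200": Product("P200", "Hiking jacket"),    # hypothetical product
    "P300": Product("P300", "Hiking backpack"),  # hypothetical product
}
purchases = [
    Purchase("C001", "P100", 2015, 500.0),
    Purchase("C002", "P200", 2015, 2000.0),  # hypothetical split of 3,500
    Purchase("C002", "P300", 2015, 1500.0),
]


def spend_by(key_fn, rows):
    """Sum purchase amounts grouped by whatever key_fn returns."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row.amount_eur
    return dict(totals)


# Grouping on the name merges two different people into one total.
by_name = spend_by(lambda p: customers[p.customer_id].name, purchases)
# Grouping on the mastered ID keeps the ski instructor and the banker apart.
by_id = spend_by(lambda p: p.customer_id, purchases)

print(by_name)  # {'Stefan Kraus': 4000.0}
print(by_id)    # {'C001': 500.0, 'C002': 3500.0}
```

The rollup itself is the analytics "around" master data; the `customers` and `products` dictionaries are the part that master data management is responsible for getting right.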
So, the fact that I have a customer, and which Stefan Kraus was this: was this the banker or was this the ski instructor? Do we have certain products? Are these the same products sold in Europe versus the U.S.? Is it a different code? Is that a typo? How do we price them? All of that. Then the locations. Now even just when we think of metadata: is that the location where Donna lives, in Boulder, Colorado, or is that the location where Donna bought that product? What if Donna (I can say my own name, it's been quite a day) is visiting Stefan in St. Moritz and bought the product there, or does she live there, et cetera, et cetera. Here's a two-character field. Well, one is a state or region code: CO, Colorado. One is a country code: CH, which is Switzerland, right? I mean, that seems so simple, but that's a big difference. And you can see how a program could just look and say, okay, that looks right. There's a two-character code. It validates against something. But it has a very different business meaning, and you don't want to mix those two up. One's a country and one's a state. Very different things. So again, hopefully that's a little bit of a primer that helps explain some of these terms and why they make a difference. So how do we manage all of that? There are many different ways, and we could argue whether it's distributed or centralized. I'm going to talk about the centralized model. I think that makes a bit of sense. But I will just start with: please don't put it in a spreadsheet, and please don't put it in a notebook with pencil and paper, which I think is obvious, but you will still find that in your organization. So here is a classic view of it, and again there are different flavors of this. But if we think of master data, one way of looking at it is: how do we get that quote "golden record"? How do we know that there's one Stefan Kraus, and he's 31 years old, not 62, and he lives in Pontresina? He doesn't live in St.
Moritz. How do we get that? Because the problem is there's CRM data. Maybe he talked to a salesperson in a store, and that's in your customer relationship management system. Maybe he bought something in the store. Maybe he bought something online. Maybe we had marketed to him on the web. Maybe he bought something and he's in the finance system, and we had to order and ship that as a special order through the supply chain, et cetera, et cetera. We want to report on that in the warehouse later: how many products were sold by country, by product type, et cetera, which is different from master data. They are related but different, a key distinction that a lot of folks get wrong. And then you have your reference data, which again we could argue about, but it's kind of like the little brother or sister of MDM. A lot of the core concepts are the same, but that might be a state code versus a product code, where product would be your master data and a state code might be your reference data that supports that master data. Does that make sense? So why is that so hard? It could seem simple on the surface. A lot of these master data problems are what I turn into my little rant. Again, I'm fun at parties. But, you know, I just changed my address with my insurance company and it's taken me three years. Why is that so hard? Well, because there are different systems, and in insurance, you know, your address changes the rate and affects a different system. But so many of these things seem obvious. I mean, my credit card company, they will remain nameless, but I've been complaining about them for about three years on DataVersity: they don't realize that I have their credit card. I get a bill, and then in the same mailing I also get an ad to get their credit card. So their marketing does not talk with their finance, because I am a customer and they don't seem to realize that. Their master data, for this particular company?
It's not great, and that's because there are just different systems. The other complexity is that each of these systems has its own functionality and its own associated data model, which isn't wrong, because they were built to do a thing, right? Marketing was built to do marketing and CRM was built to do CRM. So they might have your first name and your family name, and there are still a lot of issues just with that, so the data architects and the data modelers on the call are probably twitching with excitement. This is where we live, right? Is it first name, is it family name, is it surname, is it last name, how many characters is that? That is all important, and that's what you can help rationalize in the golden record. But does it mean the same thing? Is it stored the same way? And then which one is the right one, right? You might have three different names, and how do I do that matching? So you need to understand that. One of the first steps in master data is understanding what your source systems are. And again, this is a perfect world, well, not even a perfect world, but a world where everything is automated. And again, not that I've ever made a mistake on a consulting engagement, but had I ever, tongue in cheek: we were ready to implement one of these, and then we were doing some last analysis, and people said, okay, these are all the systems, but yeah, we don't actually use that marketing system, we actually use the spreadsheet over here to put the data in. And we had done a full CRUD matrix and all that. We hadn't talked to all the stakeholders, and it was an entirely missing system. So this may seem sort of boring and, you know, repetitive, but it's super important to just understand where all of that source data is and what business processes touch it. So once you have a sense of that, the next piece is your target model: what do you want that golden record to be? It isn't everything. I would generally say it is not everything; it's that golden
subset of the most important cross-functional information. And this is a discussion, and it should be governed, and it should be cross-functional. Where I also see MDM go wrong is that people didn't talk to everybody. Is that the right set? Do we all mean the same thing by family name or first name? You know, how do we want to store that? Get that right. I was once tackled by a woman. I was holding a pen, trying to write this out, and she was so passionate she literally pushed me out of the way and said that was wrong. I mean, this can be very, you know, political. So you want to make sure everybody's involved, and you want to make sure that this golden record can evolve over time. You don't have to get it all right at once. So I would say start with a subset and grow. It's hard enough to get the subset right, but give a lot of thought to that as well, and of course to the reference data that will go with that. Once you have that, that's where MDM, especially if you're using a tool, can help with the matching rules to create this quote "golden record", right? And I could spend all day on this one slide. Don't worry, I won't. But it could just be: what is the first name of this customer? Is it John, is it Jack, is it Jay, is it John with two n's or one n, et cetera, et cetera. Right? Just getting the first name of this customer: what's the right one, and should one of those be a nickname stored in a different field, et cetera. That's where these tools, especially with names, can help automate. You may need a data steward to look at that, who might know and validate which is best. Sometimes you can do it by source: we always know that the CRM is the right source of truth because there's a business process around it, et cetera. But how you get to that golden record, again, could be an entire webcast on its own. That is kind of the magic sauce: first, your business process needs to be good, to understand how that data is sourced; second, you need good data stewards
and data governance that can help understand what this data means; and often you need a tool that can help with those match and merge rules and then really bring a lot of that together. So that was the super fast (and I know I talk fast) version of MDM in a nutshell, and that's only a piece of it. But let's drill in a little bit: how do you do some of this fancy matching, right? So, a couple of ways. Wouldn't it be perfect if everyone were born with a customer ID on their forehead and everyone could be matched that way? That's not how the world is, and that sounds a little scary, actually. So you don't always have that nice perfect key. Often you have to have an alternate key: could I match on date of birth and, in the U.S., social security number, or social insurance number in Canada, et cetera, plus first name and last name? So that's a bit of an art and a science as well. You don't want to over match and maybe lose some candidates, or just have too many rules that slow things down. So you don't want to over match, and you don't want to under match and just say, okay, let's match on first name, because I bet there are a lot of Johns in your company, right? So, getting that right. How do you get that right? Some of this is tech: you can do some data profiling to understand it. Often it's your data governance committee or your data stewards and data owners who can help understand the data. And this can be an evolution over time. Generally I would say (I always get this backwards in my head) you want to make sure you don't have too many matches that are not right, so you want to be a little looser in the beginning, right? Because when you want to automate this, you don't want to over match people, which is going to mess up your data later. And often you can have a human in the loop in the early stages, because you don't want to automate everything all at once. You want to have that kind of learning happen. I mean,
you can use technology for this, and there's fuzzy matching, which you can either code yourself or, well, the tools have some of this. Some of it is the obvious things that are easy to automate: is "Street" also "St." with a period and also "St" without one? Or you can put in common names, and it can see things that are similar. And some of that can be auto-approved: if it's "St." with a period and "St" without a period, yes, I can just have the tool do that; I don't think there's much for a human to look at. But maybe with Tim or Timothy, or those common nicknames, or is Main Street the same as Mainly Street or whatever, you may want to have some people look at it, especially as you start to train those rules. But again, this doesn't have to be people looking at everything. You know, I've been around a while and have plenty of horror stories of what goes wrong with master data. And again, none of this is brain surgery; it's just that people are probably not checking all of the boxes. One of the things that went wrong at one company is they actually had fairly high-level business users, even up to the VP level, looking through all of this data to see: are Tim and Timothy the same person, and are "Street" and "St" the same thing? An absolutely colossal waste of business users' time, and in the end they never wanted to work with the data team again. So automate where you can, and use your people wisely, because they're valuable. And make sure that when you bring in this human in the loop, they're doing something that really is tricky, where maybe you need to look. Yes, hey, there are two John Smiths at this household, one's 62 and one's 23, so either that's a typo in the birthday or it's a father and son. Maybe it's an insurance company and we need to go to the agent, and then they can ask, right? That's something you don't want to get wrong, because that's two customers or one customer, and that's a relationship you really need to understand. So that might be a perfect example to
have a human in the loop. But having a human in the loop taking the period off of "St." versus "Street" is probably not a great use of people's time. So give that some thought, but don't over automate and don't under automate, if that helps. I know that's not as specific as it could be. So once you have master data in this nice golden record, one of the nice things is that applications can use it. The classic one, and you probably do this all the time, is web lookup forms, right? I start to type in my name and it autofills. Or you go to the doctor's office. I don't know, I have good doctors and bad doctors: you go to the good doctor and they can, you know, say, oh, do you still live at 101 Main Street? Yes, I do, thank you. The bad business process is that you have to fill out that same form, often with pencil and paper, every single time you go in to the doctor. That's not a great customer experience. So that lookup is really master data. That's the "hey, do you still live on Main Street?", because you can look that up in that nice centralized golden record. I mentioned that MDM is different from reporting, and it is, but if you have good master data, that can feed the dimensions in your model. So if we're thinking of reporting sales by customer, by product, by location, all of those are often your master data areas that could be fed from your MDM hub. So I want to stress that, and I could stress that again, because I think there's a lot of confusion in the market. David in the beginning was talking about all these great technologies, and they are great: there are graph databases and there's big data, and they are very different from MDM. They are enabled by MDM. So once you have a great golden record, you can do great reporting, you can do social network analytics, right? And you can do some of this without master data, but it's a lot better with master data. Am I linking to myself because, you know,
I have too many Donna Burbanks and I think I'm related to myself rather than that I am myself? Again, one of the biggest frustrations of any data scientist using a data lake is really trying to understand the data and having good-quality data. So these aren't mutually exclusive. You could have your MDM system feeding your data lake with a great list of customers and then do sentiment analysis off those customers. If you don't know who your customers are in your master data, it's hard to do sentiment analysis on what they're saying, right? So it's not either/or. They absolutely augment each other, but they are different things, so don't get them confused. So I've mentioned this, but I want to stress it again, and this was a Gartner report that talks about what I have been saying: yes, technology is important, and technology is complicated in MDM, but technology is only a piece of the puzzle. When Gartner looked at the top reasons for failure of MDM systems, it was failure to align with business process and not having the right data governance in place. And I would say that resonates with my practice as the hardest thing to get right. Some of the success stories we've had (I don't want to make it sound all bad): generally, when master data sings, it's a huge success. But the ones that are successes generally started with a good governance group in place first and a good architecture in place. It's hard to build MDM on a shoddy foundation. So if you're just starting out with anything data, I wouldn't say, let's master all of our customers, as your first thing. You want to build some steps into that, and the steps are often: architecture, do you have data models, do you understand the rules in the architecture? Do you have data governance? Do you know who the stewards would be for all those systems? And do you understand the business process, how that data is populated?
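The matching discussion above (auto-approve the obvious cases, send the tricky ones to a steward, and avoid "too many Donna Burbanks") can be sketched in a few lines of Python. This is an illustration added to the transcript under stated assumptions, not the method of any particular MDM tool: the abbreviation table, the two thresholds, the `route` function, and the sample records are all hypothetical, and the similarity score simply uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real fuzzy matcher.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real tool would ship a much larger one.
ABBREVIATIONS = {"st": "street"}

# Hypothetical thresholds; in practice stewards would tune these over time.
AUTO_MERGE = 0.95
REVIEW = 0.75


def normalize(text):
    """Lowercase, drop trailing periods, and expand known abbreviations."""
    words = (w.rstrip(".") for w in text.lower().split())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def similarity(a, b):
    """Similarity between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def route(rec_a, rec_b):
    """Decide whether a candidate pair merges, goes to a steward, or is dropped."""
    score = min(
        similarity(rec_a["name"], rec_b["name"]),
        similarity(rec_a["address"], rec_b["address"]),
    )
    # Trivial differences ("St." vs "Street") merge without human effort,
    # but only when the ages agree as well.
    if score >= AUTO_MERGE and rec_a["age"] == rec_b["age"]:
        return "auto_merge"
    # Near matches, or exact matches with conflicting ages, need a person.
    if score >= REVIEW:
        return "steward_review"
    return "no_match"


john_a = {"name": "John Smith", "address": "101 Main St.", "age": 62}
john_b = {"name": "John Smith", "address": "101 Main Street", "age": 62}
john_jr = {"name": "John Smith", "address": "101 Main Street", "age": 23}
tim = {"name": "Tim Jones", "address": "5 Oak Street", "age": 40}
timothy = {"name": "Timothy Jones", "address": "5 Oak St", "age": 40}
jane = {"name": "Jane Doe", "address": "9 Elm Road", "age": 35}

print(route(john_a, john_b))   # auto_merge
print(route(john_a, john_jr))  # steward_review (typo, or father and son?)
print(route(tim, timothy))     # steward_review
print(route(john_a, jane))     # no_match
```

Taking the minimum of the name and address scores is a deliberately cautious choice for the sketch: a pair is only as strong as its weakest field, which keeps early automated merges conservative while the rules are still being tuned.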
So again, what I find interesting, and I'm a semantic kind of person, is that when you talk about different roles or different terms and what they mean, often there's an entirely different group in the organization that does master data, from "the business", in quotes, right? What does that even mean, right? But there's a business process around master data, and it's often very separate, ironically, from the technical master data team. Because the business people think, yeah, that's my master data, that is the data that runs my company. Maybe it's an ERP system, and I have rules and processes around that. Which is correct. And then the data architecture team says, oh, MDM is the thing and the tool that has match-merge rules. And they are correct also, right? So that's where the governance and the collaboration really make sense. You need all of those. And I've seen it, we've all seen it: I've seen companies with three different customer MDM systems. I have seen customers with a business MDM and a technical MDM that don't talk, and you can imagine that's just silly, right? So that's where getting the governance in place and getting people working together comes in. So, some tools that help with that. I'm a big fan of business process. It doesn't have to be massive; it doesn't have to be, you know, an entire enterprise version of all of your processes. It could be a simple swim lane diagram. So how does the data get into the system? Who touches it? How is it used? Are we talking to all the people on all ends of that, where it's used and how it's sourced? Another great tool that I'm a fan of, and I think I've said this one before, it's a terrible name for a great tool: it's a good old CRUD matrix. We could switch the letters around to get a better name, because it's a great tool, and it's so simple but so, so valuable. And I often get, you know, questions about where to start with data governance. Well, a great way to start is to look at all the different systems and get the data model for those systems: what are
the key attributes or fields you want to track? And then let's just do a simple matrix like that. So for the product name: who creates that product name, where is it used, where is it updated, where is it read? Just start doing that mapping and get that right. That's a very simple tool, but it's extremely valuable. It's old-fashioned and it's been around forever, but that doesn't make it less valuable. And often where I see things go wrong, it's the simple thing. Like I mentioned, that one project went wrong because we'd forgotten a simple system that had updated or populated the data, and we weren't aware of it. That's where things can break. Or a user that you didn't talk to: maybe they're just reading it, but they have an entirely different meaning of what product name is. And that is true: there's a marketing product name, there's an internal product name. Any of these things may have a different definition with different groups. So that's where governance comes in, so that you can talk to everybody and get their viewpoint. So, when everything goes well. And I know this is complicated, but it doesn't have to be. There are different layers to it. There's the enterprise layer: are we even prioritizing the right business domains? And I recommend, since there are a lot of things to do, don't do them all at once. Pick something, and maybe don't start with the biggest one. Maybe don't start with customer as the first thing you do. Maybe you pick something like site that might be a little easier, and get your processes and tools around that. Do you have a conceptual model that even defines what a site is? If you define what a customer is, do we map that to our customer journey or logistics? Do we understand the business process? Do we understand the overlap with other areas? And often, especially if you're a large company, there's no one answer: do we do this with a particular
region first? Do we want to do a slice of it globally? I would say there's always that "it depends", but I would say always do the conceptual model globally. You want to understand what customer means across the globe. You may roll out MDM in different regions, and, you know, there are going to be different rules, right? Do we have the right ownership, both on the business side and the technical side? And then do we understand the logical rules: the match-merge criteria, the survivorship? There's a lot to getting that right, plus the business process workflow. And then of course the technical level, which is: don't start there, right? Do we profile the data to know whether we even have an email address? That's a part of it. Maybe we want to match on name and email address and phone number. You might look at the data and say, well, that'd be nice, but we don't have email address populated, so we can't. So all of these interact. You really need to do all three layers to get this right and singing in the organization. So I do want to leave time for questions, because generally you guys are not a shy crowd, which is great. But I want to leave you with a case study, and this actually is the cheese incident case study, to bring it back full circle. We worked with a major restaurant chain, a fast casual restaurant chain here in the U.S., and what was interesting about this is we didn't know what this was, if that makes sense. We came in and we just knew there was a problem, and marketing was the one that came to us with this. And I gotta love this, I find this a lot: a lot of people across the organization have data models and data flow diagrams without even knowing what they are. They had a whiteboard that showed their menu data and how it was populated, and they looked at this whiteboard and said, fix it. And basically the data flow diagram showed that they couldn't get a great view of all the ingredients and products and prices on their menus, which kind of caused that cheese incident, because someone had
ordered something on the point of sale system that wasn't priced right in supply chain, that was listed on the menu but didn't necessarily match what the chefs had created in the field. So yes, this was a master data effort. We didn't know that up front. We went to the CEO to get this approved; as you can imagine, this was a fairly expensive ask. And she had the same question all CEOs have: what the heck is master data and why do I care? So we literally did a slide with a slice of cheese in the middle that had all the costs of that slice of cheese, the processes around that slice of cheese, who owns that slice of cheese. And she got it. She said, great. And her next question to the data team was, what other master data should we be looking at? I thought that was a wise question. Then we did supplier, and then we did location, et cetera. So a lot of that was a business process model to really understand how everything fit together. This was a great example of where it was more of a business process issue than a data issue. The data was fairly simple itself. It was, you know, a slice of cheese. And it turns out there's a whole lot more to slices of cheese than I ever realized; there are whole databases out there of different types of cheese. But that wasn't the main problem. It was more the process around it. So that was kind of a funny example that was interesting. So, to summarize: MDM is hot, I think, because more and more companies want to be data driven. More people want to do analytics; they want to do this great, you know, graph database and social media analysis and fraud detection and all the stuff that was talked about in the beginning of the call. And then they break down when they realize, I don't even know who my customers are, I don't have a list of sites, et cetera. It's the dirty details behind everything that you have to get right. But when it is successful, it's super successful, and that really helps the company sing. To get it right, you need everything: the data, the process, and the governance. So before I
open it up to Shannon with the questions, just to remind you: if you felt we were light on governance, there's a whole topic next month on that. Just a quick plug: we do this for a living, so if you need help with governance, you know where to find us. And then I will pass it back to Shannon, if we have questions from the group.
Donna, thank you so much for another great presentation, as always. And just to answer the most commonly asked questions, a reminder: I will send a follow-up email by end of day Monday with links to the slides and the recording of this session, so you will indeed get those. And if you have questions for Donna or David, feel free to submit them in the bottom right-hand corner of the screen in the Q&A section. So, diving in here: among the business data, what belongs to master data and what does not belong to master data?
I guess, see, I knew that would be a debatable question that we should have talked about in the beginning. Depending on the organization, some is master data and some might be reference data. These were some of the examples we showed. I think a core definition of master data would be: it's used across the organization, it's critical to business process, and, yeah, it's fast changing, and that might also be where it's different from reference data. You know, state codes: how often do we get a new state in the U.S., right? Not too often. But your customers change all the time. And a quick way to do it, as I mentioned before, might be to just describe your company to somebody. You know, we have a clinic that has doctors that support patients across different regions. And there were three, right? We had doctors, who are your providers, and your patients, and your locations. Or you're selling products to customers: there's product and there's customer. So that's probably the easiest way to think about that. And it's generally across different systems, and it's highly used to drive the business, if that makes sense. Another slide we showed that kind of explained that was this one, which talked about your transaction data and then, from that, what might be master data. So hopefully this helped.
Hi there. No, I actually don't have anything to add to it.
All right, I love it. All right. So, moving on to the next question then: given the way that you are explaining these examples, would you say that reference data is just a subset of master data? That hierarchy of regions, markets, locations, et cetera seems more for the realm of RDM, no?
Yeah, that could be. Again, we could wax poetic, but I think that's almost how I drew it here: the reference data is kind of a little cousin of master data. I think, you know, part of it is importance to the business, and, you know, does it change very often? Another way to think of it: often, if there's an external standard, like ISO country codes or, you know, things like that, those kinds of code lists are commonly your reference data. Again, I've overgeneralized, because you have product codes and that's your master data, but that's kind of a nice one. I mean, it's almost what it says: reference data is the things that are often referenced by master data, if that makes sense.
Indeed. And David, I'll let you jump in whenever you feel the need to add in. So then, is it imperative to have master reference data when implementing a master data management project?
See, did I call it? I said in the beginning that we're all architects and we could talk for an hour on master and reference data. So I would say yes, in phases. I mean, don't stop doing master data if you don't have reference data. But if I go back to that slide that I had: if you're trying to master customer here, and part of mastering customer is location, you need to have the right location codes. Do we have our country codes right? Do we have our state
codes right, et cetera? Or you could even say metadata: how are we doing dates, is it European dates, is it U.S. dates, et cetera. So yeah, it's all related. It's hard to get good master data if you don't have reference data, because generally your master... God, we said before that we'd start to sound crazy when business people walk in the room, right? So be careful if you're a technical person on the call. It's hard to get your master data right if you don't have the associated reference data, because generally a field in one of your master data records is, I mean, referencing reference data. I don't feel like I said that eloquently, but I think you know what I mean.
It works. I love it. You know, we've got just under a minute here, but let me see if I can slip in one more question. We got so many great questions. "We're in need of mastering product, but the focus is on reference data. Does it make sense to start with considering software as a solution for reference data before mastering product?"
I think you're doing this on purpose, putting in all the reference data questions, but maybe that's just because we talked too much about it. I'm not sure what they're saying. They wanted to use a product to manage it? I'm not sure I understand the question.
Does it make sense to start considering software as a solution for reference data before mastering, quote-unquote, product?
Yeah, I suppose I would probably start with the master data. Well, because they're all related, I would say: what are we even trying to track with product? Understand the core fields. And then as part of your mastering of product, you know, data quality is a part of this. So if you have really bad data quality, that's a part of master data, but they're all related, right? You can't say, I'm not going to do my master data until the data quality is good. You're creating the data quality as part of mastering, if that makes sense. So I would
see them as related, because part of master data is each one of those core fields. So you can start that part. I mean, you may not start the full life cycle of mastering, which is something I didn't talk enough about perhaps, which is the publish and subscribe: you know, am I storing it, am I mastering it, and am I pushing it back to systems to actually use live? You might want to hold off on that until your reference data is running. But start with the idea of understanding the source systems and where things are, because something you treat as reference data might really need to be a master data field, if you're not looking at it holistically. I think I rambled there as well, but hopefully both pieces of that might have been helpful.
No, that's great. Well, Donna, thank you again so much. That does bring us to the top of the hour. And David, thank you so much for joining us. And just again, a reminder to everybody: I will send a follow-up email by end of day Monday with links to the slides and links to the recording to all registrants. And thank you so much to all our attendees for being so engaged in everything we do. I just love all the great questions and the chat that's been going on throughout. I hope you all have a great day, and stay safe out there. Donna and David, thank you so much, and thanks to DataStacks for sponsoring.
Thank you, Shannon. Thank you, Donna. Take care, everyone. Thanks.