Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager for DataVersity. Thank you for joining the latest in the monthly webinar series, Lessons in Data Modeling with Donna Burbank. Today Donna will discuss data modeling and MDM. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar, and we very much encourage you to chat with us and with each other throughout the webinar. To do so, click the chat icon in the upper right-hand corner to activate that feature. For questions, we'll be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #LessonsDM. As always, we will send a follow-up email within two business days containing links to the slides as well as the recording of this session and any additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience in helping organizations enrich their business opportunities through data and information. She currently is the managing director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa and speaks regularly at industry conferences. In fact, you can meet Donna in person at the Data Architecture Summit 2017 in Chicago, November 13th through 16th. She'll be giving a talk on designing data governance, metadata and quality into your data architecture, as well as leading a panel discussion pulling points from a new DataVersity research paper on data architecture, co-authored by Donna, which we'll be releasing within the next two weeks. So keep an eye out for that.
And with that, I will turn the webinar over to Donna. Hello and welcome.

Hi, Shannon. Thank you, as always. And just to continue on what Shannon had mentioned, we are also on Twitter. So if you want to follow me, if you're a Twitter person, I'm @DonnaBurbank. And then there's a hashtag today, which is #LessonsDM. I know sometimes folks like to continue the chat online. So if you're one of those people, please go ahead and do so. As Shannon mentioned, this is part of the data modeling series on DataVersity. And thank you to a lot of you that attend each month, which is great. I've also seen some new names on the call today, so thank you for that. For those of you who are new and might have missed some of the previous webinars, they are all available on demand on the DataVersity website, if you want to come back and catch any of them. As you know, and as I've discussed in past calls, I am a big fan of data modeling. It's near and dear to my heart. But data modeling is only interesting when you're doing it for a purpose, right? So we're trying to kind of spice it up and really talk about data modeling in the context of other things: enterprise architecture, business intelligence, graph data modeling, all that sort of thing. Next month is on agile and data modeling, and do they work together? Hint: I think they do. If you want to learn more about that, that's in October. And then in December, we'll be talking about how data modeling applies to data quality and governance, which I'll also be talking about in Chicago if you join us at that conference. So do check those out if you are interested. But today, we're talking about master data. I think probably a lot of you are familiar with master data, but I'm sure some of you are not, and that's why you've joined. And plus, I'm a data modeler, and what data modelers love is definitions and context, right? So I took a definition from Gartner.
And if you look at this, there are kind of two things. There's master data, and I like their definition in that it really talks about those core entities of the enterprise. And then master data management is obviously the management of that. What I also like about their definitions is that they really talk about how business and IT work together. It's not just about the uniformity and getting that single version of truth, but it's also about the stewardship and accountability. And we'll talk a little bit about that. But I did want to come back to that topic of master data really being the core entities of the enterprise. And like anything, as they say, if you have a hammer, everything looks like a nail. A data modeler sees core entities and thinks data model, right? Because that's what a data model really does. It defines those core entities of the business. If any of you have ever taken any of my data modeling classes, you'll know, and probably any data modeling class goes through this, to be fair, that really a data model is those nouns of the business, the who, what, where, why: whether we have products or orders or locations. Those are all the things of the business. So those are the entities of the business. What I like about a data model is that it's often, if you're using a data modeling tool, a graphical representation of those core entities. And the nice thing about that is that humans tend to think in pictures. So this is a nice way for either a business or a non-technical person to really look at their business through its data. But you'll see my note there: not all entities are master data entities. So how do we define that? And that is different for every company, right? I think the core ones, if you were to look at this, would probably jump out in terms of master data. It's probably customer. It's probably product, right? But is weather event master data?
We would probably say no, that maybe has to do with our product or location or sale. But I don't know, I might be the National Weather Service, and then maybe weather event is their master data, right? So it really does depend. Social media account might not be master data unless you're Facebook, and then that probably is your customer, right? So it is dependent on each organization. Again, there are always some of the common ones that we tend to master, like customer, product, location, or staff is another one. But it really is dependent on each organization. It is those key entities of the business. And in fact, they've recently discovered that folks have been data modeling since the beginning of time. There was a cave outside of the western side of France... No, I'm just kidding. But when you think of it, we all tend to think in pictures, right? There were emoticons on the walls of early caves, and that could have been early master data. What's important to this business, right? Cavemen and animals, and that's what we eat, so they're probably drawing pictures of that. So it's sort of facetious, but not necessarily, in that I think we always have that idea of the certain core things that are important to us or to our organization. In a simplistic way, that is your master data. So if you're cavemen, then cavemen and animals are probably your master data for your particular business, which might be hunting, right? We won't be talking more about that today, but hopefully it drives the point home. This might be a more realistic example. There are lots of types of data, and I think we all realize that. One common distinction is that you have your transaction data, and you have your master data. So this might be a list of your retail transactions: the fact that Stefan Krauss bought some telemark ski boots in St. Moritz, Switzerland, and Donna Burbank bought those same ski boots in Boulder, Colorado. Full disclosure, I'm a huge skier. It'll come up again.
And we had our first snow here on Sunday, which, I forget, for most people in the world would not be a good thing at the end of the summer, in mid-September in the Northern Hemisphere. But we folks in Colorado get excited about that. That was on my mind as I was building part of this presentation, so the excitement for snow came through in this example. But that's the idea. There are certain transactions. They have a date. They have a time and a place, and how many were bought at a certain time. So that actually describes more your action, right? Stefan bought something at a certain time. But the master data is going to describe the key entities around that, the information about Stefan. Take Wendy: she might have bought Prana yoga pants in New York. Well, that's interesting at that point in time. But what's more interesting is that she's 25 years old. She's female. She lives in New York. She wasn't just visiting. She's been a customer since 2005. She's a preferred customer. All that stuff around your customer, which is key to your business, that is really your master data. So to further explain that example: the list of transactions, that's the stuff that's happening. I'm buying things on a certain day at a certain time. To pull out some examples of master data, there's probably your customer, which I already mentioned: Stefan and Donna and Wendy and Joe have all bought stuff from our company. The fact that I have certain products and certain product codes and pricing, all of that information is probably your master data. Location can be master data. I won't go deep into this, but there's often a distinction that should be made between reference data and master data. When I think of master data, those are your core entities of the business, and they tend to be more volatile, with more attributes that change.
An easy way to think of reference data is things like country codes and state codes, maybe even product codes, you could say here. Your region codes, those are kind of your lists of things. Often they're external. For example, country codes might come from an external source. Maybe your region codes within your company are internal. So they're kind of a mix. Why are they important? It should be self-evident in terms of consistency, but this is a good example: Stefan is in St. Moritz, Switzerland, so that two-character code is actually a country code. I'm in Boulder, Colorado, where it's a state code. We'd better get some clarity on that, right? So keeping track of those things might be sort of simple and banal and boring, but getting those wrong can really wreak havoc on your organization. Say I'm just trying to see a list of everybody who bought ski boots in Colorado this month. If those codes aren't right, we have Colorado, and we have CO, and we have CD, and we have all these different things. As you know if you work in the business, that can just create unnecessary complexity. Especially if you think of internal reference data, let's think of something like a region code, and you've merged two companies, right? The region for Northeast might be from New York north in the US, or it might be just the New England states. So trying to get consistency across that basic reference data is important. But what we're talking more specifically about today is master data: the idea of my single view of customer, or my single consistent view of product, or my single consistent view of location. So to kind of continue that analogy, why do we care, right? It's becoming more and more popular. I run a consulting practice, as I think Shannon mentioned in the beginning, and we are seeing more and more requests for master data management because it is a huge opportunity.
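The state-code problem described above (Colorado, CO and CD all appearing in the same column) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the synonym table, the `normalize_state` helper and the sample rows are hypothetical and do not come from the webinar or from any particular MDM tool.

```python
from typing import Optional

# Hypothetical reference data: known spellings mapped to one canonical code.
STATE_SYNONYMS = {
    "co": "CO",
    "colorado": "CO",
    "ny": "NY",
    "new york": "NY",
}


def normalize_state(raw: str) -> Optional[str]:
    """Return the canonical state code, or None if the value is unknown."""
    key = raw.strip().rstrip(".").lower()
    return STATE_SYNONYMS.get(key)


# Made-up transaction rows with inconsistent state values.
rows = [
    {"buyer": "A", "state": "Colorado"},
    {"buyer": "B", "state": "CO"},
    {"buyer": "C", "state": "CD"},      # likely a typo: flag it, don't guess
    {"buyer": "D", "state": " co. "},
]

colorado_buyers = [r["buyer"] for r in rows if normalize_state(r["state"]) == "CO"]
needs_review = [r["buyer"] for r in rows if normalize_state(r["state"]) is None]

print(colorado_buyers)
print(needs_review)
```

Unknown values such as CD are routed to review instead of being silently mapped, which is one possible design choice; a real reference data set would normally be maintained by a data steward, not hard-coded.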
Say we think of customer: getting that 360 view of your customer through data. It could be anything, and we'll give some examples later in the presentation. Customer is the easy one. If we think retail, the most important things are your products and your customers, right? And getting a sense of that is huge. Full disclosure, I had skiing on the mind, so this example is heavily skiing related. And I actually stole this from myself. There's probably only one other speaking engagement that would be better than DataVersity. If anyone knows me, I'm a huge outdoor fan: rock climbing and skiing and hiking and all of that. And I was asked to speak at the European Outdoor Sporting Goods Conference in Barcelona last year, which was my dream, because the other sponsors were Patagonia and North Face. They actually had the Chief Marketing Officer from Patagonia talking, and it was super interesting. What was interesting is the reason they asked me: they saw that to be a good retail organization you need to be more data-driven. So my challenge was trying to talk to a North Face ski manufacturer about data, and I was trying to hit it home. So here's a good example that I think anyone can relate to, whether you're in the ski industry or not. You might have this customer, right? I'm trying to get not only the single view of my customer, that I have a customer called Stefan Krauss, age 31, and the fact that he lives in Pontresina, Switzerland, but all of these things around him. Maybe some of it you have, like that he purchased 500 euro in outdoor gear in 2016. That's something I can probably get internally. But when we think of big data, we think of all of the new information we can get about our customer that we can track through this idea of master data. We're trying to master our own business, which is our customer. So the fact that he might be a ski instructor in St.
Moritz, right? Lucky guy. The fact that he was the top finisher in the Engadin Ski Marathon five years in a row. Having lived a life of people giving sports analogies at work, which is one of my pet peeves, I am going to take this opportunity to give my own. I am a huge Nordic ski racer and I love it. So if anyone else on this call, which I doubt, is a Nordic ski aficionado, I highly recommend that race. It's one of the most beautiful ones in the world. But this guy, lucky guy, won. So you probably want to know that. I might try to market a new Nordic ski boot or Nordic ski jacket to this guy, but I might not have known that had I not done my research. The fact that he's been a member of my loyalty program since 2010 is pretty important. But then we start thinking more about how we understand our customer. He's 31, he's hip. He's got this dream job being a ski instructor at St. Moritz. He probably buys all his stuff online. He'd rather get a text message if we were going to market to this guy. We really need to understand that. But what's ironic is this: I've got this idea of Stefan Krauss, age 31. You'd think if I were to profile, as an outdoor gear company, who my perfect customer is, it'd be this guy, right? But he actually only bought 500 euro in outdoor gear. That's not a lot. These guys might live the life, but they don't make a lot of money. He probably gets a lot of his stuff free. So he's probably not your best customer. So we go ahead. We've got another Stefan Krauss in our system, and he's 62. How do we know which Stefan Krauss is which Stefan Krauss? That is one of the key issues of master data. Is this the same person? Probably not. There's one that's 62 and one that's 31. Is this father and son? Are they completely unrelated? But when you look at this, you probably wouldn't think that a banker based in Zurich is going to be your prime customer for outdoor gear.
But you look at the numbers, and he actually bought $3,500 of outdoor gear in 2016, because he can. I actually had a friend who was a physician and an older gentleman, and he did buy a North Face jacket. He went to the store and said, don't talk to me about going to Antarctica. I'm going to walk the dog. But I want to be warm and I have a lot of money, so I'm going to get the best mountaineering jacket I can. So this might be this gentleman. He does 75% of his spending while he's on holiday. He likes to buy in the store, and if you're going to market to him, he would rather get a physical piece of mail. He doesn't even know how to turn on his cell phone, and he would not respond to a text. So there are lots of things going on here through master data. It's, one, understanding the customer and understanding their buying patterns, and then the boring, banal data quality issue. If I'm going to market to Stefan Krauss, which one is it? How do I best address that customer? Who's going to be my most lucrative customer? You can't do any of that if you don't have a good sense of your master data, of who my customer is. Say there are 13 different Stefan Krausses: which one is which, and who's my best customer? That's almost the prime example of why I want to do master data. So there are a couple of things: getting a consistent definition of your customer data, and then being able to expand. Often when I'm explaining the opportunity and the need for master data, you might say, I have all of my data about my customer in System A. It might be a CRM. And if that's customizable, that's fine. But if you own your own data, you can add whatever you'd like into your master data system. The fact that Stefan actually hates his name and he'd rather go by Skip, right? He's got a nickname. Maybe your CRM system doesn't have that.
So whatever you would like to add, you can: the fact that he has a favorite sport, which is football, which is not skiing, but he still buys a lot of ski gear for the one time he skis each year. Anyway, that really shows both, I think: the opportunity of getting that 360 view of your customer or of your product (we just use customer as an example), but also the need, getting that basic matching of who our customers are and having that single golden record. So this is a little more technical. There are a lot of options for how we do MDM, and that could be a whole webinar in and of itself. I'm going to list two here. One is kind of that centralized approach, where literally you migrate, well, that's not the best word, but we store all our information in a single centralized MDM hub, which is then used as your common reference point for that golden record that we literally store centrally in one place. Another common one is that more virtualized approach, where you keep the data in the source system and you might just reference it through a virtualization layer or some sort of link. And there are others, and there's kind of a combination of the two and all of that. But the main thing is, whatever the architectural approach, you still need a data model. That's still a core part. Wherever the data is stored, understanding the data in both the source and target is critical, and the data model is a part of that. So here is one of the ideas of a data model, and this is just a traditional example of a patient, if we're in healthcare, say. If we're trying to get an MDM single view of patient in this case, this would be sort of your superset of all the different attributes from your source systems. So there are certain core shared attributes.
So across all my systems, whether that's the electronic health record, whether it's the physician notes, whether it's our marketing database, all of them probably have your patient ID and their first name and their last name and their email, right? And so part of the issue with MDM is how we handle the cross-matching of that, right? If one system has Joe Smith and one has Joseph Smith, which is the one we're going to use, and is this the same Joseph Smith? So that's a big part of master data. The fact that you do have these core shared attributes, and getting that single version from those through matching, is a big deal. And then there's also this idea that each system is good at what it's good at. So one system might talk about the marital status and race and ethnicity and that sort of information. Another might have to do with language and special needs and education. And the third one might be about household income and where they're living and that sort of thing. That each system is different is not a bad thing. Each one is built for its own particular purpose. And what you want to do in MDM is get that superset. And that's a perfect reason for a data model, to really see that this is the logical view of my patient. I'm not looking particularly at the system. I'm looking at my business. And I think that's the biggest difference here. So, a way to explain that further: say I do have all of these systems. In your organization there are probably many more, but this is kind of a simplified one. I might have my CRM and my sales system and marketing and finance, et cetera, et cetera. Each one of them has its own purpose and an associated data model for that purpose. So if I have a supply chain application and I also have a sales application, they probably all deal with product, but they all have their own different flavor of that. For a good reason. They weren't built necessarily to manage your customer, product, or location data.
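The idea of core shared attributes plus a superset across source systems can be sketched with plain Python sets. This is a hedged illustration only: the system names and the assignment of attributes to each system are assumptions made for the example, loosely following the patient attributes mentioned in the talk.

```python
# Hypothetical attribute lists per source system (an assumption for illustration).
SOURCE_ATTRIBUTES = {
    "ehr": {"patient_id", "first_name", "last_name", "email",
            "marital_status", "race", "ethnicity"},
    "physician_notes": {"patient_id", "first_name", "last_name", "email",
                        "language", "special_needs", "education"},
    "marketing": {"patient_id", "first_name", "last_name", "email",
                  "household_income", "address"},
}

# Core shared attributes: present in every system, so they need cross-matching.
shared = set.intersection(*SOURCE_ATTRIBUTES.values())

# Superset: the candidate attribute list for the logical "patient" entity.
superset = set.union(*SOURCE_ATTRIBUTES.values())

# What each system contributes beyond the shared core.
system_specific = {
    name: attrs - shared for name, attrs in SOURCE_ATTRIBUTES.items()
}

print(sorted(shared))
print(sorted(superset))
print({name: sorted(attrs) for name, attrs in system_specific.items()})
```

In practice the MDM model would usually be a selected subset of this superset, keeping only the attributes that matter to the business as a whole.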
They're meant to serve their purpose. Some are getting a little better, and they may try to do that and say, I could be your single source, and maybe they can. But you want to decide that by looking at the data model and saying, yes, this system has everything I need. So these systems have been built for their purpose. They all have their own data model. And the idea of this MDM system with that golden record in the center is that this would be your selected superset of all those source system models. So do you care about every attribute that's in your online sales system, where this may be the transactional information or some of the log data? Some of this stuff is only important in that particular system. So what you want to do is look at that and say, what's important to the company? What's going to be my MDM superset? And that's going to have its own data model. You'll see that I have reference data sets there too. Often it's in the same spot. I'm not necessarily talking about that today, but we kind of covered that they are separate but related, and often these MDM tools have both. And you should have common reference data sets as well. As part of that, you'll see that data quality and matching is a big part, and we'll talk a little bit more about it. So how do I know that the Joe Smith in the CRM is the same Joe Smith in the online sales and the same Joe Smith in my marketing application? I have to have some sort of matching logic based on your data model to do that. Once you've discovered or created this golden record, whether it's physically a golden record stored in one place or virtualized or whatever, you've defined that this is the correct Joe Smith. One question is, is the data warehouse related? Of course, but it is a different thing. You're doing your data warehouse for this idea of reporting. So yes, it is storing everything in one place, but kind of for a different purpose. And they work nicely together.
So you can, for example, take your information from your MDM and feed the dimensions of the warehouse. Maybe I have a dimension for customer and I have a dimension for location. The nice thing is you've done all that hard work to do the cleansing and the deduplication and all of that, so those golden records can feed the warehouse, which is sort of nice. And you'll see there that often the warehouse has its own data model as well. It might put it into, for example, and I know it's not the only way, a star schema, which is just a different way of storing it, a different data model for similar data. And that's why we do modeling, right? Part of it is different fit for purpose for different applications. The other thing, and we'll talk a bit more about this as well, is that applications can also reference this golden record for lookup in the application. So if we go back to the patient example, I'm checking in a patient, and you say you're Joe Smith, and I look it up and I say, okay, yes, you're Joe Smith on 17 Main Street and the last time you visited us was such-and-such, and all of that sort of information. You can source that from the golden record. Or if people are doing data entry, say you're a sales rep on the phone and someone calls in, you can see that the person already has their name and address and phone number and just verify it rather than starting again. That really helps with data quality and a lot of other things. For those of you who might be thinking, why is the application separate from those databases up top? Yes, I get that. Often the end user applications are sitting on your sales database or your marketing database, for example. It's just showing that once you have this populated record, you can actually use it in a system. So that is one of the benefits: being able to look it up really helps.
We will talk a little bit about quality and governance and all of that. That is really the goal here. Which kind of leads to this idea of having these business rules and matching rules. One of the things you need to do in MDM is really establish those criteria, which, in my world at least, is really hard to do without a model. So what are all of the attributes, and then which of those attributes would you match on? It isn't the same in every system, and there may be separate strategies. From a data point of view, it would be great if we were all born with some sort of unique ID emblazoned on our forehead so that everybody could be uniquely identified, but that's not the case. And that may be rather creepy. So how do you know that this is the same Joe Smith? Is it the fact that there are two Joe Smiths with the same date of birth and the same, if they're in the US, Social Security number? There's probably a high probability that that's the same person, but it may not be. So you might want to try something else. Maybe it's the same Social Security number and the same last name. Or if that's not right, the same last name, first name, middle initial. All of these different things can be defined based on your unique business rules. There is no one standard, right? Sorry if there's a noise in the back. And the idea is you can have these multiple strategies that can be executed in sequential order. So if you don't have a match on the first, you can go to the second and onwards through this list of different match strategies. That is a key part, when you're doing master data, of how we define this golden record. So here is one of the places you can use the data model. I guess a lot of us have this idea of a key, right? Again, ideally in the world, we would all have our unique primary identifier, right?
Or ideally in your systems, say if we're back to the patient example, everyone would have the same unique patient ID across all systems. That would be lovely. We often have this idea of the target key, but that is often not realistic in a real-world system, right? So that's where this idea of an alternate key comes in, and folks are familiar with that, kind of your natural keys in the database. In this particular modeling tool, you'll see how they show that date of birth plus Social Security number is your first alternate key. The second one is Social Security number plus last name. Those are nice candidates for your matching logic. Someone's already defined that rule, right? How do we know if it's the same person? This could be it. Or you could look at it and go, well, I search on first name, last name, then phone number, or I go by email address or whatever, right? And again, this ties into governance, and this ties into security and privacy. I personally get annoyed when I try to go buy a pair of shoes and they say, what's your phone number? And I say, do you really need my phone number for that? I'm trying to buy a pair of shoes. Well, you don't really need to have that in the system, right? Or email, right? I think a lot of marketing folks would like to define you by your email. Do you want that? Anyway, there are a lot of things around this. And then for some of it, we can have an exact match on things like first name, last name, right? But it isn't always perfect. We're human, right? People are either entering the data differently or they are tracked differently in different systems. And as I mentioned, there are different data models from different sources, and one might track middle name and one doesn't, and all of that. So that's where some of this fuzzy matching logic can come in.
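The idea of running several match strategies in sequential order, with the model's alternate keys as candidates, might look roughly like the sketch below. The field names, the strategy list and the sample records (including the obviously fake Social Security number) are hypothetical; real MDM tools express these rules in their own configuration.

```python
# Ordered match strategies, loosely following the alternate keys in the talk:
# AK1 = date of birth + SSN, AK2 = SSN + last name, then a name-based fallback.
MATCH_STRATEGIES = [
    ("dob+ssn", ("date_of_birth", "ssn")),
    ("ssn+last_name", ("ssn", "last_name")),
    ("full_name", ("last_name", "first_name", "middle_initial")),
]


def build_key(record, fields):
    """Build a comparable key, or None if any field is missing or empty."""
    values = []
    for field in fields:
        value = record.get(field)
        if not value:
            return None
        values.append(str(value).strip().lower())
    return tuple(values)


def find_match(candidate, existing_records):
    """Try each strategy in order; return (strategy_name, record) or None."""
    for name, fields in MATCH_STRATEGIES:
        candidate_key = build_key(candidate, fields)
        if candidate_key is None:
            continue  # candidate lacks the data this strategy needs
        for record in existing_records:
            if build_key(record, fields) == candidate_key:
                return name, record
    return None


existing = [{
    "id": 1, "first_name": "Joseph", "last_name": "Smith", "middle_initial": "R",
    "date_of_birth": "1980-01-02", "ssn": "000-00-0001",  # fake sample values
}]
incoming = {"first_name": "Joe", "last_name": "SMITH", "ssn": "000-00-0001"}

result = find_match(incoming, existing)
print(result[0] if result else "no match")
```

Here the incoming record has no date of birth, so the first strategy is skipped and the second one (SSN plus last name) produces the match.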
This is kind of outside of a data model, but it's often done by some of the MDM systems, or it's something you would code yourself if you're doing your own kind of manual MDM, which can happen. So, is 101 Main St the same as 101 Main Street? Is John Smith the same as J. Smith? The same as Jack Smith? You can create some of these synonyms, so that if I see Tim I know I could probably match that to Timothy as well, or that Street and St, or St with a period, and all of these kinds of things can be matched. And then you can create these data quality thresholds. So you could say, if it's a 0.9 match, then that's probably the same: Timothy didn't have the Y in it and someone typed it wrong, so I can pretty much assume that's the same. A 0.2 might be, I don't know, I have J. Smith and John Smith. If I just have that information, with a super common last name, J. could be John, could be Joanna, could be anything. So that's where you can have this idea of either auto-approving them, I know that's a match, or having a human review from a data steward, which is an important part of any MDM, as is the people side, right? Not to belittle the technical side, because we've already seen there are different source systems, there are different data types, there are different data models, but so much of MDM is on the people side and the process side and the human stewardship side. So one of the roles of the data steward in MDM, if you've set it up this way, is to do these approvals. And this is probably a bad example, even though I built it, in that there's really not enough context here. But say we have in our system these certain people, John Smith, Jack Smith, John Smith and John R. Smith. They all came from different systems, and they seem similar. One of the reasons it's a bad example is that there is no more context.
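A minimal sketch of synonym-based normalization plus a similarity threshold, using Python's standard `difflib.SequenceMatcher`, is shown below. The synonym table, the 0.9 and 0.5 thresholds and the three outcome labels are assumptions for illustration; the scores this produces are specific to `SequenceMatcher` and are not the scores an MDM product would compute.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table: abbreviations and nicknames mapped to one form.
SYNONYMS = {"st": "street", "tim": "timothy", "jack": "john"}


def normalize(text):
    """Lowercase, drop punctuation, and expand known synonyms token by token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


def similarity(a, b):
    """Similarity in [0, 1] between the normalized forms of two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def decide(a, b, auto_threshold=0.9, reject_threshold=0.5):
    """Auto-approve strong matches, reject weak ones, send the rest to a steward."""
    score = similarity(a, b)
    if score >= auto_threshold:
        return "auto_approve"
    if score < reject_threshold:
        return "no_match"
    return "steward_review"


print(decide("101 Main St.", "101 Main Street"))
print(decide("Timoth Smith", "Timothy Smith"))
print(decide("J Smith", "John Smith"))
```

A single-string comparison like this is deliberately simplistic: as the talk points out, a name alone rarely gives enough context, so real rules usually combine several attributes (address, date of birth, tenure) before auto-approving anything.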
So you could say they all have the same address and they've all been a customer for the same number of years; is this the same person? The data steward would really be someone who knows the data or knows the people. Oh yeah, I know John, he loves to go by Jack, or something like that. So that's where you often actually need a human data steward to look at this. And often it's a phased approach. As you are building these rules and the rules get better, you can trust them more, and maybe more things can be auto-approved. But especially in the beginning, as you define these rules, it's probably good to err on the side of having a little more human review, even though it takes more time, because you don't want to match inappropriately either. Okay, so then you've got the idea that I've done this match, and that John Smith is J. Smith, and one comes from System A and one comes from System B. So you've matched on certain rules, and then you want to create this idea of survivorship rules, which are different from matching rules, which can be kind of confusing sometimes. Where do we get that golden record from? Take what I was just saying. I know that J. Smith is John Smith, and I've done that match. Where do I want to populate the name information from? Well, say this is a patient. When he checks in, the system probably has the name correctly, but when he just comes in for a checkup, maybe they didn't get all his name information. So let's not take it from that system, right? If we're trying to get his marital status, that should probably come from the counselor who actually spoke with him, so we'll get that from System B, right? So that's part of the design of an MDM. It doesn't have to be all from one system, but once we've defined this as the same person, where do you get the best information from, right?
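Survivorship rules of the kind described here, where each attribute of the golden record is taken from the most trusted source that has a value, can be sketched as follows. The source system names, the rule table and the sample records are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical survivorship rules: for each attribute, sources in priority order.
SURVIVORSHIP_RULES = {
    "name": ["registration", "checkup"],
    "marital_status": ["counseling", "registration"],
}


def build_golden_record(matched_records, rules):
    """Pick each attribute from the first listed source that has a value.

    Returns the golden record and a lineage map of attribute -> source.
    """
    golden, lineage = {}, {}
    for attribute, sources in rules.items():
        for source in sources:
            value = matched_records.get(source, {}).get(attribute)
            if value:
                golden[attribute] = value
                lineage[attribute] = source
                break  # a higher-priority source won; stop looking
    return golden, lineage


# Records already matched as the same person, keyed by source system.
matched = {
    "registration": {"name": "John R. Smith", "marital_status": None},
    "checkup": {"name": "J Smith"},
    "counseling": {"marital_status": "married"},
}

golden, lineage = build_golden_record(matched, SURVIVORSHIP_RULES)
print(golden)
print(lineage)
```

Keeping the lineage alongside the golden record is an optional design choice; it makes it easier for a data steward to see why a given value survived.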
Maybe the best address is from the marketing database, and the best product information for that customer is from the sales database, or whatever. So you have to define how you populate from these different systems. And then there's this idea of harmonization, and everyone uses slightly different terms. For an industry that likes to create definitions of things, we're terrible with our own, right? But the concept is the same. So there are several steps. First I matched: is this the same John Smith? Yes it is. Then, for this John Smith, where do we want to take the data from across the different systems? And so ideally you have created this golden record: this is John Smith, he lives at 101 Main Street, he's aged 32, his favorite color is blue. Whatever you want to track about John Smith, within legal limits of course, is your purview, and that's one of the benefits of MDM. Often you then want to harmonize that, or populate it back to the source systems. Wouldn't it be great, if we know John Smith's middle name is Gary, to populate that to all the systems that don't have it? That really is the nirvana of having the golden record, but, and this should be obvious, it should be handled carefully. You definitely want to make sure your golden record is right. You definitely don't want to interfere with anything in those systems. You want to have clear stewardship and update rules and all of that. It's a great thing to have, and automatically populating back should obviously be done with care, but it's a huge benefit once we get there. Okay, so I've touched on this before, but it's so important that I'm going to talk even more about it: this idea of governance and business process. Successful MDM really requires collaboration between the stewards and the owners and the users of that system, in both business and IT. Again quoting Gartner here.
They had an interesting paper a while back with four reasons; I'll let you get it yourself to see all four. I pulled out just the top two reasons for failure of MDM systems, which were the failure of IT to align with business processes and business value, and not having the right information governance around it. And from my practice I would agree with that. That's probably the biggest risk. Again, that's not to belittle how hard it is to get the data from the different systems. That is a huge challenge, and it's why there are a lot of tools and expertise around that. But more complex, I think, is understanding how data is fed from business processes and getting the right governance processes in place. Early on in my career, I guess it was probably 10 years ago now, I was a chair at one of the forums at an MDM conference. We were talking about the benefits of MDM, and one man just stood up in the back of the room and went on what I guess you would call a full-fledged rant about how much he hated MDM, because these folks came down from corporate and imposed this thing upon him and never spoke to him and ruined everything because they didn't talk to him. And I couldn't help but agree, because that was exactly a case of this. If we're creating a single version of how we store name or what information we take, and specifically if we're going to create standards, make sure that whoever is affected by this information is part of the discussion. The first way to make this fail is to forget a key person and to push the standard down from the top. It really has to be a grassroots effort that everybody has bought into, and then it's the easy thing to do. I've done these, and the beautiful thing about it is that we often get a lot of buy-in because people see the need.
Yes, I would love to have the single view of customer so I don't have to spend half my job trying to cleanse my own data because I know it's wrong, or trying to call people to get the information. I still see a lot of sneakerware in companies: I'm trying to get the address of this person, so I have to call someone on the phone because our system doesn't have it. So for all these types of information you want to get the people involved. What I'm a big fan of is outlining the business process, and I'll show a little case study at the end which is a good example of that. A core part of the usage of any data is the business processes around it. The example I gave is a modified version of BPMN, the Business Process Model and Notation, but it's basically a common way to show this: the idea of swim lanes, or workflow, as a lot of people like to call it. One of the reasons I like it is that business people especially can understand this; this is their job. If you've seen any of my presentations or worked with me in the past, you know I'm a big fan of pictures, right? So I often will add my own flavor to this and put a person there, almost like a UML person. These are the people: product development and supply chain accounting and marketing are all dealing with, in our example, product information. They all have different pieces of it. So I often will put in the workflow: the product starts in development, then we go to supply chain and they do the pricing, then we go to market testing with marketing, and then they do the final price. Well, why is that important? Because I've been in meetings where both supply chain and marketing are arguing: I set the price. No, I set the price. And they're both right, right?
Supply chain accounting might say, well, this is the price based on the cost of goods, and then marketing will go and say, well, based on our market testing, here's a different price. Maybe you store that as a different field, or maybe it's a different status at different times. No one's wrong, right? But until you map this out, that's where some of these issues come from. We just say, oh, they both have price, let's just put it in the same field. Well, really they may be very different things. I also see a lot of data quality issues come out of these processes. I had a client last month and we drew one of these out, and she laughed and said, wow, when you really draw out our data processes, they look ridiculous, don't they? As she was saying it, it was: well, I put it in the spreadsheet, and then I modify the spreadsheet and send it to Ann, who puts it in her spreadsheet and sends it to Joe. Wow, that's kind of crazy, isn't it? Right, but you're so busy in your day job that no one has the time to sit back and look at the holistic process. By doing that, you can often solve data quality issues: could we skip that step and just have Joe send it to Mary and skip a spreadsheet, right? Or there's the fact that there are spreadsheets at all, or, in the example I have here, an email that's being sent. With one client I worked with, it was email. We had one of these diagrams and, almost to be facetious, but not really, I literally showed every email that was being sent, and that's what helped get buy-in from the CEO. She saw spreadsheets and emails being sent all over and she said, wouldn't you think we could automate this? And we said, yes, that's the role of MDM: we could have the master data system controlling that with all the data quality rules, rather than people sending emails back and forth to each other.
So again, that's part of the reason this is helpful, especially when you're working with business folks. What do they know? They know the work they're doing, right? So this literally ties to their day job and how data applies to their day job. I actually had an email question about this last night, and I said I'd maybe answer it online. The example I had in an article, which is this one, has a lot of detail. For this particular use case, I was trying to show that both marketing and supply chain accounting are dealing with the price of the product, and so there's a lot of detail. As you get bigger, this much detail is probably too much. You might just have a cylinder that says product data or customer data, and it's just showing that we're both touching customer data. Another example, which I don't have a slide for here: we did a lot of this for a big pharmaceutical company. Our sponsor was the pharmaceutical development lead, and he rightly said, I know data is so core to our development process and we don't have a good sense of it. We mapped out their entire clinical development process and where the data was used, and actually got a lot of efficiencies in the process, because until you were able to draw that out, people weren't able to see it. It was hugely successful, and it came from the scientist who said, I know there's a better way we can do it, let's draw it out. And that's where he started seeing the redundancies and conflicts and that sort of thing. A friend of the process model is the CRUD matrix. I've used this slide in the past, but it's the worst name ever for the most helpful tool ever, and there's a prize for anyone who can think of a better name, even though this has been in the industry forever. My customers who have done a CRUD matrix usually love it when I suggest one.
The customers who haven't say, what, are you talking about stuff on the bottom of my shoe? CRUD is an ugly name. But what it stands for is how data is created, read, updated and deleted, which spells CRUD. And this is a helpful complement to the process model from before. Maybe you did the more simplified version here, and it just said product development creates product information and then passes it over to marketing, who looks up some of the product information. You probably want to get more specific about that. Who's the owner of this? Who would actually create the product assembly instructions? Then who would maybe update them later, or delete them? It can be very helpful in root cause analysis or conflict resolution. Maybe you have three different groups all updating the customer information. Is that correct? Which one wins out? How do we handle conflicts? All of that. And sometimes, again like the customer who laughed and said, wow, when you spell these things out, some of these processes are crazy, the matrix can highlight that as well. One of my customers asked me, why do you have both create and update? Wouldn't whoever can create also be able to update? And I thought quickly, and if you haven't heard me rant about my insurance company, I will now. If they had actually done this, I think they'd have a much more efficient business in terms of how data actually flows to the customer. I gave my address to one department and it was incorrect, and I've been trying to change it for about a year and a half. And that seems to confound them. So here's a good case where I could create my address and from then on could never update it. Again, that's bizarre, but had someone actually drawn out that process, that might have fixed it, right?
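A CRUD matrix is simple enough to sketch as a small data structure. The Python below is only an illustration under assumed data: the entities, the business roles and the letters assigned to each are made up to echo the price and address examples, not taken from any real client. It flags the two situations described above, several roles updating the same data, and data that can be created but never updated.

```python
# Hypothetical CRUD matrix: entity -> {business role: operations}.
# C = create, R = read, U = update, D = delete.
CRUD = {
    "product price": {"supply chain accounting": "CRU", "marketing": "RU"},
    "product assembly instructions": {"product development": "CRUD", "marketing": "R"},
    "customer address": {"customer service": "CR", "billing": "R"},
}


def roles_with(matrix, entity, op):
    """Roles allowed to perform one operation on an entity, sorted by name."""
    return sorted(role for role, ops in matrix[entity].items() if op in ops)


def findings(matrix):
    """Questions a data steward might want to raise about the matrix."""
    notes = []
    for entity in matrix:
        updaters = roles_with(matrix, entity, "U")
        if len(updaters) > 1:
            notes.append(f"{entity}: several updaters {updaters}; which one wins?")
        if roles_with(matrix, entity, "C") and not updaters:
            notes.append(f"{entity}: created but no one can update it")
    return notes


for note in findings(CRUD):
    print(note)
```

In this made-up matrix the price has two updaters, which is the supply chain versus marketing discussion, and the customer address can be created but not updated, which is the insurance company problem. The same structure works at the attribute level or the system level, depending on how much detail is useful.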
I'm sure one department has my address but there's another one looking at it, and all of this type of stuff: use case analysis, conflict. In some cases it's just a stewardship issue of, yes, I update the price, maybe you update the price too, or maybe it's at a different point in the process, but it helps at least highlight it and expand that out. Here is a case study that I'll mention, and why I find it interesting. Partly it was just a lot of fun, but they initially called us in because they knew they needed better data governance and they knew they needed a master data management system. This was a restaurant chain. When you think of it, what's their core master data? Of course they have customers, but to them it was the menu, which is sort of their product. When you think of how they differentiate in the market, there are a lot of restaurants out there, and they were very innovative with their menus. They would change their menus very often and have different campaigns around them, and they didn't have a good central spot for their menus: historical menus, what's worked and what hasn't, what ingredients go into each menu. I think one of the marketing ladies actually joked, I think our printer has a better view of our menus than we do. I mean, that's a huge risk to the business. I think one of the things that motivated this is that they actually lost one of their recipes from the past. I think it was literally in a Word document somewhere on a SharePoint that went down, and they said, that's just ridiculous. This is our core business. We can't have that happen again. So what was interesting about this is that it started as a master data management issue, and they realized they needed governance and workflow around that. But probably a week into the project, it became clearly apparent that this was more of a business process issue.
So the first thing we did was map out the business process. I felt like it was a field trip, so it was a treat for me: we literally went to the test kitchen and talked to the chef, who was not wearing a hat like that, but he did have that kind of white shirt. And we learned so much, and the company learned so much. It wasn't just us. This gentleman had built his own, basically a process model with some CRUD matrix mappings, of how when he created his menu there were certain ingredients and there was certain pricing, and he knew there was a problem. And this is a guy who went to chef school. For the folks who say a business person can't understand data models and process models: he was a chef, and he knew there was a problem with some of his data and with getting the recipes flowed out quickly, and he started to build his own. Similarly, we went to the marketing department, and they had built on one of their whiteboards a complete business process flow with data touch points on it. Again, they didn't know what to call it. They just said, this is our nightmare, can you help us fix it? The nice thing is that we got all of those together and drew it out holistically, and a lot of them had the same touch points, literally from when a recipe is created to when it goes to supply chain. I learned so much, including that a piece of cheese is not just a piece of cheese. Everyone nerds out on everything. There's a whole USDA database, and for a slice of brewed cheese there are about 16 different versions. And if you get that wrong, the whole cost could be wrong.
So at the point of sale system, when we actually went into the restaurants and saw the person ordering, something as simple as I added a slice of cheese on that point of sale system raises the question: is that the right cheese slice, with the right supply chain cost that was applied to that recipe that appears on the menu, right? That was all workflow. It was a very small subset of data, but it went through a lot of processes. We did not come in thinking we should do process models, but we ended up there because it was such a core part of their business, and it was an aha moment for everybody to see how that flowed through. And that's a big part of your master data, which again isn't a data model, but it's a very close friend of one. This was a good example of that. And it actually helped us sell it to the CEO to get more funding and buy-in because, again, what business people understand is their business, and this was a drawing of their business and how the data flowed into it. So to summarize, master data is those core entities of the business: customer, product, supplier, menu, caveman and animal, right? It really depends on your business, and you need to define that. That said, there are common ones, and there are MDM products that focus on customer or product, but that's for you to decide; it's not the same for everybody. To do this right, I'm a big fan of the idea that you need a data model, because that really helps define what those core entities are and what the core attributes are, and then do the mapping. And the reason we do this is not just for the academic fun of doing data models, which I love, right? It's really for those business opportunities.
If you remember back to that outdoor store, it's not only knowing who my customer is and which is the right Stefan Kraus, but really starting to analyze that and adding more information: what their favorite sport is, how often they have purchased, all that other information around your customer, which you can't get unless you really manage your customer. To do that right, you need proper data governance, so that you have the right people who are affected by the data and they can give input and ask the right questions and define the right matching rules, as well as process models and CRUD matrices, to really align your MDM with your business success. So just quickly, hopefully you enjoyed this and it was helpful. If you want to catch the ones in the past, that's great. Next month up is agile and data modeling. We do this for a living, so if you need help, let us know. There's my contact information if anyone has questions after, and I know a lot of you have been very vocal in sending questions, and I'm always open to that. And at this point, I will pass it back to Shannon, who will open it up for Q&A. Donna, thank you so much for another fabulous presentation, and thanks to our attendees for being so engaged. I love the chat that's just been on fire again today, with lots of great questions coming in. If you do have questions, please submit them in the Q&A section in the bottom right-hand corner of your screen. And just a reminder, to answer the most commonly asked question, I'll send a follow-up email by end of day Monday for this webinar with links to the slides, links to the recording and anything else requested throughout. So diving right in, Donna: are there resources that recommend the best design of attributes for core entities, especially with an eye towards data quality standards? For example, with organizations, is it helpful to track a legal name as well as a familiar name? There are a couple of questions embedded in there.
So are there standards and best practices? Yes, there are some industry models out there, which again are a good guide. I would never say just take an industry model and use it as is, because every company is unique, but it's a good guide. A lot of the vendors out there sell, say, a customer MDM or a product MDM, and since these are similar challenges that a lot of companies face, some of that has been worked out. To your specific question, should you have legal name as well as familiar name? That's up to you and your business process, but I think yes, and that's a great example of the benefit of MDM. I'm working with a couple of clients right now that are on the clinical side, either with children in school or patients with a care coordinator, that sort of thing. That is such a personal business. Of course, if I've signed up for healthcare with Medicaid or something, the legal document has my legal name. But you might know that my name is Gwendolyn and I hate my name, and I'd rather go by Mary. That's pretty important. So if you can have that field in your own MDM, that's a core part of your business and it matters. So yes, I'm a big fan of adding that one, for example for a more personalized customer or patient interaction. Absolutely. So, an example of MDM for a patient: if we have an EDW, would you recommend MDM as an ODS or data lake area, which then moves data to the EDW? I'm not sure I completely followed that, but I'll try to answer it, and if I mess up, someone will tell me. I think the question was, if I have the lake or I have the warehouse, do I still need MDM, and how do they work together? I think it's an and, as in that picture I had earlier. I could try to jump back to it, but I won't; I think you'll remember it.
You have the MDM, which is that place to store, I'll actually try to go back, the golden record, and then that can feed, here's a good example, the warehouse, or it could be the lake. The benefit of storing it separately is that you define the single view of customer, for example, you have all the matching rules and do all that logic in the MDM, and then that can feed the warehouse rather than trying to duplicate that effort in the warehouse. I do see them as separate things. I see the MDM as a first step that can live in parallel, if that makes sense. Indeed. So let me jump, sorry. I'm sorry, it's a day of me stumbling. How do we model historical and temporal features of MDM? You can, and that's a choice. Often MDM is done in real time because you want the current version of what that golden master record is, which makes it a little different from things like the warehouse, where you may want history. A lot of these MDM systems are very good, though, at tracking where a record first came from. So I want to know that Joe Smith is Joe Smith in the CRM system, or actually it's Joe Seth Smith, and you want to know the history of where that golden record came from. That lineage is a feature of a lot of these MDM tools: I've created that golden record, but it doesn't mean the source goes away, or the history of where that golden record came from. So that's definitely something to evaluate, assuming you're working with a vendor, how they do that, because it's a critical part. It doesn't mean that once you've created this golden record, reality goes away, because those systems had their data defined that way for a reason. So keeping that lineage of where things came from is part of a lot of the functionality. So, is the party model the best way to model MDM entities?
If so, what is the downside of it, and how do we achieve the required structural flexibility in it? Ah, the ever-present party. I am not a fan of party unless it's how the business looks at it. Maybe I have a party in a litigation, and that's a term people use. The benefit is that there are certain things about a human being that are common, right? We all have names and we all have addresses. So supersetting it or abstracting it into this idea of party can be very valuable. That in itself can be great. What you sometimes lose is the specificity of it, if I pronounced that correctly: am I talking about customers or am I talking about employees? Yes, they're both human beings, but they have very different attributes. What I track about an employee may be very different from what I can or should track about customers. So to me, those are the pros and cons. Yes, there are certain things that are common, and it may be good to superset them that way, but you lose something. Slightly off track, but we were doing some modeling for a company and it was about location data. We had one person who was more on the GIS side who kept saying, every location is just a set of GIS attributes. And I sort of fussed back, well, then you could say a school is just a superset of organisms, but a school is definitely a different thing. That's what you lose by over-genericizing everything. A school is a building, yes, but a school is a school, which is different from a hospital. So that's my issue with party: yes, there are good things about it. If it makes sense, it reduces a lot of redundancy.
A human is a human, so let's not define that 16 times. But if we're really talking about customer versus employee, for example, or a corporate organization versus a human being, that kind of thing, they can be very different. Or maybe we just take address out as a separate thing and track that differently. I'm beginning to ramble, so I'll stop answering that question, but hopefully that clarifies my thoughts there. I love it. So, you've made no mention of relationships between the core entities in the data model for the MDM hub. In your experience, are relationships used much? Yeah, that's actually a very good question. With a lot of vendors, whether it's a product MDM or a vendor MDM, a lot of the focus is more on the core entities and the attributes, but relationships are just as important. The other issue here is how much you normalize versus denormalize. I had an argument with one of the MDM vendors that actually wanted to put everything in one golden record and not have any relationships, and I wouldn't agree with that. I think some of the benefit of an MDM is in the picture I had of our favorite Stefan Kraus, way back. You'll see there are no lines here, but those are implicit relationships. I think it's just how you model it; what's an attribute and what's a relationship is maybe the easiest way to answer that. But yeah, I think the relationships are a key piece of this. It's just that a lot of the time we're talking about the attributes of customer. A lot of the vendors focus on those attributes, but relationships are still important as well. So, is MDM an extension of the conceptual approach? I am not sure I follow that.
When I hear conceptual approach, I think of that conceptual view of what I mean by my customer. In that sense, yes, it's stepping back, or maybe even a virtualized approach, let me find my other picture. In that sense, I'm not looking at the systems. I'm not saying I'm looking at customer in my sales database, because then you're so physically focused that you lose the big picture. I often equate conceptual with the business layer, so in that sense, yes. But where it's a no is that the actual act of doing MDM is an extremely physical effort, because you really do need to align the physical data types and movement rules and all that kind of stuff, and the data quality rules and matching. So it's a combination of those two: the business need, and then the very nitty-gritty technical stuff on the physical side to get you there. Lots of great questions coming in too. I think we have time for a few more. Are MDM naming conventions just a name that exists in the systems, or more often a decided-on surrogate reference name? As with that previous question, I am a fan of the idea that the MDM should use the business names, right? If we keep going back to customer, because it's an easy one to relate to, maybe in the CRM it was F underscore name and in the finance system T31 is the actual column name, right? But in the MDM, you probably want something that makes sense to the business side of what this is. This is the person's name, or first name, last name. So I think the naming convention in the MDM should be separate from the source systems, related to them and mapped, which goes back to that lineage question earlier. But it should be based on the business requirements and have its own naming. And BPMN provides a view of artifacts and business processes, but how or where do we cross-reference an artifact to an entity?
Okay, so BPMN, and anyone who's joined my sessions before knows I'm a big fan of making my own extensions that work. BPMN has this idea of a data artifact. I don't know if they've changed it since, but in the past it was literally a piece of paper, and a lot of tools customize it, so I have a database cylinder here. That's at a very high level. BPMN supports this in its way, and a lot of vendors and myself extend it, to say that in product development there is data that relates to product. Really, at its highest level, it's just saying that when we're talking about product development, we're talking about product data and product components in this example. Where you can sometimes get more detailed is in that CRUD matrix. I'm a big fan of CRUD, however it works. You could do it at the attribute level, you could do it at the system level, or whatever, but you could say that these particular attributes, like product price and product name, are used in a certain system. So it's a combination of both. If it's a small process, like in this example, I actually put the attributes on the diagram just because it helped highlight the point, but as I said earlier, this gets really busy really quickly. So I might just say at a high level that this data is touched by this process, and then use a data flow diagram and/or CRUD, and CRUD is a good tool for this, to show exactly what information is touched. That makes sense. Well, I'm afraid that brings us right to the top of the hour. Donna, again, thanks so much for this fabulous presentation, and thanks to our attendees for being so engaged and submitting so many great questions. And just a reminder, to answer the questions that are coming in, we'll definitely be sending a follow-up email by end of day Monday to all registrants.
With links to the slides, links to the recording and anything else requested throughout. So you'll definitely get a link to download the slides. Donna, thank you so much, and thanks to everyone. I hope you all have a great day and hope to see you in next month's webinar. Thank you. Thanks.