My name is Shannon Kemp, and I'm the Executive Editor of DataVersity. We would like to thank you for joining January's installment of the monthly DataVersity webinar series for Enterprise Data World. This webinar series is designed to give our Enterprise Data World conference attendees education year-round; the conference is produced in partnership with DAMA International. Enterprise Data World will be held this year in Austin, Texas, April 27th through May 4th, 2014. And today's webinar is a preview of one of the talks you can experience at the event: Designing Master Data Services for Application Integration, with David Loshin.

There are a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag EDW14. We will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar.

Now, let me introduce you to our speaker today, David Loshin. David is the President of Knowledge Integrity, Incorporated (Knowledge-Integrity.com), a consulting company focused on information management solutions. David is among Knowledge Integrity's recognized experts in information management, has contributed to Intelligent Enterprise and DM Review, and is a channel expert for the Business Intelligence Network. David's books, Practitioner's Guide to Data Quality Improvement, Master Data Management, and Business Intelligence: The Savvy Manager's Guide, have been hailed as resources for gaining an understanding of business intelligence, business management disciplines, data warehousing, and how all of the pieces work together. David has created courses for TDWI, DataVersity, and a number of other venues, and is often asked to provide thought leadership to the information management community. I'm very lucky to have him here with us today, and with that, I will give the floor to David.

Hello and welcome. Thank you, Shannon. Thank you for the opportunity. Hopefully you are hearing me okay; I know we didn't do a complete sound check earlier, so I'm hoping this is coming through. I do want to thank you for the opportunity to talk. A little bit of background for those who are attending: DataVersity contacted me a couple of months ago and said, hey, can you do a webinar for us? And I said, sure. Can you do a webinar for us in February? And I said, sure. So what I did was look at a combination of two ideas. One was something I've been harping on for a number of years, and it conveniently dovetails with some client work over the last couple of months as well. So it's a propitious intersection of ideas: cycling back to this whole concept of the quest for master data management, why that is a challenge, and what we can explore to start thinking about how to facilitate the integration of a master repository as part of the application infrastructure. So the first thing is to go back to the conventional approach to master data management, which is largely focused on consolidation of data into a single repository.
And if we recall the typical phrases or business terms that are used to argue on behalf of the need for master data management, they typically center on things like a 360-degree view of the customer, or a single source of truth, or a golden copy. Each one of those speaks to the desire to take data from many, many different sources, put it in a centralized repository, and boil it all down into a single representation of whatever entities you're talking about. For convenience, I'll talk about customer. For example, we have a bunch of different functions within a business: finance, sales, marketing, legal, fulfillment, HR, customer service, customer support, et cetera, and each one of them has some subsystem that contains information about customers. The design idea is: let's pull all the data out of those subsystems and put it into some synchronized repository, a master repository. And the focus has always been on how we are going to pull our data out and dump it into this repository. But as a byproduct, we end up deferring the more complex issues, which are not how we are going to put the data into the environment, but rather how we are going to get the data out of it, how we are going to use that data, and what the characteristics and criteria are for ensuring that that data is usable. I've said for a long time now that consolidation may have been the original motivation for master data management, but it didn't address this whole concept of master data integration. Why is that? There's a bunch of reasons, and on this slide I've enumerated a number of them.

I'll start from the left-hand bottom with YADS, which stands for Yet Another Data Silo. If we're looking at consolidation as the objective of master data management, we're pulling data out of different sources and combining it in a way that effectively boils out all the distinctions that came from those original sources, so we actually eliminate some details that were relevant to the original sources. And we end up with yet another copy of data that has to be synchronized, that requires continuous input, that requires maintenance and configuration and management in order to run it as its own entity. That creates not just a project to build the thing; it becomes a programmatic effort that requires continuous resources. The second is missing business context: without going under the covers of what you're building, you lose a little bit of the context as to what the expectations are for that master data repository. To a large extent, master data has been pushed or motivated from the technology standpoint and less so from the business standpoint. I think that's changing a little bit these days, but it used to be the IT department that said we need to have a master data repository, and a lot of that was in fact driven by vendors saying: you're already doing a lot of these things; you need a separate tool that does master data management. You buy a tool, the deal is done, and all of a sudden you've got the creation of a system without necessarily understanding what the business drivers and requirements are for building that system. Next, the migration plan. This has been one of the biggest bugaboos, and the one we're really going to focus on today.
Now that we've got that thing, what do we do with it? How do we use it, and how do we integrate it into either applications that have yet to be built or applications that are currently in production and using their own data stores? All of a sudden you ask them to retire their own data store and migrate over to the use of this customer master repository or vendor master repository. What are the approaches for doing that? I think that has been one of the big roadblocks to success, and that's what we are going to focus on in the rest of the presentation.

Number four, it's important to understand loss of knowledge. If I'm pulling data from multiple sources and I'm making some decision as to which of those copies or source records remains and which of those attributes are discarded because they don't correspond to the golden copy of the master, well, I'm potentially losing some information. The truth is, in some cases, such as fraud analysis, a customer's name or a vendor's name may vary across the original systems, and those variations get cleansed out when the records are linked together based on identity resolution; yet the fact that variation was deliberately introduced as a way of trying to game the system is itself useful information. So there's a question as to whether you're losing information by consolidating it.

The same thing with loss of meaning. If the sources had semantic characteristics associated with their data attributes, there's a definition of what a customer is in the sales department and there's a definition of what a customer is in customer support, and when those customer data sets are merged together and you boil out the differences in definition, you may lose that distinction. In sales, a customer is the person who signs the check that pays for the product; that's a different definition from customer support, where a customer is any of the people who have been licensed to use the product and are allowed to call in for support. Now all of a sudden you've got data that conflicts with the original sources, because you may have conflated all those customers into the same type of customer when in fact there are really two different types of entities that interact with you within the environment. There's also semantic misalignment, where different business functions don't have a common understanding of how they should or should not be sharing information, and process governance, where you've got to ensure consistency in how business rules are being applied when data is coming in along different process pathways. Now, you could speculate, as a quote-unquote thought leader, that this happens, without really being able to back it up. But I can actually say that in the last couple of weeks I've been looking at different processes associated with an operational system in which, depending on the portal you came through to create a new customer record, different attributes are collected, different characteristics are set, and a whole different set of business rules is applied for the purposes of entity resolution, so sometimes you end up creating duplicates and sometimes not. So I could say that I'm speculating, but I've actually seen it in practice.
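To make the loss-of-meaning point concrete, here is a minimal Python sketch; the records and the two department rules are invented for illustration, not taken from the talk. It shows how the same consolidated records yield different customer lists depending on whose definition you apply.

```python
# A minimal sketch (hypothetical schema) of why "customer" differs by
# business function: one consolidated record set, two different answers.

records = [
    {"name": "Pat Lee", "signed_contract": True,  "licensed_user": False},
    {"name": "Sam Roe", "signed_contract": False, "licensed_user": True},
    {"name": "Kim Day", "signed_contract": True,  "licensed_user": True},
]

def sales_customers(rows):
    # Sales: a customer is whoever signs the check.
    return [r["name"] for r in rows if r["signed_contract"]]

def support_customers(rows):
    # Support: a customer is whoever is licensed to call in.
    return [r["name"] for r in rows if r["licensed_user"]]

print(sales_customers(records))    # ['Pat Lee', 'Kim Day']
print(support_customers(records))  # ['Sam Roe', 'Kim Day']
```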
There are other aspects which I don't really feel are worth going through at this point, because I do want to talk about this whole concept of coming up with a migration strategy for sharing master data. As we're bringing this registry or hub or index online, the creation of the consolidated view shouldn't become its own data silo; rather, what's the plan for migrating the functional applications of the business to employing that master data, whether it's the master index, or master repository records, or profiles, or whatever you want to call them? How can they benefit finance and sales and marketing and service and HR and all the different components of your business? So the question is: how can I build that migration strategy in a way that accommodates all of the different functions? What we would be doing is asking: what are the typical master data use cases? How are we using master data, or how would we use it if we actually had this master repository and index, and the capability for doing identity resolution, largely using approximate or probabilistic methods, to find duplicates or find variances in that entity data? And so the question is looking at your applications and asking where we are actually working with information associated with entities that would be master entities.

Again, I'll rely on customer as the canonical example. Among the operational and analytical activities I might do, there's assignment of a unique identifier. In the abstract that's a relatively simple and straightforward task; however, in the presence of variation, and with multiple channels through which individuals can interact as customers, you might not have a clear mechanism for uniquely identifying each individual and recognizing when that individual has already been seen. You end up creating multiple records for that individual and generating multiple identifiers, and, as anybody who's had some experience trying to assign unique identifiers knows, that can become a nightmare, because the more duplicates you create, the harder they are to untangle. So: you want to make an assignment of unique identifiers; you want to be able to search for an individual customer within your master environment; you want to be able to retrieve unified information associated with that individual or entity or customer. You want to manage relationships, whether that's a relationship across domains like customer and product (this customer has bought such and such a product) or a hierarchical relationship (this individual lives in a particular household, or this customer is a registered subscriber to a particular type of service). And cross-references of identifiers. Let's say you're a retail company that's just bought another retail company, and you both have customer data sets; your data set has the traditional company's customer IDs, and the acquired company has its own customer IDs. You want to be able to manage the cross-references between the acquired company's customer IDs and your customer IDs until you've been able, through interacting with the customers, to facilitate the transition from the prior customer ID to the unified ID.
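Here is a minimal sketch of that cross-referencing idea, with hypothetical system names and IDs; it simply keeps a mapping from each source system's customer ID to the unified master identifier until the transition is complete.

```python
# A minimal sketch, with invented IDs, of a cross-reference between an
# acquired company's customer IDs and the unified master identifier.

xref = {
    ("acme_legacy", "C-1043"): "MDM-000017",   # acquired company's ID
    ("retailco",    "88812"):  "MDM-000017",   # original company's ID
}

def resolve(source_system: str, source_id: str) -> str | None:
    """Map a source-system customer ID to the unified master identifier."""
    return xref.get((source_system, source_id))

print(resolve("acme_legacy", "C-1043"))  # MDM-000017
print(resolve("retailco", "88812"))      # MDM-000017: the same customer
```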
The second set of things is satisfying the data management and governance side of the services. If I'm managing my environment, I want to be able to look at, of the customers that are calling in, how many of them have called in multiple times, or how many of them have certain types of customer plans; I want to take the list of individuals who've contacted me through the different channels and match them against my master repository. Another one is duplicate analysis and elimination: the same example we used before, where one company has acquired another and they want to see whether any customers appear in both companies' customer databases so that the customer files can be merged. And then the next is standardization and cleansing. If I've got a master repository that has the customer's official name, birth date, location of birth, height, weight, and all those core characteristics, then when data comes in from another source and requires standardization, maybe with a nickname that has to be matched on name plus birth date, I can standardize against the master, use that data to do cleansing, and then use the result in operations.

So this is where it gets a little more complex: looking at the characteristic approaches for accessing master data and how that master data is used as part of the business function. There are some categories here, such as joining data in this repository with other resources. An example would be: I've got a customer master, but I also have my daily transactions, and I want to see what percentage of my customers transacted with the company over each of the last five days. And then feeding analytics: I might want to run clustering and classification algorithms over customer data and see whether I can come up with some predictive analytics based on what's in my master customer repository. So those are examples of typical use cases for master data within a particular business function.

Those are some of the things that led me to consider the challenges for integration, and I've always said one of the biggest issues has been the transition away from each application using its own copy of the data toward being able to use the master data. And I thought, well, how do we frame this as a set of services? So I started pulling together what you might call a master data services stack, where we look at the master data use cases and how they relate to different levels of capability, whether a capability is provided by the underlying database, or provided as part of a master data management product, or is some kind of API or interface that we need to design to enable an application to make use of that master data. There are basically five layers here. The gray layer is the repositories of the master data itself. The orange layer is data access. The pink, I'm sorry, the purple, excuse me, the purple is matching and identity resolution, typically provided with the MDM product. The blue are the core services, you might say the data management and identity management core services provided by many of these products. Then the green is the interface that is exposed upward to the business function. We're going to walk through each one of these in a little more detail.
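As an illustration of the duplicate-analysis use case just described, here is a minimal Python sketch, assuming simplified name and birth-date fields: identifying attributes are standardized into a match key, and records that collide on the key are flagged as potential duplicates. Real MDM products use probabilistic scoring rather than an exact key, so treat this as a toy version.

```python
# A minimal sketch of duplicate analysis across two merged customer files:
# normalize identifying attributes into a key and group the collisions.

from collections import defaultdict

def match_key(record: dict) -> tuple:
    # Crude standardization: case-fold and trim the name, keep birth date.
    return (record["name"].strip().lower(), record["birth_date"])

def find_duplicates(records: list[dict]) -> list[list[dict]]:
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r)
    return [g for g in groups.values() if len(g) > 1]

ours   = [{"name": "Ann Chu",  "birth_date": "1980-02-14", "src": "us"}]
theirs = [{"name": "ANN CHU ", "birth_date": "1980-02-14", "src": "them"},
          {"name": "Bo Vance", "birth_date": "1975-07-01", "src": "them"}]

print(find_duplicates(ours + theirs))  # Ann Chu appears in both databases
```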
The master entity repository: at the back of the stack might be your customer master, which holds profile data, all the information about that customer. Then there are the relationships among those customers: it might be households, it might be people who work for the same employer, or individuals who are somehow related in a familial relationship. Whatever the relationships are and whatever those entities are, it could also be multi-domain MDM: customer data and product data and vendor data, with relationships between which vendors provide which products, and maybe relationships between which customers have purchased which products. And essentially there's the entity index, keyed by a unique identity based on an entity identifier. Once you've created a new entity record, a new customer record, that customer's identifying information is fused into some type of key mechanism for that index, so that the next time information for that customer comes in, it can be mapped into the index to find the unique identifier, and then that unique identifier can be used to access the data directly from the repository.

Typical data interactions that support that enterprise use of master data would be ingestion of data: I get a new set of customers and I want to put them into my customer database. There's profiling: analyzing the data that's coming in to look for where the variations are. It would include probabilistic matching, so that I could look for matches or duplicates or overlaps or intersections between different data sets. And there are master data services for management of the repository: what do you need to do on a day-to-day basis to maintain the integrity of that customer repository? How frequently are you reviewing the data elements or records that are in there? How frequently are you running the duplicate analysis internally? Are you ever recognizing that two individuals might have been conflated into a single record that needs to be broken apart? So those are the repository and index layers.

The next layer is data access, which enables the layers on top to get access to the master data. Think about this again in the context of whether we want to be creating multiple silos or multiple replicas of the same data. One alternative is to create a master data environment with a separate repository; another is that, instead of creating new master entity records, we make use of the entity index and link it to the data in a federated way at its original source. You're not actually copying the data; you're providing access to the data from its original source. So we can either provide a gateway to a physical master repository that's a copy of the data, or we can provide access to a conceptual master repository that is composed from across all the sources. But we want the kinds of access that are typical of the way applications use databases: a select that can pull a subset of data out of that repository, or joins. Show me all the customers who bought more than 16 widgets in the last 30 days: multi-way joins that go across multiple systems.
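Here is a minimal sketch of the entity index idea described above, with invented identifiers: identifying information is normalized and hashed into a key, so that when the same customer shows up again the index maps back to the existing unique identifier instead of minting a new one.

```python
# A minimal sketch of an entity index: identifying attributes are reduced
# to a key so a returning customer resolves to the same unique identifier.

import hashlib
import itertools

_index: dict[str, str] = {}   # identity key -> entity identifier
_seq = itertools.count(1)

def identity_key(name: str, birth_date: str) -> str:
    normalized = f"{name.strip().lower()}|{birth_date}"
    return hashlib.sha256(normalized.encode()).hexdigest()

def lookup_or_register(name: str, birth_date: str) -> str:
    key = identity_key(name, birth_date)
    if key not in _index:
        _index[key] = f"MDM-{next(_seq):06d}"   # new entity record
    return _index[key]

print(lookup_or_register("Ann Chu", "1980-02-14"))   # MDM-000001
print(lookup_or_register("ANN CHU ", "1980-02-14"))  # MDM-000001 again
```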
So it's not just pulling the data out of the master repository, but rather joining the data that's in the master repository to data sitting in a transaction system or a warehouse or some other kind of system. That's where virtualization or federation sometimes comes in handy, because you may want to abstract out or make that layer somewhat opaque, so that you don't need to be concerned about where the data is actually sitting; otherwise you'd have to tinker with all sorts of plumbing to make sure you could get access to both systems in the right way without jumping through too many hoops. So we need an access layer in this services architecture.

Next is what I would call core services, and that includes matching and identity resolution, the algorithms for probabilistic matching, but it also includes what I call data management and identity services; we've got two slides covering these bubbles. We start with matching and identity resolution: the capabilities that may have originally been the core of your vendor's master data management product. That is then linked to the management of the repository, and on top of that sit the capabilities for access. So, entity and identity search: I give you a candidate customer record. Here's a name, here's an address, here's a phone number, here's an email address. That invokes a search on the identifying attributes through the matching or identity resolution layer, and, jumping back here, that goes through the data access layer to the index to find whether that customer actually exists within your environment. It should return to you an enterprise identifier, a unique identifier, which you can then use to retrieve the customer's data directly from the master repository. Likewise, you might want to update a customer's record, create a new record, or even do what I call deactivating a record. We typically wouldn't delete it; rather, we'd associate some kind of duration with the period during which the record applies. A person was a dependent in a household until 1997, and after that she became the head of her own household. There may be relationships that existed in the past, and we want to maintain history, so we may be activating or deactivating over time.

Then there are identity services, the middle bubble: the ability to generate an enterprise identifier, a unique identifier, and then the cross-referencing capability, so that if external identifiers are coming in, I'm able to manage the mapping between an external identifier and my internal unique identifier. Say a customer comes in with a customer ID from a bank account he had at a bank that was purchased by a bank that was purchased by a bank that was purchased by the current bank. That may sound funny, but I actually had an account at a bank that was acquired, and the acquiring bank was then acquired, and so on and so on. The account I had was a legacy account that went back numerous years, until I moved out of town and closed it. So you may maintain multiple layers of cross-referencing between an original identifier and the unique identifier, covering both internal and external identifiers.
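A minimal sketch of that search-then-retrieve flow, with hypothetical service and field names: a candidate record is resolved to an enterprise identifier, and the identifier is then used to fetch the master record. A real identity resolution layer scores multiple attributes probabilistically; this toy version matches on a single attribute to keep it short.

```python
# A minimal sketch of identity search: candidate record in, enterprise
# identifier out, then direct retrieval from the master repository.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    address: str
    phone: str
    email: str

class IdentityService:
    def __init__(self):
        self._by_email = {"ann@example.com": "MDM-000001"}  # toy index
        self._records = {"MDM-000001": {"name": "Ann Chu", "tier": "gold"}}

    def search(self, c: Candidate) -> str | None:
        # Real implementations score several attributes probabilistically;
        # here we match on one attribute for brevity.
        return self._by_email.get(c.email)

    def retrieve(self, entity_id: str) -> dict:
        return self._records[entity_id]

svc = IdentityService()
eid = svc.search(Candidate("Ann Chu", "1 Elm St", "555-0100", "ann@example.com"))
if eid:
    print(svc.retrieve(eid))   # fetch the master record by its identifier
```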
Then we want relationship management: given an identity, a customer, tell me all the relationships that are recorded for this customer. Tell me all the households that individual has lived in. Tell me all the relationships that person has been part of. Tell me all the products that person has purchased. So, querying the relationships, and then the management capabilities, which are essentially to establish a relationship or to break it. Establishing a relationship means creating a relationship between two entities and associating the nature of that relationship: an individual can be married to another individual, an individual can purchase a product, an individual can live at a particular location. It's a generic capability for relating two entities together, as long as you can name the nature of that relationship. And then you want to be able to break or deactivate a relationship. Say a married couple were having trouble and they just got divorced; now we want to break the relationship. Again, I use the term deactivate because we don't want to forget that those two individuals were related; we want to keep track of the fact that the relationship held during a particular time period. Consider, say, a household license to use all the computers in your house. When a couple gets divorced, does that license change, or does it survive the divorce if one of them happens to move out? Now the two computers that were on the original licensing agreement are located in two different places. Those are the kinds of questions you'd want to be able to answer about relationships that existed at some point.

And then there's a kind of governance, and there are really two governance activities. One is merging two records when you can determine that they represent the same entity. I've got a record for David Loshin; I've got a record for Howard David Loshin. I've determined that those two people are one and the same, so you want to link that data together to establish that it, in fact, refers to the same individual. And then there's the split: it turns out that there's John Smith and there's John Smith Jr., and even though they live at the same place, they're not really the same person. I incorrectly merged them together because of a false positive match, so I want to be able to split that one record back apart. So those are the core services.
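Here is a minimal sketch of deactivating rather than deleting a relationship, using an invented effective-dated structure: closing out the end date preserves the history that the relationship held during a particular period.

```python
# A minimal sketch of "deactivate, don't delete" for relationships:
# each relationship carries an effective period, so history survives.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Relationship:
    subject: str                # e.g. "MDM-000001"
    nature: str                 # e.g. "married_to", "lives_at", "purchased"
    object: str                 # e.g. "MDM-000002"
    start: str                  # effective-from date
    end: Optional[str] = None   # None means currently active

rels = [Relationship("MDM-000001", "married_to", "MDM-000002", "1990-06-01")]

def deactivate(r: Relationship, end_date: str) -> None:
    # Close out the relationship instead of deleting it; the fact that it
    # held between start and end remains queryable.
    r.end = end_date

deactivate(rels[0], "2013-11-30")
active = [r for r in rels if r.end is None]
print(active)   # [] : no active marriage, but the history is retained
```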
The top layer is the application services layer. This is where we get directly to what the application might look like, and the idea here is to ask: what are we expecting to see, and what are we expecting to do? This is where we look at the integration points. We'll say, oh yeah, part of this business function is at the website, for example the web portal that the customer comes in through, and we want to know whether that individual actually is a customer already or not. So you would have search-and-retrieve entity information, which might invoke the core search and retrieve services to get back a customer record. The business process may actually direct you to say: if there is a customer record for this individual, provide me with the customer record; if you can't send me a customer record, then create one. Fill in these pieces of information, create the new record in the master repository, and then return the unique identifier back to me as well. So that's the service or capability that is provided to the application.

Another is the assignment of a unique identifier as part of identification services, whether that's linked into your security or authorization infrastructure, et cetera. These become a component of an enterprise-wide identity that goes beyond just "is this a person that I know" to "is this a person that I know, and how can my knowledge of that person be used": whether that person is authorized to do this kind of activity, or to get access to this kind of data, or to purchase these kinds of products, or to prescribe these kinds of medications, et cetera. So there's the assignment of the unique identifier and its integration at a higher level into identity management capabilities that may already exist.

Then enrichment: what does the application need in order to append information? An example might be: I've got a new customer file from the people who signed up overnight for new accounts. Now I want to go to the master environment to see whether they already existed among all the customer accounts we've accumulated through all of our corporate acquisitions, and then update these new accounts to indicate whether we already knew about them or whether they really are new. That's an append, a reconciliation of identities, which is an enrichment or enhancement of the data. As for cleansing: I look up the record to see whether the street address is correct, whether it's spelled correctly, whether the town is really where that street is, those types of validations. There's alignment of otherwise assigned identifiers for cross-referencing, where I want to be able to link the customer accounts of two individuals who are married, to demonstrate that discounts apply across both of their individual accounts, because perhaps a business rule says that their combined sales qualify them at the household level, not just by individual account. And we have batch services for doing cross-referencing and matching and identity resolution and updating. Relationship management didn't really make it onto the slide; it's got a little bubble there in the middle: functions that relate two or more entities and associate the nature of the relationship at the business function level. It's not just creating the relationship record; it's linking individuals together from the business function standpoint, et cetera.

So those are the types of APIs or interfaces that you expose upward to the application. You're looking for where a customer record gets looked up in a subsystem, and you replace that with an invocation of a search-and-retrieve entity record from the master repository. Incrementally, you're looking for where those integration opportunities are and then replacing them with the master interface as opposed to accessing the data directly. That's the high-level view.
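That search-and-create-if-absent behavior is a get-or-create pattern; here is a minimal sketch with hypothetical service names, where the portal asks the master service for an identifier and a new record is registered only when none exists.

```python
# A minimal sketch of the application-level "search, and if absent,
# create" pattern against the master repository.

import itertools

class MasterDataService:
    def __init__(self):
        self._by_email: dict[str, str] = {}
        self._seq = itertools.count(1)

    def search(self, candidate: dict) -> str | None:
        return self._by_email.get(candidate["email"])

    def create(self, candidate: dict) -> str:
        eid = f"MDM-{next(self._seq):06d}"
        self._by_email[candidate["email"]] = eid
        return eid

def get_or_create_customer(svc: MasterDataService, candidate: dict) -> str:
    eid = svc.search(candidate)          # identity resolution lookup
    return eid if eid else svc.create(candidate)

svc = MasterDataService()
print(get_or_create_customer(svc, {"email": "ann@example.com"}))  # new record
print(get_or_create_customer(svc, {"email": "ann@example.com"}))  # same ID back
```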
Now, what that means is that we can start looking at this differently, and, as I mentioned before, you don't necessarily have to create a separate repository. Rather, you can use a virtualized mechanism that federates data across the different data sources, and then use the master data index to essentially virtualize a view of the master repository without having to create a physical copy or replica of the data. The index maps a canonical representation of the specific entity, maybe it's customer, maybe it's provider, to the locations of the data in the original sources, and you record that in your master data index. The access layer then provides a view in which the business rules that are relevant to the consumer of the data are applied at the point the data is actually accessed.

And this is another item I've been harping on for a while: the semantics associated with what the organization thinks is a quote-unquote golden record may vary vastly from what the different business functions think a quote-unquote golden record is. So instead of enforcing one set of rules and then trying to shoehorn that into everybody's use, you federate the repository and apply the different business rules within the different contexts when you're delivering the data to the consumer. When you're the sales department and you're looking for the customers that can afford to buy your product, that's a different quote-unquote set of customer records than for customer support, which needs to find out whether anybody who's calling in has the right to get phone support, or whether they need to upgrade their support contract to get that level of support. That means looking at a different set of criteria as to whether an individual is or is not a verified support customer. So there are different semantics, and you apply those semantic rules, the business rules, not when you're putting the data into the repository, but when you're taking the data out of it.

It helps to understand how data virtualization and federation work, and here's an example. At number one, there's a set of consumers of that master data. Their requests go to the master data services, the same ones we looked at on the earlier slide, which sit above the data access layer. A request goes through a canonical relational view that transforms and maps it, through the master index, to the original sources, and the data is pulled back at number seven. Whatever the business rules are, for parsing, standardization, normalization, cleansing, reorganizing, or filtering the data, they're applied, and the result is passed back to the layer that provides the canonical version of the view and fed back through the master data services to the client, who can then look at the data the way they anticipate or expect to see it, based on their own set of rules. So depending on who the consumer is at number one, different rules are applied when the data comes back at number seven, or maybe even at number four, where the filtering of the actual requests that go through the master index happens as well.

Let me summarize somewhat. If you're looking at a strategy for developing your master data services for application integration, the first thing is to look at your business processes, so you can understand where the uses of shared master data are.
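Here is a minimal sketch of applying business rules at retrieval time, with invented records and rule names: the same federated record set yields a different view for each consuming business function.

```python
# A minimal sketch of consumer-context rules applied on the way out of the
# repository rather than on the way in.

master = [
    {"name": "Ann Chu",  "credit_ok": True,  "support_contract": "premium"},
    {"name": "Bo Vance", "credit_ok": False, "support_contract": "basic"},
    {"name": "Kim Day",  "credit_ok": True,  "support_contract": None},
]

RULES = {
    # Each consumer context filters the shared records by its own rules.
    "sales":   lambda r: r["credit_ok"],                      # can afford it
    "support": lambda r: r["support_contract"] is not None,   # entitled to call
}

def view_for(consumer: str) -> list[dict]:
    rule = RULES[consumer]
    return [r for r in master if rule(r)]

print([r["name"] for r in view_for("sales")])    # ['Ann Chu', 'Kim Day']
print([r["name"] for r in view_for("support")])  # ['Ann Chu', 'Bo Vance']
```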
Say you're in a sales process. You need to figure out whether that customer already exists within your system or not. Do you need to create a new record, or not? Can a customer make a purchase as a guest without actually registering as a customer? If the customer does that, what information do you need to capture? What are you allowed to keep track of? What are the governance policies associated with personal or private information? These are all the types of things at the business process level that depend on the uses of that shared master data.

Second, look at the integration points within the business processes that would make use of that master data. Is it a search? Is it a retrieve? Is it updating a customer record? Is it deleting a customer record? Is it merging customer records into a single customer record? Is it a match? Is it a duplicate analysis? Is it a merge, et cetera: the different integration points.

Then look at the usage. How is the data being accessed, and what are the performance criteria? Does it need to be a join in a database query? Is it an extract of the data that will be intersected with another data set? Am I taking it out of a big data repository and dumping it into an analytical engine? Et cetera: how am I using that master data?

Then determine the required functionality that isn't already provided, either by the business function application or by the MDM vendor's product, and look at what the existing systemic support is for that required functionality. An example might be: I want to get access to the data, but the master data is actually sitting in two different repositories. I want to run a query from DB2 or Oracle against master data that's sitting in a SQL Server database. What does SQL Server provide in terms of federation to allow me to run a query that crosses multiple data sets? Once I'm able to identify what system and tool support exists and what doesn't, then I need to look at what the application integration services are and what the APIs are going to look like, and then I can design and implement those application integration services.

A little bit of work needs to be done. Typically it's going to be on top of capabilities that already exist, but you're definitely going to need, I guess I'll call it, quote-unquote glue to make sure all the pieces that exist fit together. It's interesting: you might find that one capability exists on one platform and another exists on another platform, but the two platforms are not compatible. That may mean you need to go out and get some kind of data integration product that allows you to get access, maybe through JDBC or ODBC, from one platform to the other in a more seamless way, so that you don't need to be concerned about the fact that the two platforms don't actually talk to each other, and then embed that product within your environment. Layering all these components together gives you a practical approach to understanding how you can put the capabilities together to allow an application that is already in production to eventually transition from its own copy of the data to full integration with the master data repository. So that's the general gist, and I think we have a bunch of questions that have already come in.
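As an illustration of the kind of federated join being described, here is a minimal Python sketch with invented tables standing in for the two platforms: master records in one system, transactions in another, and an integration layer performing the join without copying either data set wholesale.

```python
# A minimal sketch of a cross-repository join: master data in one system,
# transactions in another, joined by an integration layer.

master_customers = {           # imagine this living in SQL Server
    "MDM-000001": {"name": "Ann Chu"},
    "MDM-000002": {"name": "Bo Vance"},
}
transactions = [               # imagine this living in DB2 or Oracle
    {"customer_id": "MDM-000001", "widgets": 20},
    {"customer_id": "MDM-000001", "widgets": 3},
    {"customer_id": "MDM-000002", "widgets": 5},
]

def widgets_per_customer() -> dict[str, int]:
    totals: dict[str, int] = {}
    for t in transactions:
        totals[t["customer_id"]] = totals.get(t["customer_id"], 0) + t["widgets"]
    return totals

# Join the aggregate back to master names, as a federation layer would.
for eid, total in widgets_per_customer().items():
    print(master_customers[eid]["name"], total)
```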
There are certainly opportunities for other questions to come in. If you have questions that pop up later, please feel free to contact me directly; my contact information is here. And I encourage you to point your QR scanner at these books and go out and buy two copies for yourself and one copy to share. A couple of new ones: one, and I'm looking at it right now, is called Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph. If you do a search for Big Data Analytics and Loshin, that one will come up. And then there's one I wrote with a colleague, Abie Reifer, called Using Information to Develop a Culture of Customer Centricity. It looks to move up the frame and stop talking about the data itself, and instead look at the data in the context of how we want to be able to use it, and I think any conversation about master data management certainly needs to look at the use of that information for purposes of customer centricity. So if you're interested in those ideas, please drop me a line or give me a call, et cetera. I want to thank DataVersity and Shannon and Tony and all the folks over at DataVersity, and I encourage all of you to register for EDW; I will see you in Austin.

Fantastic presentation. If you have questions, make sure you get them into the Q&A. One of the most popular questions that always comes up is whether you'll get a copy of the slides and the recording, which I will send out within two business days, so by end of Thursday you'll have that in your inbox. I'll also include David's contact information and the list of books here for you; they're in our DataVersity bookstore, which is through Amazon, and we also have a partnership with Morgan Kaufmann, so we can even give you a discount on David's books there. I'll make sure to get all that information to you.

So, David, coming up in the questions here: should changes to master data be managed in the master data repository, or does that belong to the analytical applications, where the concept of the slowly changing dimension has already been established?
That's actually a good question, and one that I've heard numerous times. I think the challenge there is that it's kind of a trade-off. We already understand how slowly changing dimensions work for analytical environments like a data warehouse or a data mart, but that hasn't necessarily bubbled into the engineering of master data repositories, because the people who have built the master data repositories, and I'm just generalizing here, have largely focused on the mechanics of identity resolution, with management of the data as a byproduct. I would prefer, though, that the master data repository also be able to handle the whole concept of slowly changing dimensions, so that when the data is syndicated out to the data warehouse, you no longer have to worry about it a priori from the data warehouse perspective, because the historical changes are inherent in the way the master data is being managed. If we look at the things we haven't done enough of, which is looking at master data in the context of how that data is being used, then we'll certainly start seeing things we really need to deal with in the master data repository, like, as I mentioned, historical timeframes, and finding ways of tracking the relationships associated with individuals at different points in time directly. So my answer is: if we're going to be managing a master repository, then let's try to put that within the repository.

Next question: consider that you may not manage all attributes of the master data subject in the MDM; wouldn't managing dimensions in the MDM then add a level of rigor to the process of managing change? I look at this as if you've got a little dial on your data model. Turned to the left, you store very little information, which is to say the least amount you need in order to do resolution of identity; turned all the way to the right, you eventually keep all the information about the entity in the master data repository. And yes, you're right, it would add a level of rigor to the processes managing change, but I think that's exactly what we haven't addressed over time: looking at the master repository as a core part of both the transactional or operational applications and the analytical applications. If we're making a copy of data that we can use to do reporting, or to do identification, and now we're down the whole path of being able to merge data together because we're using it in its validated form, it begs the question of why you would be consolidating it in the first place. So I look at it from two different alternatives. One is: if you're going to build a repository, then build a repository, embed it in your policies, and take the position that anybody who's using anything in that repository has got to comply with the rules of the repository. The other side is: if my requirement only includes identification information, then all we're doing is federating the original sources in their original place; we're not actually moving the data at all, and we are materializing the view on demand, as opposed to making an extra copy.
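Here is a minimal sketch of what slowly-changing-dimension style history inside the master repository could look like, with invented field names: each attribute change closes the current version and opens a new one, so historical states remain queryable.

```python
# A minimal sketch of SCD-style (type 2) history kept in the master
# repository: versions are closed out and appended, never overwritten.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MasterVersion:
    entity_id: str
    address: str
    valid_from: str
    valid_to: Optional[str] = None   # None marks the current version

history = [MasterVersion("MDM-000001", "1 Elm St", "2010-01-01")]

def update_address(entity_id: str, new_address: str, as_of: str) -> None:
    current = next(v for v in history
                   if v.entity_id == entity_id and v.valid_to is None)
    current.valid_to = as_of                     # close the old version
    history.append(MasterVersion(entity_id, new_address, as_of))

update_address("MDM-000001", "9 Oak Ave", "2013-05-20")
print([(v.address, v.valid_from, v.valid_to) for v in history])
# [('1 Elm St', '2010-01-01', '2013-05-20'), ('9 Oak Ave', '2013-05-20', None)]
```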
So you can adjust accordingly along that dial.

Often I've encountered questions from clients who are looking for quantitative results before beginning an MDM solution; they want the bottom line: what's the ROI? Well, that's no small topic. I'd point you to my data quality improvement book; I have a chapter on assessing the business value of information. In fact, I'm looking it up right now, because I don't recall exactly what I wrote, but I do know that the first chapter talks about the business impacts of poor data quality. Really, it's a method for understanding how to characterize the different areas of the business and their dependence on data, and for coming up with quantifiable measures of what that really means from a financial perspective or from a risk perspective. To come up with a return on investment, you really need to know your expected return and your investment. When clients are looking for quantitative results before beginning a solution, I tell them it's premature unless you've done an analysis that establishes, number one, what the gaps or failure points are that are associated with your inability to properly resolve unique identification; number two, what the value of fixing it is; and number three, whether the business is committed to making the organizational changes necessary to make use of it. That's also why I would point you to the other book, about using information to develop a culture of customer centricity, because it really says: yes, we do need customer data, but we also need to know what we're expecting to get from it, and that we can prepare our staff to work with the customer in a way that takes advantage of that 360-degree view, if we actually had it.

Question: what is the lowest MDM data-out latency achieved with a non-federated approach? I've heard 15 minutes at best. Well, why don't we turn that into a blog post on the DataVersity site and make it a challenge: tell me what you've done, how much data you pulled out, and how fast you did it. It depends on what tools you're using, how many machines you're using, the speed of your network, and how smart your data analysts are in terms of writing queries. There's data that you can get out really quickly, but if you write the query the wrong way, it could take you five days. There are so many variables that it's impossible to really answer. Is it really 15 minutes? Sure.

Next: how is MDM different from reference data management, RDM, from a data governance perspective? And is the industry moving MDM and RDM away from being part of the DAMA wheel? Okay, first part of the question: is master data management different from reference data management? Your reference data is master data, and once you're storing master data as master data, it effectively becomes reference data. So the answer is: it really shouldn't be that different, and if people are telling you that it is, well, send them to me and we'll have a panel conversation about it. For the second question, I'd direct you to the people who put together the DAMA wheel; they can answer that better than I can.

And that's it; those are all the questions we have, and we are running out of time. David, thank you again so much for such a fabulous presentation. Again, you can meet David in person in
Austin, Texas, at Enterprise Data World 2014; be sure you check that out at EnterpriseDataWorld.com. And as mentioned, I will send out links to the recording of this event and to David's books and contact information, so you can get all of that. David, thank you again, and I hope to see everyone in Austin.