Welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. Thank you for joining the latest in the monthly webinar series, Lessons in Data Modeling with Donna Burbank. Today Donna will discuss data modeling and data integration.

Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar, and we very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the top right-hand corner of the screen to activate that feature. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #LessonsDM. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar.

Now let me introduce to you our speaker for today, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the managing director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. And with that, I will turn the floor over to Donna to get today's webinar started. Hello and welcome.

Thank you, Shannon. Pleasure to be here. Just to highlight what Shannon already mentioned, if you are a Twitter fan, the hashtag for today is #LessonsDM, and I'm on Twitter at @DonnaBurbank. We often have a discussion going online there as well, and please also feel free to use the Q&A.
So many of you have joined us month after month, which is always a pleasure, and thanks for continuing to be faithful and joining each month. For those of you who haven't, we want to let you know, and I already saw a question coming in about the slides being available: this is a yearly series, and all of the past presentations are on demand, both the slides and the recording. So if one of these piques your interest, you can go back and see some of the previous ones. Sometimes I refer back and forward to different presentations because it's sort of a series.

And as you'll see, the title of the series is Lessons in Data Modeling, but although I am a self-proclaimed fan of data modeling and could talk about it all day, what's interesting about data modeling is not data modeling itself but its application to the enterprise and to the different initiatives across the enterprise. So you'll see kind of a wide range of topics, and today's is integration, which is also a very broad topic. That's one of the beauties of data models, because you can kind of go across all of that. So that's what we will be talking about today.
So just to start out, one of the things I love about data is that it really is one of the key pieces of technology that is both business and technical. When we come to data integration, often we think of things like ETL and data warehousing and data virtualization, but so much of integration really comes from the business. The beauty of a data model is that whether it is a technical transformation or a business transformation, and they're closely linked, the data model can be that common reference point: what do we mean by customer, what do we mean by product, what is the canonical central model that we're working from. We'll talk about each of these, and you'll see in the middle I put "etc." because we could talk for a week about all the different ways to integrate and all the business drivers. These are some of the top ones, and I'm sure you can think of others.

On the technology side, there's data warehousing, which I already mentioned. That seems to be, at least historically, what a lot of people think of: the traditional approach where you take from the source, do the ETL, put it in the warehouse staging area, then the warehouse, then report, that sort of thing. But there are other things as well. Now the data lake seems to be gaining favor, partly because we can store different data sets and things like that. There's MDM, or master data management, trying to get that single version of the truth, whether it's customer, product, vendor, etc. These are interrelated, you might have an MDM that feeds the warehouse or the lake, but I wanted to call them out. And one that I will say tends to be forgotten sometimes amongst data people is APIs and application integration. We'll talk a lot about silos in this presentation, and for whatever reason, and I hope it changes, application developers and data people sometimes don't talk to each other. But the applications use data, and an API is a way to get that in and out, so that is also
something that can reference a common data model. But maybe more interesting, which is the why we're doing it, is the business drivers. Mergers and acquisitions, that's a huge one, right? I'll talk more about this, but sometimes the main reason people buy companies is for their data, and you can't use that data unless you understand it, and what helps you do that is a data model. We'll go through these. Efficiency and agility: how many times do we need to rework the same data or try to dig down into what something means? And if you're spending your time doing that, you're not able to do the thing on the left, which really is the innovation and collaboration. Everybody's now talking about data-driven business, but if you don't know what data you have or what the data means, it's harder to do that.

And the term I like to use when you're talking about your enterprise is this idea of an enterprise knowledge inventory. Basically, what we're doing at the end of the day with a data model is creating that diagram, that reference, that roadmap, whatever metaphor you want to use, for where our data is and what it means. We'll talk a lot about this in the presentation: if we're thinking of data as our core IP, which it is, we'd better know where that is and what it means. So we'll go through each of these around the circle as we go.

But one more thing to set the stage, and if you've joined some of the other webinars you'll find this familiar: this pyramid that we go through, because when we talk about data models there are different levels, and data models mean different things to different people. If you look at the left, when we talk about the data integration team, it's usually made up of several folks, from the database people and developers that are actually at the physical table level up to business stakeholders that might be doing that merger and
acquisition. They all still need to know what we mean by client versus customer versus asset, all these core terms. So everybody across the business has one view of a data model that might make sense to them.

Up at the subject area view, we might just be doing basic scoping. Do we have asset data? What does that include? Is it physical assets, is it intellectual property assets? All these definitions. We have customer, we have product, and so on. No company, no matter what your funding, could boil the ocean and do everything, so often that's a great way to start: what's the most important thing? Say we are talking about an acquisition. What's the most important thing? Well, we bought this company because of their products, so let's get the product data right and integrate that first, as an example.

Then we can go down to the conceptual layer, the business layer I often like to say, because that's really where you're getting some of those core concepts and rules. What do you mean by location, some of these things. And if you're new to data modeling and you're joining this call, I used to think the same thing. That was always the common joke at a data conference: people would say, let's try to get a single view of customer, and everyone would laugh, and I'd feel really dumb, because how hard is that? That's what we have dictionaries for, right? I know what a customer is. But anyone who's worked in any company for any length of time understands that something as simple as customer is the hardest thing to get right. Everybody has their own view. Is it a current customer? Is it a lapsed customer? Is it a premium customer? Is it a customer who's in default on their payments? Is it a customer that's on maintenance for your software? There are so many different flavors of something so simple. It's not that you can't have these different flavors, but you want to at least know which flavor you're talking about when
you're having those conversations, because if you're going to report to the business, or report to the Street if you're a public company, you have to give numbers, right?

And then there's the logical layer. We're still business focused, and you'll see that business people like business analysts might be looking at it as well as data architects, but that's where you get a little more clarification on the detail. Can a customer have more than one account? In data modeling language, that's your relationships and cardinality and things like that, adding attributes and data types and all of that sort of thing. There's discussion in the industry: is this only for a relational database? I would say no. That's a beauty of a data model: from many data modeling tools you can generate a physical database from it, and you may want to optimize that for performance, or normalize or denormalize, whatever you do on the physical level. That is one of the nice things about it. But it isn't just for relational databases, because those are the core rules of your business. Can a customer have more than one account or can't they? That's not something the database should decide; it's something the business decides, and it should be true across any of your applications. And a lot of the data modeling tools can now translate logical models into many different formats, XML, JSON, whatever it is. It doesn't have to be DDL for a database.

So this is to set the stage, and the main point is that when we're doing data integration, it's a business problem, it's a technical problem, and everything in between, and at each level you can have a data model that should be able to fit your needs. And again, I like this term, this idea of an enterprise knowledge inventory. In my day job as a consultant I'm honored, I would say, to work with a lot of the largest companies on the planet, helping them with their data strategy, and I find it fun,
which is why I'm still in the business, that so many companies now are coming to us and saying less, "we just need to integrate our data because we have a lot of it and we want to save on cost," and more, "I want to do something strategic with my data, so let's start there at the business level and see what new things we can do." I get a kick out of that, and I think a lot of people do, with all the technology we utilize now for data. But if you're going to do that, you need to understand at the very basic level what your business landscape is around data. If it's your IP, you'd better have it documented. If I have this great new product and I have a patent, I want to register that and write it down somewhere, right? We all know that money is a financial asset; well, we have a whole accounting department, accounting systems, and charts of accounts all to manage that. Same thing with data: if it's an asset, you want to manage it as such.

I'm a big fan of just starting at the very high level. This example is obviously made up, and someone could probably find fault with some of the relationships and things, we could all do that as modelers, but just bear with me. For any company, I don't care how big, it could be the US government, it could be massive, you should be able to do a one-page enterprise business model that's just an overview of what the business does. And the other never-ending question is, can business people understand data models? A resounding yes. I've often found business people like them almost more than technical people because they are simple. Even if you've never seen a data model or you don't know what this company is, you can pretty quickly look and say, okay, it's some sort of retail company, probably, because they have customers and products. They probably don't sell ice cream cones, because you wouldn't invoice somebody for ice cream cones, so they're some sort of either wholesale or large retail,
and they have stores at a location with staff, so it's not only online, etc. And weather, well, could it be online sporting? Why are they tracking weather, right? There are a lot of things you can tell just from this one page. And once you get into it, there are relationships. A staff member is only at one location, okay, so they don't change locations. They're probably sitting in a physical store; it's not like a sales rep who's going across regions. So I could tell a whole story about these six or seven little boxes, and that's the beauty of a model: it really should sum up a business on one page. As a little note there, you can tell this probably isn't a health care provider, because they probably don't have products, and maybe a patient would be the customer. So again, they're very powerful, and it helps to set the stage.

So if we are a data-driven business, how could we use this weather data for something different? Can we predict sales patterns from it? We might be using it for insurance purposes, to see how much we have to insure the store by the ocean for flood, but maybe the product people say, if I knew that, then wow, maybe we could predict seasonal sales or something. But if you don't even know the data you have, you can't do that. So this is your basic inventory of your biggest asset, which is your data.

There's been history here, and a lot of folks pooh-pooh these. For data folks who are old like me, there have been enterprise data models that took a year to build and took up six walls of space, and no one could understand them. I don't think it has to be that way, and I would say never skip this. You could do it on a whiteboard in an hour, probably, if it's in people's heads, and at least get a good start. That can really start the conversation, and you can be agile and you can iterate. But I think there's a lot of this
in people's heads, and just doing that, even on a whiteboard, can help clarify a lot of things. Do we invoice customers? I don't think we do, I thought they just buy it online. So that can generate quite a lot of discussion just with these simple boxes and lines. And the beauty of the boxes and lines is that people can generally understand them: this is many things and this is one thing. They're not that hard to understand.

The other thing, and I'll reiterate this if I haven't already, is that a data model not only describes a business, it describes your business. I've seen so many companies struggle with this, and maybe it's nobody's fault, or maybe some people are doing it out of malice, I don't know. Sometimes you'll have an ERP system, and we might say, that's my customer master, I don't need a model, it's in the ERP system. But that is not necessarily your organization. No organization is cookie cutter. I might be an insurance company, but I probably do business differently than everybody else, and I might use terminology a bit differently than everybody else. That's why you are a company, that's why you're in business, and that's why you have strategic advantage. It's not that you shouldn't use these tools, like ERP, or EHR for health records, and partner organizations might need to share information, but you shouldn't have to redefine your business just to use the ERP system, I guess is what I'm saying.

And I will put a little blame here, since there are always vendors on the call: don't let the vendor hold your data captive, because it is an asset. I think many vendors make it difficult to get your data out, and we should stop standing for that. So one nice benefit of having your organization's model is that I at least know my data. I might have to transform it to send it to the agency that's funding me, or I might need to transform it and send it off as XML to the partners I'm working with, or maybe the
government needs it reported a certain way, or I'm getting it from an EHR, or maybe the ERP system's managing my invoices, but they shouldn't redefine how you are doing your business. So that's some of the reasons transformation comes in, but at least you can have that conversation. I often recommend that clients, when they're buying a product, ask: how do you share your information, and can you share information? I shouldn't have to pay to get my data back out. And could you publish a data model? A lot of vendors are getting that more and more, and it's great. They understand that it's an asset and you want to use it, and publishing the model shows how you can integrate. Some of them are actually marketing how easy it is to get data out, and I thank them for that, because it is your data and you should be able to have it. So it is valuable to have your own definition of your data.

A similar question is, can I and should I use an industry standard data model? I'm trying to get this view of my company, and I'm sure I'm not the only person that's ever run an insurance company or a health care company or a non-profit organization. Should I just buy an industry model and use it, because I could save a lot of time? Yes and no. I think they're great. There are a lot of them you can purchase, there are a lot that industries have built themselves, there's open source, there are a lot of different things. They're great as a guide, because there's nothing more scary than an empty piece of paper. I've run a lot of workshops to do that, and it shouldn't be scary, you can start with a whiteboard as I mentioned, but it's like anything: I write a lot, and starting with a blank piece of paper still scares me. And again, you aren't the first person that's ever run an insurance company, so if there's an insurance industry model, by all means take a look. Just don't take it off the shelf as is, because again, you probably run things differently, and your strategic
advantage is the things you do differently, so you should customize it, and I'm sure nothing is a hundred percent fit. So that's really your choice. I'm not for them, I'm not against them; just use them wisely, I guess I'm saying. There are also data model patterns, and there are books out there that folks have published. How do I do a model for invoicing? I'm surely not the first person to have thought of this. So yes, use other people's ideas, but just make sure your model fits your organization, which may be obvious, but sometimes things aren't until you step back and think of them that way.

So I talked a bit about mergers and acquisitions, and this has been happening since the beginning of time, or at least since we've had companies, I'm sure. But more and more, I've worked with several companies that have done an acquisition and said, we bought it for their data. Either indirectly, we bought it for their customers, and their customers are data, or more and more companies think, they've got great data and we need that data. Well, as you data folks know, data isn't just this nice clean thing that you pass over. The benefit of it is that it holds the rules and the history of a company, and it really is the IP of a business, so you need to take an inventory of that. When you buy a company you need to do an inventory of products and so on, and you still need to do a data inventory: what data do they even have? That's nicely done through a data model. With reverse engineering, a lot of the tools on the market let you just point at the database and you'll get a model back, which is a great start. At least I know what they have, if you're lucky enough. And even though we are all very automated these days, I actually was working with a company just this week that acquired a very small company, and they asked for the customer list, and it was literally in binder
notebooks on paper. So yes, that still happens. It was a small shop, they bought them for a strategic reason, but it's not like they could just integrate their ERP systems. But even if you can get the data, and everyone has SQL Server, and they all have client information, and you just reverse engineer it, it doesn't stop there. The key part is that, again, you have two different companies, so they're going to have disparate business processes, and these often show through the data. They're often using different terms for different things. And if you ignore these and just say, guys, we can do this, well, sometimes that is the reality. We've all been in the business, we know that everything can't be perfect and you want to integrate these companies as fast as possible so you can start making money. But I've seen too often that we skip that step, and then one of these differences in business processes comes up a year from now, in how we're actually tracking customers or invoices or how we account for products, and it's a headache down the road. People spend hours and weeks locked in rooms, literally over weekends, trying to figure this stuff out. So that is where a data model can help.

The title of this series is Lessons in Data Modeling, and generally it's not a true classroom lesson, but we're going to do a little test here. No one can fail this test; it's more of a discussion, and I will finally look at the chat. I'm horrible at looking at chat while I talk, but I will stop and pause. So I thought this might be a helpful example. You have a hypothetical organization A on the left in blue and an organization B on the right in pinkish, whatever that color is, and these are the two data models. On one side, if you've ever read a data model, a customer must have one or more logins, and a login is associated with one or more accounts. On the other, a client may or may
not have more than one account. So just looking at this, without any other background, what issues do you think might arise trying to just integrate these two lists of customers from the two organizations? I want to pause and let people type stuff in the chat. Somebody has to chat; you guys are always very verbose until I ask you something. If not, I'll just keep talking, I can do that.

Thoughts? Well, some of the obvious ones: there's something called the client and there's something called the customer. Do we know those are the same things? We could assume they're the same, but maybe they're not. Maybe a client really is a different thing, because here we have a customer that has a login. Someone else picked out something that I thought might have been the harder one, but they picked it up right away. If folks are familiar with the IE notation, one side says a customer may or may not have an account, you can have zero or one or many, and the other side must have an account. That seems like, are we just doing semantics? Is that a little thing? That could be a huge deal. On one side, could it be that anybody we've ever talked to is considered a client? It could be a prospect, it could be an existing customer. Maybe they don't have an account yet, but we put them in as a client just because we've been talking to them at a bunch of conferences or something. I've worked at companies where they've made hugely embarrassing mistakes, sending things that they thought were going just to customers to prospects, and vice versa. How many of us have gotten an ad for a product that we already own?

And someone says, yeah, try asking sales and finance for the definition of customer. There are going to be a lot of different answers across organizations. Someone else has a good point: what are we talking about with customer? Is it an organization, is it a person? So we just have boxes
here, we don't have attributes, so that's a great one. Are we talking about people? Maybe the client is the bigger organization and the customer is the person within that organization. Somebody else mentions that we're assuming they're the same, or that they're using the same ID. When we get really down to the physical level, are the data elements the same? A couple of people brought that up: these don't have attributes, so it makes it hard. Yes, I'm just keeping it super simple, but that's often what you find when you start looking: they might be tracking different things. I'm trying to read and talk, but I can't quite do both.

This one I thought most folks would pick up, and it was mentioned earlier: this idea that we have a login and an account. So how are we identifying this customer? Some people identify it from the login, some people identify it from the account. Are these really the same things? And maybe someone hit on it, I hadn't even thought of this, that maybe account is the big enterprise account with clients, and then customers have logins, and maybe their account is really something different. But we could go on all day, and that's the beauty of a data model: it really raises all these questions. That's the type of stuff to take the time on when you're doing mergers and you need to integrate data from the two companies, and often the data model is a great way to do that. Looking at these parent-child relationships and at some of these business rules can really clarify a lot of things. You might not be able to solve it right away, but at least you know that these have different rules.

Someone just typed in, and that's a good one too, that maybe organization B has no brick-and-mortar customers, they could be completely online, and maybe organization A has both retail and online customers, or maybe they just have retail customers. So you can infer, from two little boxes on one side and
three little boxes on the other, with some lines and shapes, a whole lot about a company, which I think is kind of fun about data models. I love their conciseness and logic. But so often, and we all get it, we're all busy, your boss says, hey, we have a month. And the people doing the merger, I'm sure, aren't thinking of data structures. They're thinking, data guys, we need to just merge this customer list and get it done by next week. So it's just, we have to do it: here are the clients and here are the customers, good luck to us, hopefully they're the same thing, and we merge them together. And then later we find that they are very different things, and we probably should have matched on the login, because that's really what the client is, or whatever. So we don't spend a lot of time on that, but we should. Often it's the technical stuff that's easy. I can make the data types the same and put them in a table and munge them together, but it's the business stuff that's often the hardest thing to find. And that's the beauty of a data model, because if you just look at the databases themselves, that might not jump out as much, but abstracting into a model like this really helps, and business people can get that. It might just jump out, and they're like, why do we have logins for a corporate organization that only buys at the store? So it clarifies a lot of things.

Okay, so efficiency and agility. Maybe you can all relate to this, or at least to one of these people, but every single company I've ever worked with has these people in some sort of form. It just takes a lot of time when you're trying to integrate data, and you're probably trying to integrate data to do some really cool stuff. Can we integrate these two customer lists and do this great new marketing campaign on these new customers we just acquired? Yeah,
but the data types are all wrong and they're in seven different formats, and it's going to take me like two weeks to get the stupid list together because I'm just doing formatting things, and I don't understand whether a client is the same as a customer, and I have to ask six people to try to figure it out. So it is harder than it seems on the surface. These silos often exist not out of malice, but just because you don't know. Where is the roadmap for where this data is? I see so many companies where people are sort of data detectives: I want to find out this information, and you call six people, and it's "I think Joe knows," et cetera, et cetera.

So you've got, and full disclosure, I don't know these people, the sort of annoying lady you love to hate on the left. She's happy. She's got her spreadsheet that works, this great spreadsheet with customers by region and age and income level. She loves her spreadsheet, she publishes it, and she just thinks she's the cat's meow. Then you've got the person in the middle, who would love to get to that level of everything being formatted, but she's spending like three days just trying to match up region codes, which are different across each group. They're in different spreadsheets, and one's in SQL Server and one's on the data lake, and she's obviously frustrated. And then the guy on the right was hired to be this great business analyst, and he wants to find all these great new insights, but he can't. I just have to get income levels for our customers, and you can tell from his body language: this is so dumb, I did all this great studying at school and I'm stuck here and can't even get to the great stuff. But this lady over here, the one that's kind of smug, has all the information, and this guy just doesn't know that she has it. And I'm sure she's not hiding it. She's smug and annoying, but
she's not malicious. She's just got her act together, and this guy just doesn't know that they have it. So if you have this common data model, where at least there's an inventory of what people have, that would help. That at least is a common standard. Folks are saying in the chat, we need MDM. Well, yes, but the first step in MDM is getting that common data model right. What information do we even have? What's our roadmap of information? If I could just pull from that inventory, I'd at least know what we have, and if we have common region codes, all in one place and format. So it's not earth-shattering excitement, unless you're a nerd like me who actually finds that kind of fun, but it's necessary.

One of the folks, MC, said that the unemployment rate might go up if we solve this, and I will disagree with that. I would beg to differ, because I think it's not that companies get rid of people, but that people can actually do the valuable work they were probably hired to do. We have so many people saying, I'd really like to actually use the data for insights. If I could show my boss all this great stuff, if we could segment our customers by income level and give them products at the right level, I'd be a hero, but now I'm stuck at my desk munging spreadsheets. So I don't think work will go away. I think it will just be more productive, and we can actually do the cool stuff that we were probably hired for. So much of this would be easier if we had a common data model, and generally when I go in and talk to these types of people about data modeling, they say, yes, thank you. If I could just have a standard format for customer data, standard drop-downs for region code lists, validation for male/female codes, that kind of thing, that is really a huge benefit of data models.

Which leads me to the next point, and I just kind of hit on it, which is that idea of innovation and collaboration. So if we think of the
enterprise data model as a sort of catalog: you have this enterprise model, and ideally, if we think of that pyramid, from the top you'll be able to drill down from this high level to the logical and down to the physical to actually see where this data is. But at a minimum you know it's there. So here she's happy. She has no problem, because she has the data model. It might be as simple as, "Oh, I didn't realize the insurance department was tracking weather." She's handling product information, so she sees the little box down here on the right and she goes, "Wow, when I see this data model, we have weather data that I could access. That is really cool." Our company has it because the insurance department wants to know about weather events for our locations: should I have flood insurance in North Carolina and wildfire insurance in Colorado, where our stores are? But she thinks, "Wow, could I see trends? Do people buy more stuff when it's raining because they go to the store? Or, if I'm an ice cream store, do they buy more when it's sunny because it's hot?" She wouldn't even have known that data was there, and maybe they need to do something like this and create a relationship between product and weather, right? So at least it enables that innovation. When I think of agile, of innovation and collaboration and all the new sexy stuff, a lot of folks don't always start with a data model, which is wrong, because you can't innovate if you don't even know what's there. So having that catalog is a great way to say: this is our inventory of the company's data; can we all see it? And if we go back to our grumpy guy here on the right, "Why can't we get income levels for customers?" Well, I don't have attributes on this model, but if we did, or if income was a table, he would know that the smug lady on the left back there already has it. He just doesn't know.
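To make the catalog idea concrete, here is a minimal sketch in Python of a data model used as a searchable inventory. It is an illustration only, not anything shown in the webinar: the `Entity` class, the `who_has` function, and every entity, steward, file, and attribute name in it are hypothetical, loosely following the customer, product, and weather examples in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    """One box on the (hypothetical) enterprise data model."""
    name: str                      # business-level entity name
    steward: str                   # team responsible for the data
    stored_in: List[str]           # physical locations (illustrative only)
    attributes: List[str] = field(default_factory=list)


# Hypothetical inventory; none of these systems come from the talk.
CATALOG = [
    Entity("Customer", "Marketing analytics",
           ["customers_by_region.xlsx"],
           ["name", "region", "age", "income_level"]),
    Entity("Product", "Merchandising",
           ["sqlserver.sales.product"],
           ["product_id", "product_name"]),
    Entity("Weather Event", "Insurance",
           ["datalake/weather_feeds/"],
           ["location", "event_type", "event_date"]),
]


def who_has(attribute: str, catalog: List[Entity] = CATALOG) -> List[Tuple[str, str, List[str]]]:
    """Return (entity, steward, locations) for every entity holding the attribute."""
    return [(e.name, e.steward, e.stored_in)
            for e in catalog
            if attribute in e.attributes]


if __name__ == "__main__":
    # The analyst looking for income levels finds who already has them.
    for entity, steward, locations in who_has("income_level"):
        print(f"{entity}: ask {steward}, stored in {locations}")
```

Even a flat list like this answers the analyst's question of who already has income levels. A real modeling or metadata tool would add the drill-down from conceptual to logical to physical that the talk describes.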
Again, if you had that public data model, you can at least see what's available. Yeah, I know there are data access issues, and maybe you can't get access to that database, but you could ask. If you don't even know what's there, you can't, so that's a start, right? And that's the beauty of a data model: you can start to see these connections that maybe didn't exist before. "Oh, we've got social sentiment analysis? I didn't know that. Let's see what people are saying about us on Twitter." So again, you can start to see those trends, or integrations.
Okay, so those were some examples, and we could go on and on, but I think this last one is almost the biggest one, especially now that everyone wants to be data-driven. Everyone wants to do all these great new cool things, artificial intelligence and all this great stuff, which is awesome, love it. But starting with the model helps you: these are our building blocks. If you're baking a cake, these are the ingredients, right? What's the data we have that you can do all this great stuff with? A lot of my customers are doing cool things like AI and machine learning and predictive analytics, and they still have a model, because you need to at least know what you're working with. So, I'm a big fan of models.
So we get back to the technical, which again is probably what a lot of people think of when they think of data integration. Here's the favorite one, the tried and true: data modeling for data warehousing and business intelligence. My line, which you might have heard before if you've joined these: data modeling is the intelligence behind business intelligence. So what do people see? I have a business user: "Could you just show me all customers by region?" And he sees the report. More and more business folks are becoming more data savvy, and they kind of know the complexity: what do I mean by customer, what do I mean
by region, is there master data or reference data for region codes, all that sort of thing. But at the end of the day, like anything: I might be a fan of cars, but I really just want the car to start in the morning. I really don't want to have to know all the details of what's happening in the engine just to get to work. So this guy has his day job. All he really wants is, "Show me all the customers by region, please, and I want a nice report at the end." But we in the data world know there are a lot of things needed to do that.
Traditionally we'd have the data warehouse. When we think of that inventory, we can, quote, "reverse engineer" the source systems and get your relational physical and logical models. What is the definition of a customer? Where is the data stored? I'm sure customer isn't in one nice clean table. You might have a CRM, but I'm sure people have spreadsheets and other stealth sources. We might think it's in one place, but generally it's scattered across many places. So what does it mean, where is it stored, and, very importantly, how is it structured? If you have more than one source, I'm sure you have more than one data structure, even if it's something as simple as data type, which can wreak havoc when you're trying to integrate. Then when we think of stewardship: who owns the data? Can we integrate from these source systems? Who's the steward that knows what that database means by the definition of customer? You can tag stewards in data models, and that sort of thing. So that's just trying to understand the source systems. We're skipping maybe some ETL and the staging area and that sort of thing, but for simplicity, you've got a warehouse, and you'll see that the tiny little model down here is formatted differently. I know a star schema isn't the only way to do a data warehouse, but it's a very common one, because you want to slice and dice for BI
what I want to report on: I want customers by region, by sales rep, by year, by month, right? So that helps you build that out, and it is a very different model structure than the source systems, which is kind of the idea. So what are the key definitions of the KPIs? I've seen this so many times: we want to say "total sales by region" and we get six different answers, because what do we mean by total sales? What do we mean by region? How are we summarizing region? So many problems come just from core definitions. If you think of the grumpy people on the previous page, I think they would all be in: if we can have a common model just to say what we mean by region, I wouldn't be spending six days on that, and I could analyze the report. And then, how can I optimize the database to get reports to run faster, and that sort of thing. So this is almost your classic; it's almost everything combined in the model. It's getting the scope, which is the high level, the subject areas: I want to do customers and regions. Then the logical: what's the definition of customer, what are the differences in the terms? Then down to the physical: how is the data stored, and how do we need to transform it, both to get a common way to look at the data and common reference data codes, and to optimize for performance.
Which leads me to my next question or discussion point. We've probably all heard this, so I want people to laugh. True or false: we do not need data warehousing anymore, because storage is so cheap and processing power is so fast with today's modern hardware. I don't have a full survey here, but does anyone want to chime in with their thoughts on whether they like the statement or don't? I got a "ha" right away. A big old "false." "May or may not." False, false, false. There's a "true," there we go. "It would be lovely if the BI customers knew what they want as a KPI." We wish, right? So I think I'm with you guys. It's true and false, right? So we don't
necessarily have to have a warehouse for performance. In the past you just couldn't do without one: you didn't have the storage, and part of the reason you had to do a warehouse was to break the data up. But I've heard this statement from real live customers, from CIOs, from chief data officers even, and some of the vendors want to say this as well. So there are some benefits from hardware, but I would say false, or false and it depends.
My analogy is: I'm here and I'm trying to find some things in my file cabinet, and it's just a bunch of papers. We all have that friend. I am not like this, because I'm a data modeler: I have file folders, and they're all organized by theme. But some people don't. Say you throw a bunch of papers in the file cabinet and you're complaining, "I can't find anything," and someone says, "Don't worry, just get more file cabinets." That's kind of like throwing processing power at the data warehouse problem, because, as a lot of you are saying, much of the value in data warehousing is making the data consumable and understandable for ease of reporting. What do we mean by these terms? How do we organize it in a way that you can slice and dice and understand? So we're maybe kindred spirits here, but I have heard that statement over and over, so I'm going to add my two cents. Yeah, we have a lot of processing power, but if you just have a disorganized file cabinet and you get more file cabinets, you just have a lot more disorganization. There are a lot of great things we can do with processing power, and there might be valid reasons to go to open source for price and things like that, but it's not that you don't need to do the act of warehousing. Someone, I think, said (I love it, because it must be a modeler), "Define your definition of data warehousing." And sometimes when I question a customer who says that, they say, "Oh, I didn't mean warehousing. I still want that thing to do
my reporting. I just meant the place to keep it. Maybe I'm offloading it to Hadoop or S3 on AWS or something." That's fine. I wouldn't call that a warehouse. So a lot of it comes down to terminology, and I'm always finding that so much in life comes back to modeling: what do you mean by certain terms? It helps with clarification. Some folks bring up the data lake, the data swamp. Yeah, so data lakes: not bad, there's a lot of benefit, but you need some sort of roadmap to them and some sort of structure to understand them, which we'll get to. So that's just my color commentary on warehousing.
And a lot of us seem to agree that metadata matters, right? Even though you have these advanced hardware and storage options, self-service BI tools, and data science, you still need to have quality, context, and structure on the data, a.k.a. data models and metadata. So here are some quotes, in case people don't believe you when you say that. The one on the left is from the Data Center Journal: data scientists and BI (they put them both in a similar category) are spending 50 to 90% of their time cleaning and reformatting data. This is our grumpy lady who was in the middle before. She wants to be doing all this great new analysis and she's stuck just cleaning up data. And when we talk about data science, even these data scientists who are hired to do all this great stuff are spending 80% of their day just trying to clean up data, which I'm sure is not what they felt they were hired for. If we think of misperceptions, I've heard, "We have data science now; let's just put the data out there." And sometimes that's true. There are certain cases where we just want to look through raw data and see trends and all that sort of thing. But everybody, data scientists, BI, anyone, would probably rather have a clean list of customers without errors to do their analysis on than having to spend time
fixing things like codes, and that's what a data model helps with.
If I could move my slide, that would help. MDM was brought up. That's another big place where data models fit in. We actually have a whole webinar next month on MDM, so I won't go too deeply into it, but it is so key to data integration that we had to mention it here. There are many approaches to MDM. The centralized one is the classic, where I want to centralize the data in one place and literally transform it into a hub, and I can either use that as perhaps one of my dimensions in the warehouse or report on it directly, depending on what I'm doing. This is where I'm taking from all the disparate sources, creating that golden record in the center, transforming it, and having that golden hub with all the stewardship and governance around it. You can also do that in more of a virtual way and have the data stay in the source systems, as long as we know that different pieces live in different areas. There are pros and cons to each, but regardless, you need a data model. You almost need a data model more with a virtualized approach, to keep track of things.
Again, we have a whole webinar next month, but the idea is: say we're a healthcare company and we're talking about patients. There's a lot of data about patients. There are certain things that probably everybody cares about. We probably want the core demographics: name, date of birth, hopefully not date of death, their basic gender, and things like that. But different teams might see the person differently. One team might work with their personal information, their marital status. For some folks this might cover both physical health and mental health, so maybe one group is dealing with their mental health issues and one is working with their physical health issues. So when you get into stewardship, you definitely want to have a model that breaks down the attributes to that level. A, are we
getting everything? We don't want to have an MDM where you're forgetting a whole group: "Oh, I forgot that we did mental health." You can't do that. You should definitely make sure you handle all the teams. And then, who's the steward? Maybe everyone wants to be able to read this, but only certain teams are the steward in terms of making sure it's accurate and correct and sourced from the different systems. All of that requires a data model. MDM is sort of data modeling on steroids. But you have to start by even understanding it, and this is a great workshop when you're starting to do MDM: get everyone together, and this is a nice clean way to lay it out. And this is where some of the battles happen: do I own this, do you own this, do we steward this, are we sharing this? Because you don't want people to step on each other either, and that's why identifying those core and non-standard attributes is good as well.
Big data, the data lake: someone mentioned that as well. It means a lot of different things to different people. If we're thinking of, say, your Hadoop infrastructure, where we have an HDFS file system, that's often what they call schema on read, and there are valid use cases for this. Say I'm taking my sensor data, and I have massive volume, and I don't want to pay to put that in very expensive relational storage, or it won't scale, and maybe I don't know what I want to do with it later. I literally do just dump it there, and it is that schema on read: I don't know how I'm going to use it, but let's put it here and decide later. And that's fine. But at some point, when you do start to use it, you might do something like create a Hive structure, which is a full discussion that would probably be beyond the scope of this, but think of that as a relational table structure on top of the Hadoop file system. So I've looked at all these files and, oh look, it's sensor data, so it's meter and meter reading, and I want to take out certain summary
data and put it in a table so other people can query it. Because if you really want folks to see it, and you want to have data quality, and you want to integrate it, often you want to put that in a relational structure. Not always, but often, especially when we're talking about reporting. Most data modeling tools can handle Hive structures, and you can think of Hive as just another target. It's probably not as robust as your typical relational database yet, but they're doing a lot of things to improve that. So schema on read doesn't mean there isn't a schema. A couple of people in the comments said exactly that: you don't want to just dump all the files and say, "Good luck to you." That's that file cabinet, where yeah, I've got a lot of stuff and I can have more file cabinets, more places to put things, but if I don't know what those things are, it's not going to help me. You don't want to be a data hoarder just putting junk out there.
Another way a data model can be handy with the data lake, and I've seen a couple of customers do this, is when you start thinking of that innovation and discovery again: just knowing what's out there. So I might be a data scientist who just did this great new integration where I can get data out of Twitter and see what customers are saying about our products. Somebody else might just be building tables for our staff and tables for our product. Somebody else is building sensor data for our product, because our product is, I don't know, a health monitoring system, and it has all the heart rate information or whatever. That's the Internet of Things data from our products, and we could be doing analysis of that. Maybe we have NOAA weather feeds from external data sources. Some of these may have a relational structure, like these Hive tables, and you can probably have a data model for that. Maybe Twitter feeds don't, but just
even having the mapping of what's in that lake helps. So then this lady on the right here says, "Oh wow, I didn't know that we tracked weather events. If I had that, I could do some cool analysis. Oh, we're getting Twitter too? Wow, if I could link weather and Twitter and product..." So I see people saying, "Hey, I'm bored, I'm going to go to the movies," and we're a movie company, and they're bored because it's raining. Well, I can link all three of these things. And again, some of this can be automated linkages, and some might just be a static model that is your inventory. Think of a store and what we sell in the store, and think of the model as almost your inventory for that data lake. Not everything in the data lake, almost by definition, will have structure, but at least if you know that there are raw sensor data feeds that you might want to get things from, that's a big plus, because for a lot of folks that doesn't work very well.
Okay, another one, APIs, is again sort of the forgotten child all too often; I don't think many people get it. You've got your data on the left; maybe we have a list of members with Social Security numbers and things like that. Then we have applications that want to get that data. Maybe I have a web app where I am the user and I want to send my information, or maybe I'm consuming information from your model. This is that face you put on to show to the world, and there are application developers building these. So it might just be something like a "get person" object or a "put person" object, and that might be everything the application on your iPhone sees, but that has to map to the enterprise model, right? So there's integration between the API design and the enterprise model. It doesn't mean they're exactly the same. When we start building the APIs, one is literally from the user perspective: what would a user need? And a user could be another
application; it's not always a physical user. In integration it could be a partner: who do we want to share information with? So that API is from the user perspective, and the question is how it maps to your enterprise model. Again, some of that can be automated and some cannot, but at a minimum, I think of that high-level business model as the canonical model: this is what we're looking at, and we're all talking the same language, whether it is in an API or in an ETL script or in a SQL query. We should all be going off the same script. And just from my experience, it seems like the API team often is kind of separate. I've seen them become more integrated. I've seen data models be part of agile sprint cycles in application development, where everyone says, "We have first name and last name, and now we want avatar name," maybe because we're an online gaming company. Well, if we want to track avatar name, do we add that here to the model? So it becomes iterative, because models aren't static. I think later in the fall we're going to have a session on agile and data modeling, because it can be done and it often is done.
So in summary: it was a broad brush of a lot of different things. For some of the topics, if you go back to the list of the yearly events, we go deeper in other sessions, but I think that broad idea of integration was helpful in its own right. The big message I'm a fan of is this idea of the data model as your enterprise knowledge inventory. That is the place where we not only understand what we have but how it's stored, getting those common rules and definitions, and that causes so many fewer headaches in the organization. And it should be done, because if data is your IP and your main asset, you want to track it and understand it. And then there are a lot of different ways to integrate, and we didn't even cover them
all. We didn't talk so much about virtualization. There are a lot of different ways to integrate, but almost regardless, you still need to understand the data: the source, the target, how you're integrating, whether the meanings match, and whether the data types need to match or not. At least you have to have an understanding, and again, that's the beauty of data models: you can get both the business level and the technical level. I do this for a living; if anyone needs help, let me know. There's my plug. Here's me. If you have any questions, don't hesitate to reach out. And then, as I mentioned, next month is MDM, October is agile, and then in December (I think I saw a few comments about data quality) we will actually speak specifically on data quality and governance and how that relates to modeling. So without further ado, Shannon, we can open it up to questions.
Donna, thank you so much for another great presentation, and thanks to our attendees for being so engaged. I love all the chat going on throughout. That's just awesome; that's what it's all about, community and helping each other. So let's dive right into the questions. And to answer the most commonly asked question: I'll be sending a follow-up email by end of day Monday with links to the slides, links to the recording, and anything else requested throughout the webinar. So Donna, how widespread are glossaries? Are they business or technical or both, and who maintains them?
Great question. I think glossaries are very widespread and very helpful. I would say there's a difference between the two things I think you asked about, in my mind at least. A glossary, I would say, is definitely at the business level, and that's going to be your core terms. I think that business-level data model can inform the glossary; the glossary is probably a superset. So if you have a business data model, you're going to have your core things like: what is a customer, what is a product, what is an asset, are there different things
between a physical asset and an intellectual property asset? Those things would definitely go into the glossary. A glossary is probably broader. It might have your KPIs; it might have simple things like "API is application programming interface," a lot of the acronyms people use. So that's definitely the business level, and it generally should be maintained by the business. Often it ties into governance, with a data steward and/or data architect or someone helping maintain it. I've seen more companies (it scared me at first, but amazingly it seems to work) going the crowdsourced route, the Wikipedia rather than encyclopedia approach, where everyone can contribute, with a kind of eventual consistency. Some companies cannot do that; it's much more locked down. You can submit an idea, but the steward owns it. And I see that as different from the technical side, which I would probably call a data dictionary: that's your tables and columns and structures, and that would be sourced almost 100% from the data model, whether it's published in a data model or published out to a web page or something like that. So yeah, excellent question. They're separate but very related, and a lot of the data modeling tools are actually adding glossary functionality because they are so closely related.
Yeah, and there was an add-on question to that: how do you see glossaries, data dictionaries, and models getting maintained and kept in sync?
The glossary and the data dictionary. So I would probably say it would be the business data model and the data dictionary keeping in sync. I do think the glossary is more of a high level, probably the superset. Again, "what is an API" probably won't end up in your physical data dictionary, because it's more of a business term. But if we're thinking of the business data model, a lot of the data modeling tools are integrated between that logical conceptual physical
layer, and you can do it that way. But I think the real answer comes down to governance, where there has to be communication and/or an SDLC, a software development lifecycle, to really have that happen. So yes, the tools can do it, I would say probably not at the glossary level, but for the business data model with a physical data model. And then governance is what has stewards to keep track, so you may own a subset: you own the customer terms, and you own the product stuff, and you make sure that's in sync and collaborate with folks. So, kind of a long-winded answer, but I hope that answered it.
And I love that the chat is just on fire. I love it, but I'm trying to sort through it to make sure I don't miss any questions. If you have questions, do submit them in the Q&A section at the bottom, just to make sure I don't miss your question. So Donna, how to convince people that the SAP HANA view model is not the LDM, which is necessary for mapping and integration?
Could you read the first part? Was it how to convince business people?
How do you convince people that the SAP HANA view model is not the LDM?
I think my slightly facetious answer is to show it to them. I'm a big fan of SAP for what it does, and it's awesome, but if you physically reverse engineer those tables, there are thousands of tables with German technical names, and God bless you if you can understand them. There are highly paid consultants who can figure out those tables. So that's one way: just try to show them that it's not listed as "here's my billing header," and you can't see that quickly. That said, there are tools (I don't want to name a tool in this, but you can message me after and I can give you some ideas) that can actually create a logical layer of SAP, so it can actually say table 21619 is your billing header or your product info or whatever. It can both create
subject areas, so it will do that subject area level, and create logical definitions of what the stuff actually means. But even if you do that, you might still find issues between the way SAP looks at your data and the way you look at your data. And that is where (I'd see if I can find that screen quickly) that example we gave with the merger comes in: sometimes showing a simple model like that helps. So if you are lucky enough to get to the business side, show an example: SAP wants us to have a login for every customer (this isn't true, it's just an example), and we don't do that; we don't have logins. Maybe it's a little piece of the model to show them: "This is what I'm saying. We're trying to put a square peg in a round hole." I think all of those different tactics may help, but yeah, try showing them the model and then they might get it. I understand that, because it is not easy. It's a great tool, but really hard to understand.
I think everyone suddenly went quiet. I don't see any additional questions. I'll give everyone a couple of seconds here to type any additional questions you've got for Donna before we wrap it up. Again, I love all the chat going on, and just a reminder, I will send a follow-up email by end of day Monday with links to the slides and links to the recording, along with anything else requested throughout. And Donna, can you comment on the current state of Data Vault? What happened to it? I don't see as much discussion about it as before.
Data Vault is still a valid method. There's Data Vault out there; it's a different way of modeling, and a more agile one. Dan Linstedt is a big one, and then, oh my gosh, he's going to kill me, Hans Hultgren, who's actually here in Colorado. They do a lot of training and a lot of videos and are huge proponents of it, so those might be two good resources to go to. They both have their own view on it. A lot of good resources. Yes, still alive and well. That's another one I
could have mentioned; there are a lot of ways. I see one more, which I'm going to read myself because it's staring me right in the face: can the data model not be specific to a schema, or be schema-agnostic? How do you understand the data inventory if it's not specific to a schema? They're two different things, and I want to be clear. The data model at the business level should be schema-agnostic. That's how your business runs: does a customer need a login, or does the customer not need a login? That's the business rule. The database you find when you do that inventory, yep, that will definitely be schema-specific. So if I'm reverse engineering from SQL Server, that's going to be specific to that database, and there's one from Teradata and one from Oracle. And that's where there's that meeting in the middle. In one of my other presentations we covered that idea of top-down and bottom-up, and in reality you do both to get it all together. There have been a few questions on that. There is that step in the middle to make sure those map. But that's the beauty of having both, because if you just reverse engineer, that's what the database looks like, and that might not be how your business operates, or vice versa. And they can inform each other. You might find something in the database. I've had that: people finally reverse engineer something from DB2 that's been there for ages, and they find a business rule or a piece of information they'd forgotten about. "Oh, right, we used to have that." So it can go both ways. They are separate things, but they can inform each other. Hopefully that helps.
Definitely. And can you say how people display their glossaries to users? SharePoint, web, all of the above?
I've seen SharePoint; that's probably your low-end, easiest way to do it. Some of the data modeling vendors have options. I've done a low-end one: a client did one last month with me that's basically ABC,
where you can organize it by letter, and it's your typical legacy glossary at the back. The data modeling tools have published glossaries on the web, and a lot of the metadata management tools and some of the governance tools themselves have glossaries. So for the glossary there are tons of options. It's almost a question of what you want to integrate the glossary with. SharePoint is great, but it's kind of standalone. If you want to integrate with your data model, I'd look at the data modeling tools. If you want to integrate it with your governance, some of the governance tools and metadata management tools out there have glossaries, because it's so related to everything else. Per the person's previous question, a lot of the tools have added the glossary layer. I know we're getting close to time, but some tools can do your basic list of terms, and others can do hierarchies or whole semantic modeling. A glossary in itself can have a whole way to model it, and you might want to get that fancy, or you might just want to start by having the terms that everyone can see. But yeah, something on the web so people can see it, definitely, and most tools have something like that.
Alrighty, Donna, well, that brings us right to the top of the hour. Thank you so much for another great presentation. I just love it. And as Donna highlighted there, next month we'll be talking about data modeling and MDM, another great topic. I hope to see everyone then, and thanks again to all of our attendees for being so engaged in everything we do. We just love all the chat going on and all the great questions that you've had coming in, and we hope to see you next month. Hope everyone has a great day. Thank you. Thank you.