Welcome, my name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest in the monthly webinar series, Lessons in Data Modeling with Donna Burbank, sponsored today by IDERA. Today, Donna will be discussing data modeling and business intelligence. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the top right corner of the screen to activate that feature. For questions, we'll be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #LessonsDM. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar.
Now let me introduce to you our speaker, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa and speaks regularly at industry conferences. Speaking of which, you can meet Donna in person at Enterprise Data World 2017 in Atlanta, April 2nd through the 7th, and she'll be starting with a tutorial there on practical steps to implement a metadata strategy. And with that, I will turn it over to Donna to get us started. Hello and welcome.
Hi, Shannon. Thank you, and thank you as always for the great introduction.
And so today we are talking about data modeling and business intelligence. I'll talk a bit about it in a future slide, but this is part of a series, if you haven't joined before. We have one every month on various topics in data modeling and how data modeling applies to today's enterprise.
I'm sorry to interrupt, but you're breaking up a little bit. Of course, everything was fine leading up to it, but now you're breaking up again. Is that any better? Is it better now? No. Oh, well, perhaps we put the mute back on and I will dial in on a different phone line. Can we try that? Okay. All right. Thank you. Sorry about that. Of course, everything was great. She was loud and clear and perfect before we got started, but something went awry in the process. I see you're on, Donna, are you? All right. For those of you just joining us, if you have an audio issue with Donna, she's just working to get that fixed. I have you off mute, but Donna, it looks like you went back on mute. Is that any better? Can you hear me? We can. Loud and clear. All right. So sorry about that. Should we just kick it off from the top for your recording? No. Go right ahead. All right. We can clean it up after.
I apologize for that. As we were saying, thanks for letting us know about that, to the folks in the chat. There is a chat for questions as well. First off, Shannon already introduced me. Just a little more background. I am on Twitter. If anyone follows on Twitter, I'm at Donna Burbank. There's also a hashtag for today. There's always some great conversation about the event, and some people like to continue that conversation on Twitter. So please do, with that hashtag. I guess the other interesting part of my background for today's conversation is that our sponsor today is IDERA. For many years I ran their product management for ER/Studio, back with Embarcadero Technologies.
So I know that well and it's near and dear to my heart, and we'll talk a little bit about various technologies today. So moving forward to the topics at hand, as I mentioned, we have a whole series of data modeling topics this year. If you missed the one on data architecture last month, that was actually very well attended and got some good online feedback as well. That is all recorded. And then coming up, you'll see we've got a wide range, from conceptual data modeling to metadata, which was actually a replay from last year, or a re-request from last year based on popular demand. So hopefully you'll join some of the others coming up.
But today we'll be talking particularly about data modeling in the world of self-service BI and more traditional BI, and really, how I like to say it, how data modeling is the intelligence behind business intelligence. So we, the data modelers, are not always at the forefront of the flashy reports that maybe the business user sees, but if you don't have the back-end part right, with the right business meaning and context and an understanding of the data structures, your reports aren't going to be as effective. So we'll talk a little bit about that moving forward.
So here's a slide I like. I made it, right? At least in the U.S., we are oddly fascinated with the idea of the bumper sticker, where we put these things on our car that we feel very passionate about. And one, if you haven't seen it, is "If you can read this, thank a teacher." The idea is that if you can read, then a teacher was behind that. And then it's sort of been taken over by other groups that have musical notes: if you can read this, thank a music teacher, and that sort of thing. Well, as I thought about it more, when you think of a nice clean report, or you're doing self-service reporting and the data is just nicely formatted, behind that is generally a data modeler, right?
Or someone who actually helped build that data in a nice clean way to make sure the data was correct and that it was architected correctly. So in a way, maybe like teachers feel, we're sort of the unsung heroes, making others look successful. I thought it was cute, so I put it in there.
Really, why we're doing this, and why I'm still in the industry and still having a lot of fun with it, actually, is that I think more than ever, I mean, we've always been data-driven in business, but I think business folks are really getting it. I think the technology is such that it's more accessible to everybody. And I just did a quick look at some of the top business magazines that we should be familiar with. All are talking about data and the data-driven business; The Wall Street Journal has things about the data-driven business. And if you have not been living under a rock, you've probably heard how the data scientist is the sexiest job of the 21st century. I have my doubts about their definition of that. But there is something to be said for the fact that data is now hot and a lot of people are paying attention to it. And the reason is that we really see the business value.
So as sort of a corollary to that, we see the rise in self-service business intelligence, which I think makes a lot of sense. A lot of companies want to be more data-driven and they want to get their hands on their own data. I think also a lot of technology has really come a long way to make that possible. Whereas, you know, I think the business people have often in the past understood the data and the meaning of the data, some of the tools were either so complex or they just didn't have access to them, so that wasn't really feasible. So those are some of the drivers of the idea of not only self-service BI, but self-service data prep, or data munging, or whatever the various vendors want to call it. And I have to say the tools are slick.
I mean, there is some neat stuff out there and it becomes a lot easier to manipulate data; what would have taken months in the past is a lot easier now. There's more data that's more accessible. We'll talk a little bit about things like open data, where you can just go to the web and, you know, governments and scientific research agencies are just publishing data that's open for you to analyze. And if you're a geek like me, you can waste a lot of time doing that. There's some neat stuff out there. Or even things like social media. I want to see, you know, Twitter sentiment. There's a lot more data available and I think people are trying to get insights from that.
Business users are becoming more tech savvy as folks just use technology in their daily life. And I think to some extent, business users have often been tech savvy. I mean, I've seen some amazing spreadsheets out there and some very complex, you know, pivot tables and things like that. And I think a lot of the business users looking at some of these new tools say, well, that's no harder than a spreadsheet. You know, I understand the concept of data manipulation. I just never had the tools before.
So that's great and there are a lot of opportunities, but it can also be fraught with challenges, and you might be facing some of that as well. The reports are only as good as the data behind them, and is all the data accessible in a format that's reportable? So that's really an opportunity for folks like data modelers, and for the models and the metadata. I'll talk about that a lot today, the metadata behind the models. That really makes the job of business intelligence easier, both for the BI professionals that might be building the warehouse and generating the canned reports that everyone needs, and for the casual business intelligence reporting user who would just love to have a trusted data set and be able to query it easily, if it existed.
And I think business users are often frustrated with self-service BI, so you might be wondering about the pictures. My analogy is, you know, I think the promise of the tools is there. I have this great tool and I can slice and dice and do all these fancy reports and visualizations. And I want it sort of like a salad bar, where I decide I want the salad and I pick a little bit of lettuce and a little bit of beans and whatever I pick, and it's this beautiful salad. And what we hand them is sort of a shovel and some seeds and a watering can. As we know in the data profession, it's hard to get data in a nice, clean format. And the end result of the report that people see took a lot of work to get there, either aggregating the data from different sources or making sure you have common master data or reference data sets, et cetera. So I think that's an opportunity for folks in the architecture space to build those trusted data sets that folks can use.
And I think, you know, I've often seen in some organizations kind of an us versus them. You know, why are these people getting into our data? Well, it's not your data. You might be managing it. But, you know, they want to see the data, too. And they're equally frustrated; they'd like it to be a lot easier. So I think that's where a nice architecture that can be easily queried really helps everybody.
So here's an example. Data is only as good as the metadata. And I'm kind of a self-proclaimed nerd, and I think that's a great thing. Don't be offended by nerdery. I think nerdery is a wonderful, wonderful trait. We're the sexiest job of the 21st century, right? So I actually went through this a bit myself and kind of put on the self-service BI person's hat, partly to play with some of these tools I've been wanting to spend more time with. So I went out to what was actually the UK Open Data site.
And I thought that would be kind of a fun example, to visualize and play with some open data to show how easy it is, or not easy. And there was one data set on road safety by vehicle, by make and model. I thought that could be kind of controversial. Does a Porsche have more accidents than, I don't know, a Ford? I don't know. So I thought that'd be sort of interesting. And this is probably very much the experience that the business user might have. That sounded really neat. I just thought I could point one of these self-service BI tools at it and have some great slice-and-dice pie charts and bar charts. And I opened up the data. And it's probably, you know, every data modeler's nightmare. You have the typical field one, field two, field three. There were more than what I show here. It went through something like field 100, right? And if you look at the values, it's not very intuitive. There was no make and model. There's some sort of reference data set where these numbers mean some sort of make and model of car. And then you have numbers, whatever F10 means. I wasn't quite sure.
But I, undaunted, went through. I did some great beautiful reports. On the right, I can see that F13 is really great. There's 250K of F13. I have no idea what this means. And then I did some more visualizations and found that F13, down there at the bottom, is definitely doing something, but I don't know what F13 is. And that's sort of a very classic metadata problem.
To be fair, I also sort of gave myself a time limit, being a busy person. Actually, I highly recommend it if you have some time. A lot of these visualization tools are downloadable as free trials, and there are amazing amounts of open data. It is kind of fun to play around. And there's the good side of self-service BI and exploration as well, which we'll talk about. I sort of gave myself 15, 20 minutes. How far can you get in figuring it out? And as a business user, I just need a report for my meeting.
Can I get this out quickly? As I went through the data, I did see some insights, and I could start to figure out the patterns. And I think that's what a lot of business data scientists do. So there are pros and cons to that. Sometimes, munging through the data, you can find insights. But I didn't have time to find insights. I just wanted a nice clean report. And even with understanding trends, I still don't know what F13 is.
And some open data sets are better than others. And this is not to knock this particular open data set. I mean, some are very clear that we're just putting it out there; we don't have time to curate it because, as we know in this industry, curating takes time. Most open data sets actually are pretty good. They have metadata on how the data is used and what it's used for, et cetera, et cetera. But I thought this was a good example of, you know, that's great, I can have the best tools on the planet to look at it, but I don't know what it is. Even, you know, fairly basic data-type things. You'll see that F2 sort of looks like a year, but the tool sees that as kind of a numeric, 2,015, et cetera. So that's just kind of an example. And it sort of hits home the point that metadata matters, right?
So I think, especially with this rise of self-service BI and self-service analytics, more and more attention does need to be paid to the quality and the content and the structure of the data, a.k.a. data models and metadata. And if you look at some of the quotes at the bottom, this is a known problem, and you're probably facing it yourself or hearing others complain about it, and it's certainly being written about: you have a BI professional or a data scientist or a tech-savvy business user, and they want to get the results, and they're spending 50 to 90% of the time just cleaning the data and reformatting the data. And in that little example I went through, I was interested in the results, and I would have hit that same problem.
I would have had to do more research on what those numbers meant and then probably tried to create some structure behind it and all of that. That would have taken time. I really just wanted to know how many accidents happened by each vehicle type. And in the quote to the right, data quality might take 80% of their working day, and that's probably for the data scientist, which I know is different from but closely related to BI. So it's the least favorite part of your job. You want to create these great reports and see the great insights. You probably don't want to manually change data types or try to go through and see what data means and that sort of thing. You're not getting to the insights you actually want to get to. And that's why they pay us the big bucks in the data modeling and metadata world, to fix this sort of thing and make everybody's job easier.
So this is a bit of a plug for a research paper we did last year, but I did think some of the insights were interesting. We did a survey on what some of the trends were for metadata, which is closely related to data modeling, and who the users of metadata management were. So you'll see here that, no surprise, a very high percentage of the BI reporting team, the data scientists, and business users are among the biggest users of and requesters for metadata management, followed closely by data architects and data modelers. We just said business users; we didn't go much deeper than that, but it's probably the self-service BI folks or folks that are looking to look more at the data. And I kind of see those groups slightly differently: one does the prep and one is the consumers. And then you'll see the DBAs and developers are actually close behind as well. So having good metadata and a good model and a good structure really does help everybody.
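The F13 problem can be made concrete with a small sketch. It assumes a data dictionary exists for the file, which is exactly what the open data set was missing; the column meanings below (`vehicle_code`, `year`, `accident_count`) are hypothetical placeholders and are not the real meanings of F1, F2, or F13. With metadata in hand, renaming the columns and turning "2,015" back into the year 2015 is mechanical; without it, all the code can do is flag that F10 is undocumented.

```python
# Hypothetical data dictionary. The talk never reveals what these fields
# actually mean, so the names here are illustrative assumptions only.
DATA_DICTIONARY = {
    "F1": {"name": "vehicle_code", "type": int},
    "F2": {"name": "year", "type": int},
    "F13": {"name": "accident_count", "type": int},
}


def parse_value(raw, target_type):
    """Convert raw text to the documented type, dropping separators like '2,015'."""
    cleaned = raw.replace(",", "").strip()
    return target_type(cleaned)


def apply_dictionary(rows, dictionary):
    """Rename cryptic columns, coerce types, and list columns with no metadata."""
    described = []
    undocumented = set()
    for row in rows:
        out = {}
        for column, raw in row.items():
            entry = dictionary.get(column)
            if entry is None:
                # No metadata: keep the raw value and flag the column.
                undocumented.add(column)
                out[column] = raw
            else:
                out[entry["name"]] = parse_value(raw, entry["type"])
        described.append(out)
    return described, sorted(undocumented)


# One made-up row in the shape described in the talk.
raw_rows = [{"F1": "101", "F2": "2,015", "F10": "7", "F13": "250000"}]

clean_rows, missing = apply_dictionary(raw_rows, DATA_DICTIONARY)
print(clean_rows)
print("Columns with no metadata:", missing)
```

The point of the sketch is the division of labor: the cleanup code is trivial, while the dictionary it depends on is the part that takes a modeler's time.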
And I think the more people look at the data, the more quickly they're sold on that idea: wow, if someone could just structure this for me, my life would be easier. If you could tell me what the data meant, that would be helpful as well.
So if you look at the classic picture, even with the rise of self-service BI, and I think we kind of hit that point home, you can find insights in data, but it's a lot easier to find insights in data that's been managed and manipulated and cleansed for you. And there are certain use cases for this concept of data warehouses. I know the industry loves to create controversy and say things like, is data warehousing dead? Now we have Hadoop. And don't get me started, right? I've probably vented enough on some of those things in other webinars. But there's a use case for each technology. And I think even in an organization that's very driven by discovery and innovation through data science and looking at raw data sets inside and outside the organization, most companies have something that needs to be managed, you know, your financial reporting, for example. It needs to be curated. It has a lot of volume. We need to manage it to be more effective in the data warehouse. You know, there may be innovations in how we format the data warehouse or how we populate it or what the source systems are. But that concept of managing the most important data in your business, I don't see going away.
And again, my little phrase, that we're the intelligence behind business intelligence, may kind of explain that. So you might have your average business user saying something that seems so simple. Could you just show me all customers by region? Can I maybe have that report by this afternoon? And, well, I think more and more business people do understand the complexity of data. But often that doesn't seem like a hard request.
But think about what it takes to get that. You have dozens, hundreds, thousands of source systems that probably have customer data. There's something as simple as, what's the definition of customer? How is this stored? Each data store probably has a different structure. Who's the owner or steward of that data? Do we truly understand what that source system means? Is sales calling it a customer when it might actually be a prospect? And is finance calling it a customer only when someone actually owns our product? You know, there's all of this difficulty that we need to understand in the source systems, but then there's putting it into the data warehouse, doing that ETL or the transformation or the munging to get all that data conformed. And then there's thinking of things like performance, or things like a dimensional model. I know that's not the only way to do a data warehouse, but it's a common one when you really want to focus on, I'm going to use this data for reporting. And then for reporting you need to model, you know, what are the definitions of these key pieces? What do I mean by total sales? What do I mean by region? Is, I don't know, North America Canada, the U.S. and Mexico? Is it just Canada and the U.S.? You know, there are a lot of different subtleties in how we define regions and how I make sure that's reported correctly. So this is probably your classic flow, from source to transformation to warehouse to the target information on your BI report. And I really see a data model hitting every step of the way there.
So, you know, we've talked a bit about that idea of almost old school, new school, right? There's so much open data. The tools have come such a long way. People want quick insights. Do we even need modeling anymore? Are we just getting in people's way? And I think, of course, yes, we still need modeling, but I think there is a balance of modeling what matters and having this idea of a trusted data set.
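To show why "customers by region" is not one number, here is a minimal sketch under stated assumptions. The country-to-region reference data, the two definitions of customer (a sales view that counts prospects and a finance view that counts only product owners), and the sample records are all hypothetical; in particular, placing Mexico in North America is an assumed business decision, not something the talk settles.

```python
from collections import Counter

# Hypothetical reference data: which countries roll up to which region.
REGION_BY_COUNTRY = {
    "US": "North America",
    "CA": "North America",
    "MX": "North America",  # assumed; the business has to decide this
    "DE": "Europe",
}

# Hypothetical definitions of "customer" held by two departments.
CUSTOMER_DEFINITIONS = {
    "sales": {"prospect", "owner"},  # sales also counts prospects
    "finance": {"owner"},            # finance counts only product owners
}


def customers_by_region(records, definition):
    """Count customers per region under one named definition of customer."""
    allowed = CUSTOMER_DEFINITIONS[definition]
    counts = Counter()
    for record in records:
        if record["status"] not in allowed:
            continue
        # Countries missing from the reference data are surfaced, not dropped.
        region = REGION_BY_COUNTRY.get(record["country"], "Unmapped")
        counts[region] += 1
    return dict(counts)


# Made-up records for illustration.
records = [
    {"id": 1, "country": "US", "status": "owner"},
    {"id": 2, "country": "MX", "status": "owner"},
    {"id": 3, "country": "CA", "status": "prospect"},
    {"id": 4, "country": "DE", "status": "prospect"},
    {"id": 5, "country": "BR", "status": "owner"},
]

sales_view = customers_by_region(records, "sales")
finance_view = customers_by_region(records, "finance")
print("Sales view:  ", sales_view)
print("Finance view:", finance_view)
```

The same five records give two different reports, and one record lands in "Unmapped" because the reference data has no entry for its country. Agreeing on the definitions and the reference set is the modeling work; the aggregation itself is the easy part.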
And I have worked with a lot of organizations recently, actually, that are struggling with this in something like a data governance program. And I think there's only conflict when we don't actually spell it out and see what we're trying to do. So I think it's managing what you need to manage and leaving it alone where people really just need to explore. So, you know, say I've launched a product for marketing. I know that the results of that marketing campaign, when I send them up to management, when we're going to report to the Street or whatever, those financial numbers need to be super accurate. But I just want to get some Twitter sentiment analysis to see, we just launched yesterday, what are people saying? And get some trends? Leave me alone. I don't need a whole lot of governance around that. I want to actually go look at the Twitter streaming data and see what's out there. So you don't want to hamper people's creativity or hamper, you know, the right tools for the right job. But there does need to be a trusted set. To take that example, I found something on Twitter; which customer said this? Well, I hope there's a master list of our core set of customers. And I hope there's a reference data set for the different regions. And I think you won't get too much argument around that.
I've noticed that a lot of the companies I've been working with do want to be very much on the right. And, you know, they might have a cloud-based data lake where it's all, you know, innovation. But I think as soon as people start to look and to do some of this exploration, they'll say, if I just had a nice clean reference set I could compare some of this against, and if there were a master data list when I need it. I think that's a win-win for both parties. So for me, it's just understanding what does need to be modeled and what doesn't, and, you know, allowing that to be what it is and do the exploration.
And the real value of that report is when you can compare both of those together. So what are the customer trends? How does that relate to my specific customers? What are the trends by region and how do I define a region? All of these different things that need to integrate in the report can be done with that balance. So I think that's an important point to make. And I sort of get annoyed when, you know, I think a lot of the media and that sort of thing are trying to get, you know, clicks, right? So they try to create controversy, that one is better than the other or one is going away. And, you know, they're all good. I live in Boulder, which is kind of a hippie town. It's a common phrase: it's all good. It's all just the right tool for the right job, and making sure the touch points and integration make sense.
So the other thing to remember is the different modeling levels. And I tend to show this fairly often in the presentations that I give, but whenever I leave it out, I kind of wish I had it to refer back to. So when we're talking about modeling, there are different levels. You might be at the conceptual model level, where really you're just talking about very high-level business terms and definitions. What do I mean by customer? What do I mean by product or region? And that's super important as we're looking at a report. Because if we don't have common agreement and we say, what are total sales by customer, and what do we mean by a customer? Is it a current customer, past customers, you know, gold-level customers? Then your report is meaningless. So especially with BI, that high-level conceptual model makes sense. You know, with the logical model, you're still talking about the business, but you're adding some different rules about it. You're adding some attributes. What are the attributes of customer? Name, address. And when we're talking about a star schema, we're talking, you know, a little more about what I'm going to report upon.
So you're getting a little more detail, but the focus should still be my business goals and objectives. And the physical model is really at the physical database table level, where either you're doing an inventory of your current sources or you're creating a new source and really trying to optimize for performance and that sort of thing.
So on that conceptual level, I mean, business meaning and context, which I think I touched upon, is so critical. So again, the business person might ask what seems to be such a simple question, show me all my customers by region, and the good data architect gets excited about this sort of thing and asks all these questions that other folks may think are strange. But once they start understanding, I don't think they'll think it's strange at all. You know, does this include current customers or also lapsed customers? How do you define a region? Can a customer have a billing address in more than one region? So maybe you were billed in one region but you purchased in another. What do you mean by region? You know, when you really want to make sure that a report is correct, you need to think about all this type of stuff. Or, you know, we're in the age of compliance and things like GDPR, and personal privacy is a big deal for folks. We have customer information. Should I obfuscate or hide some of the PII? Do you want trends of customers by region? I really can't show you customer names by region. That's sort of invading people's privacy, right? So there's a lot to think about for just a simple thing like show me all my customers by region. And that's where a data model comes in, or at least the rigor of that type of questioning that we tend to do in data modeling.
And this is a cartoon that I tend to use a lot. But you know, there just aren't a lot of cartoons about data models. So even though it's really not that funny, I use it because, you know, it's a data modeling cartoon and there just aren't enough of them.
But I think if you have been in this business, you might find it funny. You know, we're here. We're all done with user acceptance testing and everything looks great. We're about to roll out this new marketing application. Just one question. What is a customer? What do you mean by customer? And that, you know, might seem kind of strange, but I've been in groups where we haven't been clear about customer, and it might have been that we were only looking for online customers and not brick-and-mortar customers, and the whole campaign we built made no sense anymore because nobody asked those basic questions in the beginning. So that's why we always like to have a data modeler or an architect in the room to ask those sorts of questions.
And I have seen more and more companies really get that. You know, I've worked with some companies that might be doing product development. They might be developing tires, and they have, I guess, what we would call agile or scrum meetings, the product launch meeting, and more and more there's a data person in the room. How is this decision going to affect data? Do we need more information to report upon? And I think that probably always happened to some extent, but I'm seeing it happen more, and I know some of my customers were never in those meetings until the past year. I think now more and more people are getting that we can't wait until the end of our sales cycle or our product launch or our campaign or whatever and then start thinking about the data. You need to think about it in the beginning.
So here's an example of a conceptual data model in this particular tool. One of the things I like about it is that it hides all of the detail and really just focuses on the business definitions, because each of those model levels has a purpose, and the purpose of conceptual data modeling is really to get that communication.
So, you know, it seems very simple, but a person could quickly look at that and say, okay, a customer is a person or organization who ... Wait a minute. We don't sell to organizations. We just sell to individuals. So right there, that could be a big business change or, you know, a business clarification that we needed, just by showing it clearly on the diagram.
The data model metadata, both technical and business, can be used by a lot of different roles. So it could be something as simple as the business person in finance asking, what do we mean by regional sales? How do we define regions? Is it by geographic regions? Sales regions? Political regions? You know, there are a lot of different discussions there. I might be a data architect trying to build something new, but is there an approved data structure, so that I don't reinvent the wheel? So if you can make my life easier and give me something that exists, thank you. That would make my job easier, and we might as well have a standard. You know, it could be an auditor looking at your report and saying, please don't just verbally tell me how total sales was calculated. I absolutely need to see the lineage. And any of you in the financial industry know that there are regulations around this and you can't just say, I don't know. I actually need to show that data lineage, how that data was calculated. Again, you might be building a data warehouse and need to understand the source-to-target mappings, how that data was created. Or, you know, I've seen companies use data models in the business, in something like HR. I'm trying to get my staff up to speed on the company's new business terminology. And if you've done a data model, especially at the conceptual or logical level, a lot of it is exactly that. Do we call this a client or a member or a customer? You know, how do we term these things? It's really the lingo and the way the company does business.
So I have seen companies use a data model or a version of it. It might be turned into a glossary, or some have used the model itself at a high level because it kind of shows the relationships. We have a customer with a care representative, which is different from your sales rep, et cetera, et cetera. So it really can help non-technical people as well.
So another thing you can do when you're doing these data model levels is map or show the lineage between them. So it could be something like, you know, especially with things like regulations or a report, I have this concept of client on my report. Where did that come from? Well, you know, in the logical model we might use the term customer for that. I know you guys say client, but in our world it's customer. And then in all the different tables, Oracle called it CUST and Teradata called it CUSTOMER and in DB2 it's CTABLE_16. And you're able to see that lineage of, you know, where is all my customer data and how is it used? And this is just showing it at the very high entity level, but you can do the same thing, when you think of reporting, down to the attribute level, the business term or the core data element or the metric, whatever you call it, to really see what do I mean and where is this data stored.
So the other part is the physical level. Really, that's your kind of active inventory of your data assets, and this isn't insignificant. So part of it is just knowing what data you have. So think of that example, and I sort of tease them, of the naive business user who says just give me a list of customers by region by this afternoon, and we don't have anything in place.
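The client-to-customer-to-table lineage described above can be sketched as a small lookup structure. This is an illustration only, not how any particular modeling tool stores lineage: the dictionary layout and function names are hypothetical, and the table spellings `CUST`, `CUSTOMER`, and `CTABLE_16` are assumed readings of the names spoken in the talk.

```python
# Hypothetical lineage map: business term -> logical entity -> physical tables.
# Table names are assumed spellings of the spoken example.
LINEAGE = {
    "Client": {
        "logical_entity": "Customer",
        "physical_tables": [
            {"platform": "Oracle", "table": "CUST"},
            {"platform": "Teradata", "table": "CUSTOMER"},
            {"platform": "DB2", "table": "CTABLE_16"},
        ],
    },
}


def trace(term):
    """Top-down: where does a business term on a report physically live?"""
    entry = LINEAGE[term]
    return [(t["platform"], t["table"]) for t in entry["physical_tables"]]


def impacted_terms(platform, table):
    """Bottom-up: which business terms are affected if this table changes?"""
    return sorted(
        term
        for term, entry in LINEAGE.items()
        if any(
            t["platform"] == platform and t["table"] == table
            for t in entry["physical_tables"]
        )
    )


client_sources = trace("Client")
affected = impacted_terms("DB2", "CTABLE_16")
print(client_sources)
print(affected)
```

The same structure answers both questions from the talk: an auditor asking where "client" on a report comes from, and an architect asking what breaks if a cryptically named table is changed.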
Customer data could be in thousands of sources across the organization. One of the nice things is that most data modeling tools can point to these sources and create a picture like the one you see in the middle automatically. There have been some amazing aha moments of, wow, I didn't know that we had this data or this data structure. Then, when you store it in a model repository, you can start to do some impact analysis off that: these 16 tables all use the same data element, for example.
Know what your data means. Once you have the physical data models you can link them to that logical or conceptual model, either top down, bottom up, or some mix in between. The beauty of that is that your business requirements are linked to your actual technical infrastructure.
Then it can support data consistency. In this example we might have reverse engineered 100 databases and we see that the date field is stored differently: in some it's a character field, in some it's a date field. From that you can start to create new standards or business rules, or technical rules: maybe dates should always be in a date field. Then, when you do new development or you want to go back and clean up your databases, you can do that from the model. So you have these core standards that make the data more consistent. Think back to my example of the open data, where they had a date field that looked like a numeric and didn't make any sense. If we can just get that stuff out of the way, you don't even think about it; a date should be a date. And when you're doing the reports, ideally the business person would never even know that was an issue. It's just nice and clean.
And then it adds that context and definition, which I already mentioned. Some folks might say, well, I have all that stuff in my database, and I'm only using Oracle, and I kind of know
the data structures. I'm a DBA, I get that. But most data modeling tools can add a lot of metadata around it. I talked about this last year in the metadata session: they can almost be a sort of metadata repository light, in a way, if you're only using it for a very focused set, especially around things like BI. You can add a lot of tags or fields or whatever the tool calls them, so you can record whether this is a required field, whether there are business rules around it, whether it's private or secret. You can add a lot of context around data that might not have been known.
Or simple things. City: what do I mean by city? That seems very obvious; everyone knows what a city is. But in this context, is it the city where the customer lives, or the city where the store they purchased from is located? Again, if I'm doing the report and I want to see sales by region, that is a big deal. It could be that I live in New York but I was visiting London and bought this pair of shoes. So what does sales by region mean? I would think that would be a London sale, but again that's something the business needs to define.
And this is just another example of that. The technical metadata is almost your DDL there on the left, or it actually would be your DDL. A lot of the tools can forward engineer as well, so I create my standards in the modeling tool and forward engineer that onto a platform. The nice thing is that it's linked to your actual business metadata: what does an employee mean, what does a customer mean? And then, as we know, that's separate from your actual data, but not really. Especially when we think of things like personal privacy, it's always good to remember there's actually a person named John Smith that we're reporting about or sending a marketing campaign to. You've got to keep that in mind.
So here's another data warehouse example. We showed this before as the classic source-to-target mapping to generate a report, but I thought
this part was interesting: it really shows where data modeling hits just about every step of the way. A lot of folks start at the right, with the report. I'm looking at my sales report and I want to know total sales by region. I might be using some sort of BI tool or self-service BI. Often that's sourced from something like a data warehouse. To build that, hopefully, either in the modeling tool or separately, there's a business glossary: what do I mean by customer? There's probably a dimensional data model, which we'll talk a bit more about, to set it up in cubes so you can easily generate a nice report of total sales by region, by sales rep, by whatever. And then there are the physical tables behind that to generate the report. So that's a lot, and the nice thing about data modeling tools is that they have all of that right there in the tool.
Jumping to the other side: that report might say total sales by region, a very simple summary, but maybe there were three databases this came from. One was an Oracle one, one a SQL Server one, one a DB2 one. One calls it customer, one calls it the cust table. I think I've hit that point: they're all named different things. So you'd have multiple physical data models that capture that and give that context. Then generally there's some sort of staging area that cleans this up. At each stage of the game there's some sort of physical model describing these areas. Many of the tools on the market now can actually document some of these source-to-target mappings as well, these lines in between, though I've really messed up this picture. So what's nice is that these tools can really start to see that lineage and link it to a sales report.
There are also tools on the market now, because the space is fairly mature, and most tools on all sides have some sort of common metadata exchange, so
that, for example, if I have this structure in my warehouse and I want to import it into the BI tool, I can, along with the definitions or the lineage behind it. The integration and the metadata flow between these tools goes very well. So again, you might store it in a separate metadata repository, or you might store some of this in the modeling tool and some of it in the BI tool itself, but nowadays it's fairly easy to get that flow, which is nice.
On the lineage I just mentioned, this is actually a screenshot from IDERA that shows the lineage from the source to the target, and you can put the business rules in there. What's nice is that this is more at the business level, to understand the meaning, because more and more people are using a modeling tool to see that picture: as I move between the address table and the staging area for address, can I just visualize that? Most modelers, and I would argue most people, are visual creatures. You might have written the script, and you can read the script and get what it's doing, but for me, and I think for most people, seeing that movement as a flow in a visual line is what modeling is all about. So a lot of the value of the tools is the business rules behind it and understanding the lineage, but also just the ease of use.
And then the speed of access. As I was trying to create an unpleasant reporting situation for myself to show as an example in these tools, I tried to just not think at all about how I constructed my query: put 17 rows and columns in and mix them together. This one is fictitious but similar: report elapsed time, 4 days, 10 hours, 27 seconds. If these reports are not queried correctly, or the SQL isn't written correctly, or the underlying structure isn't such that it can be easily queried, it can be a nightmare. I will say that a lot of these tools have come a long way. The one I was using wouldn't let me do it. It kept saying, warning, this is going to take a long time. Probably not
the highlight of my career, as I was writing bad SQL myself. I did some embarrassing things like that as I was learning about joins, and what not to join, and how not to structure a table. But that is a beauty of a warehouse that I think has been forgotten.
I don't have it in this presentation, but I've shown it in others: one of my favorite stories is Facebook, who actually stood up at a TDWI conference. It was sort of like a big confession. The ruler of the big data space and a lot of the new analytics was admitting that they had built a data warehouse. One of the reasons was understanding what things mean. They were trying to find out how many current users were logged in at once, and what do we really mean by a current user? If I'm on Spotify, am I a user of Facebook, even though I'm logged in? That sort of thing. But the main reason was performance. They tried to do something similar on Hadoop and it was taking days to get the results back, and I think the same report was 40 minutes or something with a warehouse. So performance is something that I think often gets overlooked as we're thinking about some of these technologies for reporting.
And then again, if you really want that high-quality data and an integrated set of data, that is part of the reason we model. As you look at these data sources you can see the overlapping data types and understand the technical integration and also the business meaning. All this work we put in is to make it easier on the reporting side.
As you've noticed, this is not a how-to on data modeling for the warehouse. There's a lot of material out there already. But in case this is absolutely new to you, I thought this might be helpful to set the stage a little bit. One common way, and I know it's not the only way, to model a data warehouse is the star schema, or dimensional modeling, which is the Kimball approach. There's the
Kimball versus Inmon debate, and this is the Kimball way. One way I like to think about it, with apologies to any grammar person on the call, is: what are you reporting by? I'm reporting by month, by product, by sales rep, for example, and those are going to be your dimensions. The thing in the middle, the fact, is what I'm reporting upon, and the things around the star are what I'm reporting by. That's probably the easiest way to think about it. The lines in this star are really your navigation path for the reporting. It's similar to, but slightly different from, a logical model for more of a relational OLTP system, where I'm saying a customer can have more than one account and I'm creating business rules around that. Not so much here; here it's really about getting that reporting structure.
So here's a summary of that. The fact, again, is the thing we're reporting about: what are my sales figures? These are very deep tables, so you'll have a lot of values but not as many attributes. The dimensions might have a lot of attributes but fewer values. That's a hand-wave generalization, but really the facts are my sales figures for the year, and the dimensions are how I report them: sales by month, sales by region. In a lot of ways you can roll these up and do different versions of this, but it's probably your simplest view of the star.
Then there's the data modeling tool. Again, not everybody uses a data modeling tool for a star schema; people can just build that right in the database and then report upon it. One of the nice things about using a data modeling tool, and there are many reasons, is this nice way to visualize it. You can literally see the star, I mean the fact in the middle and the dimensions around it, and then all of the attributes can be defined and you can
get all of the benefits.
So this idea of the data-driven business has really increased the need for everything in the data world, which is great, and which is why I'm having a lot of fun lately doing a lot of the new things with data. But it's also increased the demand for BI reporting, which has been around for a while but I think is in more and more demand, particularly the self-service aspect. As I hope we've shown, a lot of the reporting tools are great, but they can only be as good as the data underneath them. Data models specifically are critical for understanding the meaning of the data, making the data reporting process easier, and improving performance. And really critically, when we're talking about something like a warehouse, they help you understand those source and target systems and the lineage and how you get that right. As you know, that's a big task, and having a model makes a big difference.
And again, just to remind you of what we were talking about: model what matters. Modeling and metadata are great, but they're not the only way to deal with data by any means, and not all data needs to be modeled. So do allow for exploration, or for the case where I'm just doing a quick report. Not everything needs the rigor that we'd like to give the core data structures. That's often where we start in a project: identifying what those core data structures are. If you can start with the stuff that matters, that's key.
There's me, if you want to contact me afterwards with any questions. A little bit about my company: we do this for a living, so if you need help, let us know. A little callout for the DATAVERSITY training sessions: if you're interested in metadata, we have a course, and colleagues have great ones on data quality and data governance, so you might want to check those out for further education. And a little callout to the Lessons in Data Modeling series. So the
next one is going to be on conceptual data modeling: either how to get the attention of business users or, if you are a business user, what the heck is data modeling, how can I get started, and what makes sense for me to know about it. As you'll see from DATAVERSITY, that's one of their most requested topics. So we're trying to mix it up and hopefully keep it interesting. At this point I'd be happy to open it up to questions, thoughts, ideas, and anything you'd like to share, and I'll let you coordinate that, Shannon.
Donna, thank you so much for another fantastic presentation. Always such a pleasure. Just to answer the most common question we receive: I will be sending a follow-up email to all registrants with links to the slides and the recording, and anything else requested throughout the webinar. We've got lots of great questions coming in, Donna, so I'm just going to dive right in here. What is your view on the impact of big data on data modeling?
I think the impact of big data on data modeling is on several levels. There's a bit of the hype that anything new has to replace the old, which could be a full rant that I will refrain from. I'm going to mess it up, but it's the tyranny of the or and the beauty of the and, I guess is what I'm trying to say. You don't need one or the other; I think they both work well together. Big data fits in a lot with what we were saying about that exploration mode. Maybe I just need all my sensor data and I need a massively performant and less expensive way to store it; I might put it in Hadoop or even a cloud source. I am seeing more customers, rather than having different places, doing some data warehousing on a big data platform. Big data means a lot of different things, but thinking of, say, a Hadoop platform, there are Hive structures, there's a way to do relational databases. What they're seeing is that when you're looking at the type of report like a financial report, you still have
to do that rigor, to know what you need to model and what you don't. If I'm really just getting sensor stream data and I need to get the statistics off that, I don't really need to model that. But if I'm doing my financial reports for a BI report, I certainly should.
We had another question along the same lines, specifically as big data relates to the non-relational, the NoSQL databases.
So yeah, I would say fit for purpose. I was thinking back to whether I had a slide on it. For those interested in big data, you might also want to catch the big data data modeling webinar we did, I think it was last December or October, which I think is still available as a recording. But particularly for data warehousing, I'm sort of a fan of the relational table structures, because I think what's behind those is good. But if I need the most performance, or I'm trying to do, I don't know, a real-time shopping cart for my online system, then I think these NoSQL databases could be excellent. It's really, to me, a different use case, although at the conceptual level we still need to know what we mean by customer, right? So again, I just think fit for purpose. I think modeling still stays, and modeling for NoSQL is huge as well. Some of the document databases, for example, are a little easier to model; they're set up for that. Others are key-value pairs, more of a thing-relates-to-thing structure. It's a broad topic, but I would point you first to that webinar, and we'll be talking more about that later this year as well.
Sure, yeah, we do have that published in our on-demand webinar section on DATAVERSITY, and I can include a link to that for everyone in the follow-up. So what about privacy? If I'm a paying customer, why is my usage being manipulated? I already paid, and I don't want my personal characteristics to be used.
Where do you opt out of being used for data modeling?
I think privacy is a huge driver for a lot of this, and I think data modeling helps with that. Early on, in one of my examples, I had the data about my customer, and one of the questions the data architect raised was that we need to consider PII. Just a clarification: when we're talking about modeling here, we mean modeling the data structures. We're not talking about doing statistical, big-data-style modeling on purchasing patterns and that kind of thing. That's a different kind of modeling. But that's exactly where data modeling can come in: I can say these are all the attributes I know about a customer. If I sign up for a product, they're going to know my name, and they're going to know my credit card information if I purchased a product. But I need to ensure that when I enter my credit card information, only the people who need to see it to make the transaction see it. That should not cascade across the organization. So I'm seeing that as a huge driver for data modeling.
I guess it also ties back to the question on data modeling for big data. Say I want to move a bunch of data off to the cloud. Well, is that data PII? Not unless I can absolutely know that it is tokenized. A lot of my customers are spending a lot of time on that very question and using a data model to help track that. So I have a hundred attributes about a customer: it's fine for marketing to use these 17, but don't use these 20. Or as a generic thing, we have ten customers who are interested in skiing, that's fine, but we don't want to know that Donna Burbank is a skier. That goes too far. I'm not embarrassed to say that, but I don't want a marketer knowing that, or I'll get a bunch of ads about skiing.
Well, data privacy of course ties right into data governance. So looking at the three levels of modeling, where do you see the
data governance council or work groups as an audience?
Well, it depends on how you define your work groups, but I think all of them, to a certain extent. I was trying to check that we're all talking about the same ones. I often see that the data governance council is made up of business people, and it should be. In that sense a conceptual model is excellent. A lot of the discussion I've found super helpful in these governance councils is: what do I mean? It's always amazing to me when you go to a company and ask what sounds like the most obvious thing. What do I mean by account closure? From the outside world that would seem pretty obvious: it's when you close the account. But there are so many subtleties in the business, and I think the conceptual model is where that can be very helpful for people to understand, because it's not always the definition of the data itself; it's often the relationships, how it relates to something. That often can be an aha moment: oh, you're talking about the customer that relates to the support rep, and I'm talking about something else; we call those different things. So I think the conceptual level can be very helpful. As you get down to the logical, and I guess that's the business side of the logical as well, sometimes you need to get to that level of detail.
I think the physical too. On the council there should be some technical folks, and often they'll raise their hand and say, I know that's what you want for the business, but we don't store it that way, or we can't store that data, or that's PCI and we can't put it on this platform, et cetera. So I think a council that works well is a bit of both. Often a lot of the big aha moments are at the conceptual level, but you need someone who knows the physical to keep it real: have you thought of this, this is how we'd implement it, or it's not going to be performant, et cetera. So I
think all of the layers, but hopefully that clarifies.
Well, I have a couple of different questions which play off that. While we're on the slide, can you speak to the difference between element and entity with respect to the three levels of modeling?
So that was field and entity, and what was the other one? Element. I'm going to jump to another slide to answer that. What do they say, the cobbler's children have no shoes? I think as an industry we're terrible about using the same term for different things, or a different term for the same thing. What one person calls an element, someone else might call something else. Maybe this slide is a little better to show it. If you have a customer, and this picture to me is almost a relational table, the customer would be the entity. Then, in the data modeling world, the data fields would be the first name and last name. A lot of folks call those an element. Some people have a more generic concept of a business data element. So maybe my business data element is social security number; that's a generic business element, it's a key element, and we need to know that it's PII. That might link to one or more logical models, where it might be social security number in one and SSN in another, and then it's actually implemented as a field in different database tables. So they're linked. The only thing that's very different, in my mind, is the entity, versus fields and attributes at that finer granularity. They have a different use in each tool, and each vendor calls them something different, but generally there's a business element or business term, a logical data attribute, and then a database field. That's a common breakdown. Did that help?
Indeed. And going back to the previous question, you had mentioned physical models. The question is: isn't a star schema model
considered a physical model, as that's how data is stored in tables?
Yes. Yes, but I want to make my case. Some of the vendors will even argue that there's only a physical model. So yes, one of the ideas of a star schema is that performance, and this is how your database would look. But I'm a big fan of starting with more of a conceptual and/or logical model as well, because it gets back to this: if I'm working with a business user, I'm saying this is what your report's going to look like; you're going to do sales by month, by region, by product. Then you probably have those questions, like what do I mean by product? That's not a bad thing to do at a conceptual level as well. But yes, at its core most people think of the star schema as a physical implementation of what you're reporting upon and creating cubes from. I think there's a place for thinking about it at the logical and conceptual level too.
There are so many great questions coming in. If we don't have time to get to them, I'll keep them and see what we can do to get some additional answers for you. I just love this engagement from our attendees; it's awesome. Some might argue that with a dedicated metadata repository the inventory of data assets is already available, but I have yet to see a metadata repository that can effectively give the visual presentation a modeling tool typically can. Can you comment on that?
I would agree. There is overlap between what goes in the repository and what can be done in the data modeling tool, and some of it is valid overlap. Part of the value of a data model is the process, the iteration, and the visualization. I'm building a model and trying to understand: I have a customer and I have a payment, and we call a payment a voucher, and we change that and can visualize it. It's more active and used. I like that representation in a model, showing the attributes. I think the question earlier that
was talking about elements and fields and how they relate: I often will build a lot of that in a data model first and then, if my customer is using a metadata repository, import it there, because I think there is a unique perspective that a data model offers. Not the only perspective, but a unique one. Seeing those layers and linking them together gives me my conceptual view of what a customer is, and we're going to decide to create standards around that. I personally am biased; I like the view of a data model. I will say that metadata repositories have come a long way too, and I think sometimes their lineage is going to be better, but that's what they're meant for. They cover a superset, not only data models but arguably any source system. A lot of these repositories can pick up Java code, for example, so they're handling things more generically, and a lot of Java programmers are going to like their own format better in their Java program. So by definition they're going to do things more generically. Hopefully that rant made some sense. I'm not sure it did.
It did indeed. I think we have time to slip in one more question here, but again, keep the questions coming in. We've got so many great ones, and I'll get those over to you. What is your view on the role of the subject area definition model for conceptual data modeling?
Oh, someone there, I thought you asked that, but they probably know it will set me off on a rant. But I won't. A lot of people call the conceptual model the same thing as the subject area model. My book, on the second page here, Data Modeling for the Business, and I'll say it because I'm only one of the authors and there are two others, is on conceptual modeling, and between the three of us, to that note about the cobbler's children having no shoes, we couldn't agree on what to call it. Steve Hoberman, who's a lovely gentleman and very intelligent, called it a subject area model. I called it a conceptual model. Chris Bradley, my other co-author, called it a conceptual model. Steve's very persuasive and he's the publisher, so he won, and we just decided to call it a high-level model: let's not argue about what we call it. We did do a survey, just because I must be a competitive person, and people proved me right: most people call it a conceptual model. The reason I don't like calling it a subject area model is partly that a lot of the tools have things called a subject area, and that's a different thing. I think of a conceptual model as concepts: I have a customer and a product. A subject area would be an area of the business, it could be my finance subject area, where you group many of those together. But I don't worry about it as long as someone does it. That's my color commentary on that. Often people see them as the same thing, and I see them as slightly different.
I love it. Well, Donna, thank you so much for this amazing presentation today, and again, thanks to all of our attendees for being so engaged in everything we do. We just love all the questions; keep them coming in. Like I said, I'll get any of the unanswered questions over to Donna. And just a reminder: I will send a follow-up email by end of day Monday to all registrants with links to the slides, links to the recording, and I'll get the additional big data recordings in there as well, so you can
go back and look at those in addition. So thank you, everyone, for attending today. Donna, thank you so much for your time today. I hope everyone has a great day, and thanks to IDERA for sponsoring today and enabling us to make it all happen. We appreciate it.
I'll just do one callout, because there were so many questions: I did put up my contact information, so if your question isn't in the list and you want to reach out to me personally, feel free to do that as well.
Awesome, I love it. And that will likewise be in that follow-up email. So, all right, thanks everybody. Thanks, Donna.
Thank you.