Welcome, my name is Shannon Kemp, and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest in the monthly webinar series, Lessons in Data Modeling with Donna Burbank, sponsored today by IDERA. Today, Donna will be discussing data modeling and business intelligence. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the top right corner of the screen to activate that feature. We will be collecting questions via the Q&A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag #LessonsDM. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar. So let me introduce to you our speaker, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. Speaking of which, you can meet Donna in person at Enterprise Data World 2017 in Atlanta, April 2nd through the 7th, where she'll be presenting a tutorial on practical steps to implement a metadata strategy. And with that, I will turn it over to Donna. Hello and welcome. Shannon already introduced me, so just a little more background. I am on Twitter; if anyone follows me there, I'm @DonnaBurbank. 
There's also a hashtag for today. There's always some great conversation about the event, and some people like to continue that conversation on Twitter, so please do, with that hashtag. I guess the other interesting part of my background for today's conversation is that our sponsor today is IDERA. For many years, I ran product management for ER/Studio back when it was with Embarcadero Technologies, so that's near and dear to my heart. And we'll talk a little bit about various technologies today. So moving on to the topics at hand: as I mentioned, we have a whole series of data modeling topics this year. If you missed the one on data architecture last month, that was actually very well attended and got some good online feedback as well. That is all recorded. And coming up, you'll see we've got a wide range, from conceptual data modeling to metadata, which is actually a replay from last year, re-requested based on popular attendance. So hopefully you'll join some of the others coming up. But today we'll be talking about data modeling in the world of self-service BI and more traditional BI, and really, as I like to say it, how data modeling is the intelligence behind business intelligence, right? We're not always at the forefront. Data modeling isn't the flashy report that the business user sees, but if you don't have the back end right, with the right business meaning and context and an understanding of the data structures, your reports aren't going to be as effective. So we'll talk a little bit about that moving forward. So here's a slide I like. I made it, right? At least in the US, we are oddly fascinated with the idea of the bumper sticker, right? We put these things on our car about things we feel very passionate about. And one, if you haven't seen it, is "If you can read this, thank a teacher." You know, the idea that if you can read that, a teacher was behind it. 
And then it's sort of been taken over by other groups, with musical notes: if you can read this, thank a music teacher, and that sort of thing. Well, if you think about it more, when you see a nice clean report, or you're doing self-service reporting and the data is just nicely formatted, behind that is generally a data modeler, right? Or someone who helped build that data in a nice clean way, to make sure the data was correct and that it was architected correctly. So in a way, maybe like teachers feel, we're sort of the unsung heroes, making others look successful. That's just the feeling while we're doing this, and why I'm still in the industry and still having a lot of fun with it, actually. I think that's more true than ever. We've always been data driven in business, but I think business folks are really getting it now, and the technology is such that it's more accessible to everybody. This was just a quick look at some of the top business magazines we should be familiar with. All of them are talking about data: Forbes has a data-driven business page, The Wall Street Journal has pieces about the data-driven business, and unless you've been living under a rock, you've probably heard that data scientist is the sexiest job of the 21st century. I have my doubts about their definition of that, but there is something to be said for the fact that data is now hot and a lot of people are paying attention to it, and the reason is that we really see the business value. As a corollary to that, we see the rise of self-service business intelligence, which I think makes a lot of sense. A lot of companies want to be more data driven, and they want to get their hands on their own data. 
I think a lot of technology has also really come a long way to make that possible. In the past, business people often understood the data and the meaning of the data, but the tools were either so complex or they just didn't have access to them, so that wasn't really feasible. Those are some of the drivers of not only self-service BI but self-service data prep, or data munging, or whatever the various vendors want to call it. And I have to say the tools are slick. There is some neat stuff out there, and it's become a lot easier to manipulate data in ways that would have taken months in the past. There's more data, and it's more accessible. We'll talk a little bit about open data, where you can just go to the web and find governments and scientific research agencies publishing data that's open for you to analyze. And if you're a geek like me, you can waste a lot of time doing that. There's some neat stuff out there. Or even things like social media: I want to see Twitter sentiment, for example. There's a lot more data available, and I think people are trying to get insights from it. And I think business users are becoming more tech savvy as folks use technology in their daily lives. To some extent, business users have always been tech savvy. I've seen some amazing spreadsheets out there, with some very complex pivot tables and things like that. And I think a lot of business users look at some of these new tools and think, well, that's no harder than a spreadsheet. I understand the concept of data manipulation; I just never had the tools before. So that's great. There are a lot of opportunities, but they can also be fraught with challenges, and you might be facing some of those as well. 
Of course, reports are only as good as the data behind them, and is all data accessible in a format that's reportable? That's really an opportunity for folks like data modelers, and for the models and the metadata. I'll talk a lot today about the metadata behind the models. That really makes the job of business intelligence easier, both for the BI professionals who might be building the warehouse and generating the canned reports that everyone needs, and for the casual business intelligence user who would just love to have a trusted dataset and be able to query it easily, if it existed. I think business users are often frustrated with self-service BI, which explains the pictures you might be wondering about. My analogy is that they want, and the tools promise, something like a salad bar: I have this great tool, I can slice and dice and do all these fancy reports and visualizations. I want to pick a little bit of lettuce and a little bit of beans and whatever else, and it's this beautiful salad. And what we hand them is a shovel, some seeds and a watering can. It's hard to get data into a nice, clean format, and the report people see at the end took a lot of work to get there, either aggregating the data from different sources or making sure you have common master data or reference datasets, et cetera. So I think that's an opportunity for folks in the architecture space to build those trusted datasets that people can use. And I've seen in some organizations a kind of us-versus-them attitude: why are these people getting into our data? Well, it's not your data. You might be managing it, but they want to see the data too. And they're equally frustrated; they'd like it to be a lot easier. So I think that's where a nice architecture that can be easily queried really helps everybody. So here's an example. Data is only as good as the metadata. 
I'm kind of a self-proclaimed nerd, and I think that's a great thing. I'd never be offended by nerdery; I think nerdery is a wonderful, wonderful trait. We're the sexiest job of the 21st century, right? So I actually went through this a bit myself and put on the self-service BI user's hat, partly to play with some of these tools I've been wanting to spend more time with. I went out to the UK open data site, thinking it would be a fun example to visualize and play with some open data and show how easy it is, or isn't. There was one data set on road safety by vehicle make and model, which could be kind of controversial: does a Porsche have more accidents than, I don't know, a Ford? So I thought that'd be interesting. And this is probably very much the experience a business user might have. It sounded really neat. I thought I could just point one of these self-service BI tools at it and get some great slice-and-dice pie charts. Then I opened up the data, and it's probably every data modeler's nightmare. You have the typical field one, field two, field three, all the way through field 100, right? And if you look at the values, they're not very intuitive. There was no make and model; there's some sort of reference data set where these numbers map to a make and model of car. And then you have codes; whatever F-10 means, I wasn't quite sure. But, undaunted, I went through and made some beautiful reports. Look at the one on the right: I can see that F-13 is really great. There are 250K F-13s. I had no idea what that means. Then I did some more visualizations and found, yet again, that F-13, down there at the bottom, is definitely doing something, but I don't know what F-13 is. And that's a very classic metadata problem. To be fair, I also gave myself a time limit, as a busy person would. 
I could spend hours on this; actually, I highly recommend it if you have some time. A lot of the visualization tools can be downloaded as free trials, and there are amazing amounts of open data. It is kind of fun to play around, and there's a good side of self-service BI and exploration as well that we'll talk about. But I gave myself 15 or 20 minutes: how far can I get? As a business user, I just need a report for my meeting. Can I get this out quickly? As I went through the data, I did see some insights, and I could start to figure out the patterns. I think that's what a lot of business data scientists do, so there are pros and cons. Sometimes, munging through the data, you can find insights. But I didn't have time to find insights; I just wanted a nice report. And even understanding the trends, I still don't know what F-13 is. Some open data sets are better than others, and this isn't to knock this particular one. Some publishers are very clear that they're just putting the data out there and don't have time to curate it, because, as we know in this industry, curating takes time. Most open data sets are actually pretty good, with metadata on how the data is used and what it's used for, et cetera. But I thought this was a good example: I can have the best tools on the planet to look at it, but I don't know what it is. Even basic data type issues show up. You'll see that F2 looks like a year, but it's stored as a numeric, shown as 2,015, et cetera. So that hit home the point that metadata matters, right? Especially with this rise of self-service BI and self-service analytics, more and more attention needs to be paid to the quality, the content and the structure of the data, a.k.a. data models and metadata. And if you look at some of the quotes at the bottom, this is a known problem. 
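To make the F-13 problem concrete, here is a minimal Python sketch of the cleanup a data modeler would normally do before anyone builds a chart: rename generic fields using a column dictionary, decode coded values against a reference data set, and turn a year stored as a formatted number back into a year. The rows, codes, field meanings and names below are hypothetical placeholders; the real meanings would have to come from the publisher's documentation, which is exactly what was missing.

```python
# Hypothetical raw rows shaped like the undocumented extract: generic column
# names, coded values, and a year stored as a formatted number.
raw_rows = [
    {"field1": "F-13", "field2": "2,015", "field3": "1"},
    {"field1": "F-10", "field2": "2,014", "field3": "2"},
]

# Hypothetical column dictionary: the metadata that says what each field means.
COLUMN_NAMES = {"field1": "vehicle_code", "field2": "year", "field3": "accident_count"}

# Hypothetical reference data set decoding vehicle codes into make and model.
VEHICLE_CODES = {"F-13": "Make A / Model X", "F-10": "Make B / Model Y"}


def clean_row(row):
    """Rename generic fields, fix data types, and decode reference codes."""
    out = {COLUMN_NAMES.get(key, key): value for key, value in row.items()}
    # A year should be a year: drop the thousands separator and store an int.
    out["year"] = int(out["year"].replace(",", ""))
    out["accident_count"] = int(out["accident_count"])
    # Keep unknown codes visible instead of silently dropping them.
    code = out["vehicle_code"]
    out["make_model"] = VEHICLE_CODES.get(code, f"UNKNOWN ({code})")
    return out


cleaned = [clean_row(r) for r in raw_rows]
for row in cleaned:
    print(row["make_model"], row["year"], row["accident_count"])
```

Once the column dictionary and reference set exist, every self-service user gets "Make A / Model X" instead of F-13, and none of them has to rediscover the mapping.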
And you're probably facing it yourself, or hearing others complain about it, and it's certainly being written about: whether you're a BI professional, a data scientist or a tech-savvy business user, you want to get to the results, but you're spending 50 to 90% of your time just cleaning and reformatting the data. In that little example I went through, I was interested in the results. If I had really dug into the problem, I would have done more research on what those numbers meant, and then probably tried to create some structure behind it and all of that. That would have taken time, and I really just wanted to know how many accidents happened by vehicle type. And on the right, data quality work might take 80% of their working day; that's probably from the data scientist side, which I know is different from, but closely related to, BI. It's the least favorite part of the job. You want to create these great reports and see the great insights, and you probably don't want to manually change data types or go through and figure out what the data means. You're not getting to the insights you actually want, and that's why they pay us the big bucks in the data modeling and metadata world: to fix this sort of thing and make everybody's job easier. So this is a bit of a plug for a research paper we did last year, but I did think some of the insights were interesting. We did a survey on trends in metadata management, which is closely related to data modeling, and on who the users of metadata management are. 
You'll see here, no surprise, a very high percentage for the BI and reporting team, the data scientists and business users. We said "business users" and didn't go much deeper than that, but those are probably the self-service BI folks, the people looking to dig into the data, and they are some of the biggest users of and requesters for metadata management. They're followed closely by data architects and data modelers, and I separate those slightly, because I see the first group as the consumers and this group as the preparers. And you'll see the DBAs and developers are closely related as well. So having good metadata, a good model and a good structure really does help everybody, and I think the more people look at the data, the more quickly they're sold on that idea: wow, if someone could just structure this for me, my life would be easier. If someone could tell me what the data meant, that would be helpful as well. So if you look at the classic picture, even with the rise of self-service BI, I think we've hit the point home that you can find insights in raw data, but it's a lot easier to find insights in data that's been managed and cleansed for you, and there are certain use cases for the concept of a data warehouse. I know the industry loves to create controversy with things like "Is data warehousing dead? Now we have Hadoop." Don't get me started, right? I've probably talked enough about that in other webinars, but there's a use case for each technology. Even in an organization that's very driven by discovery and innovation through data science, looking at raw data sets from inside and outside the company, most companies have something that needs to be curated, your financial reporting, for example. It has a lot of volume, and it's more effective to manage it in a data warehouse. 
There may be innovations in how we format the data warehouse, how we populate it or what the source systems are, but I don't see that concept of managing the most important data in your business going away. And again, my little phrase, that we're the intelligence behind business intelligence, may explain that. You might have your average business user asking for something that seems so simple: could you just show me all customers by region? Can I have that report by this afternoon? More and more business people do understand the complexity of data, but often that doesn't seem like a hard request. But think: to get that, you may have dozens, hundreds or even thousands of source systems that probably contain customer data. Something as simple as: what's the definition of customer, and is it a different structure in each system? Who's the owner or steward of that data? Do we truly understand what that source system means? Is sales calling someone a customer who might actually be a prospect, while finance calls someone a customer only if they actually own the product? There's all this difficulty in understanding the source systems and then putting the data into a data warehouse, doing the ETL, the transformation or the munging, to get all that data conformed, as well as thinking about things like performance, or things like a dimensional model. I know that's not the only way to do a data warehouse, but it's a common one when you really focus on "I'm going to use this data for reporting," and for reporting you need to model. What are the definitions of these key business terms? What do I mean by total sales? What do I mean by region? Is North America Canada and the U.S.? Or Canada, the U.S. and Mexico? There are a lot of subtleties in how we define regions, and in how we make sure that's reflected correctly in reporting. 
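Here is a minimal Python sketch of the "total sales by region" question against a tiny star schema: a sales fact table, a customer dimension, and a country-to-region mapping that makes the business definition of region explicit. All table contents and region assignments are hypothetical; the point is only that the definition lives in one governed place rather than inside each query.

```python
from collections import defaultdict

# Customer dimension: one row per customer (hypothetical data).
dim_customer = {
    101: {"name": "Acme Ltd", "country": "CA"},
    102: {"name": "Globex", "country": "MX"},
    103: {"name": "Initech", "country": "US"},
}

# The business definition of region, made explicit in one place.
# Whether Mexico belongs to North America is a business decision.
REGION_BY_COUNTRY = {"US": "North America", "CA": "North America", "MX": "Latin America"}

# Sales fact table: one row per sale, keyed to the customer dimension.
fact_sales = [
    {"customer_id": 101, "amount": 250.0},
    {"customer_id": 102, "amount": 100.0},
    {"customer_id": 103, "amount": 400.0},
    {"customer_id": 101, "amount": 50.0},
]


def total_sales_by_region(facts, customers, region_map):
    """Join facts to the customer dimension, then roll up by defined region."""
    totals = defaultdict(float)
    for sale in facts:
        country = customers[sale["customer_id"]]["country"]
        region = region_map.get(country, "Unassigned")
        totals[region] += sale["amount"]
    return dict(totals)


report = total_sales_by_region(fact_sales, dim_customer, REGION_BY_COUNTRY)
print(report)
```

Moving Mexico into North America is a one-line change to the mapping, and the report changes with it, which is why that decision belongs to the business rather than to whoever happens to write the query.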
So this is probably your classic flow, from source, to transformation, to warehouse, to the information on your BI report, and I really see a data model hitting every step of the way. We've talked a bit about the idea that it's a long road and people want quick insights. Do we even need modeling anymore? Are we just getting in people's way? And of course I think, yes, we still need modeling, but there is a balance: modeling what matters, with this idea of a trusted data set. I've worked with a lot of organizations recently that are struggling with this, in something like a data governance program, and I think there's only conflict when we don't actually spell out what we're trying to do. So it's managing what you need to manage, and leaving things alone where people really just need to explore. Say I've launched a product and a marketing campaign. I know that when I send the results of that campaign up to management, or we're going to report to the Street, those financial numbers need to be super accurate. But if I just want some Twitter sentiment analysis to see what people are saying since we launched yesterday, and get some trends, leave me alone. I don't need a whole lot of governance around that; I want to go look at the Twitter streaming data and see what's out there. So you don't want to hamper people's creativity; it's the right tool for the right job. But there does need to be a trusted set. To continue that example, if I found something on Twitter, which customer said this? I hope there's a master list of our core set of customers, and I hope there's a reference data set for the different regions. I think you won't get too much argument around that. I've noticed that a lot of the companies I've been working with want to be very much on the right-hand side. 
They might have a cloud-based data lake where all the innovation happens, but as soon as people start to do some of this, they say: if I just had a nice clean reference set, I could be there. So if there were a master data list when I need it, I think that's a win-win for both parties. For me, it's understanding what needs to be modeled and what doesn't, allowing the exploratory side to be what it is, and the real value of a report comes when you can compare both of those together. What are the customer trends? How do they relate to my specific customers? What are the trends by region, and how do I define a region? All of these different things need to integrate in a report, and that can be done with that balance. So I think that's an important point to make, and I sort of get annoyed when a lot of media are trying to get clicks, right? So they try to create controversy that one is better than the other, or that one is going away. They're all good. I live in Boulder, which is kind of a hippie town, and it's a common phrase there: it's all good. It's all just the right tool for the right job, and making sure the touchpoints and integration make sense. The other thing to remember is the different modeling levels. I tend to show this fairly often in my presentations, but whenever I leave it out, we wish we had it to refer back to. When we're talking about modeling, there are different levels. You might be at the conceptual model level, where you're really just talking at a very high level: business terms, definitions. What do I mean by customer? What do I mean by product or region? And that's super important as we're looking at a report. 
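A minimal Python sketch of that balance: loosely structured exploratory data (social mentions) is left ungoverned, but it becomes reportable once it is joined to a trusted master customer list that carries a governed region. It assumes, hypothetically, that the master list records each customer's social handle; all names and values are illustrative.

```python
# Exploratory data: hypothetical social media mentions, loosely structured.
mentions = [
    {"handle": "@acme_ltd", "sentiment": 0.8},
    {"handle": "@globex", "sentiment": -0.4},
    {"handle": "@random_user", "sentiment": 0.5},
]

# Trusted data: a hypothetical master customer list with a governed region.
master_customers = {
    "@acme_ltd": {"customer_id": 101, "region": "North America"},
    "@globex": {"customer_id": 102, "region": "Latin America"},
}


def sentiment_by_region(mentions, master):
    """Average sentiment per region for mentions that match a known customer."""
    sums, counts = {}, {}
    unmatched = 0
    for m in mentions:
        customer = master.get(m["handle"])
        if customer is None:
            unmatched += 1  # not a known customer; still fine for exploration
            continue
        region = customer["region"]
        sums[region] = sums.get(region, 0.0) + m["sentiment"]
        counts[region] = counts.get(region, 0) + 1
    averages = {region: sums[region] / counts[region] for region in sums}
    return averages, unmatched


averages, unmatched = sentiment_by_region(mentions, master_customers)
print(averages, unmatched)
```

The exploratory feed needs no modeling of its own; the value comes from the one join point to governed master and reference data.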
Because if we don't have common agreement on what we mean when we say total sales by customer, and on what we mean by a customer (current customers, past customers, gold-level customers?), then your report is meaningless. So especially with BI, that high-level conceptual model makes sense. The logical model is still about the business, but you're adding rules and attributes: what are the attributes of customer, such as name and address? If we're talking about a star schema, we're talking a little more about what I'm going to report upon. So you're getting a little more detailed, but the focus should still be my business goals and objectives. And then the physical model is really at the database table level, where either you're doing an inventory of your current sources or you're creating a new source, and you're really trying to optimize for performance and that sort of thing. On the conceptual level, business meaning and context, which I touched upon, is so critical. Again, the business person might ask for what seems to be such a simple thing, customers by region, and the good data architect gets excited about it and asks all these questions that other folks may think are strange, but once they start understanding, I don't think they'll think it's strange at all. Does this include current customers, or just lapsed customers? How do you define a region? Can a customer have addresses in more than one region? Maybe you're billed in one region but you purchased in another. What do you mean by region? You think about all of this when you really want to make sure a report is correct. And we're in the age of compliance, with things like GDPR; personal privacy is a big deal for folks. We have customer information. Should I obfuscate or hide some of the PII? 
Do you want trends of customers by region? I really can't show you customer names by region; that's invading people's privacy, right? So there's a lot to think about on just a simple thing like "show me all my customers by region," and that's where a data model comes in, or at least the rigor of the type of questioning we tend to do in data modeling. This is a cartoon I use a lot. There just aren't a lot of cartoons about data models, so even though it's not that funny, I use it anyway. But if you've been in this business, you might find it funny. We're done with user acceptance testing and everything looks great. We're about to roll out this new marketing application. Just one question: what do you mean by customer? That might seem kind of strange, but I've been in groups where we weren't clear about customer. It may have been that we were only targeting online customers and not brick-and-mortar customers, and the whole campaign we built made no sense anymore, because nobody asked those basic questions at the beginning. That's why we always like to have a data modeler or an architect in the room to ask those sorts of questions. And I have seen more and more companies really get that. I've worked with some companies doing product development, developing tires, for example, and they have what we'd call Agile or Scrum-type meetings around the product launch. More and more, there's a data person in the room asking: how is this decision going to affect data? Do we need more information to report upon? 
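Here is a minimal Python sketch of answering "customers by region" without exposing names: report counts per region, and mask the columns the model flags as PII when row-level output is unavoidable. The PII tag set is a hypothetical stand-in for model metadata, and the minimum group size used to suppress tiny, potentially identifying groups is a common privacy practice added here as an assumption, not something from the talk.

```python
from collections import Counter

# Hypothetical customer rows; "name" is the personal data we must not report.
customers = [
    {"name": "John Smith", "region": "EMEA"},
    {"name": "Jane Doe", "region": "EMEA"},
    {"name": "Ana Lima", "region": "LATAM"},
]

# Columns the data model flags as PII (hypothetical tagging).
PII_COLUMNS = {"name"}


def customer_counts_by_region(rows, min_count=1):
    """Trend-level answer: counts per region, never names.

    Groups smaller than min_count are suppressed (assumed policy), since a
    region with a single customer can effectively identify that person.
    """
    counts = Counter(row["region"] for row in rows)
    return {region: n for region, n in counts.items() if n >= min_count}


def mask_pii(row):
    """Row-level answer, when one is unavoidable: mask the tagged PII columns."""
    return {col: ("***" if col in PII_COLUMNS else val) for col, val in row.items()}


print(customer_counts_by_region(customers))
print(customer_counts_by_region(customers, min_count=2))
print(mask_pii(customers[0]))
```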
I think that probably always happened to some extent, but I'm hearing about it more, and I know some of my clients were never in those meetings until the past year. I think people are increasingly realizing that we can't wait until the end of our sales cycle, our product launch, our campaign or whatever, and only then start thinking about the data. You need to think about it at the beginning. Here's an example of a conceptual data model. One of the things I like about this particular tool is that it hides all of the detail and really just focuses on the business definitions, because each of those model levels has a purpose, and the purpose of conceptual data modeling is really communication. It seems very simple, but a person could quickly look at that and say: okay, a customer is a person or organization who has rented a movie. Wait a minute, we don't rent to organizations, just to individuals. Right there, that could be a big business change, or a business clarification we needed, just by showing it clearly on the diagram. Data model metadata, both technical and business, can be used by a lot of different roles. It could be something as simple as a business person in finance asking: what do we mean by regional sales? How do we define regions? By geographic region, sales region, political region? There are a lot of different discussions there. I might be a data architect building something new: is there an approved data structure for storing customer data? I don't want to reinvent the wheel. If you can give me something that already exists, thank you, that makes my job easier, and we might as well have a standard. It could be an auditor or a regulator saying: please don't just verbally tell me how total sales was calculated. 
I absolutely need to see the lineage. Any of you in the financial industry know there are regulations around this: you can't just say "I don't know." You actually need to show the data lineage of how that data was calculated. Again, you might be building a data warehouse and need to understand the source-to-target mappings and how that data was created. I've also seen companies use data models in business functions like HR, where I'm trying to get my staff up to speed on the company's business. If you've done a data model, especially at the conceptual and logical level, a lot of it is vocabulary: do we call this a client or a member or a customer? How do we term these things? It's really the lingo and the way the company does business. So I have seen companies use a data model, or a version of it, often turned into a glossary, and some have used the model itself at a high level, because it shows the relationships: we have a customer with a customer care representative, which is different from your sales rep, et cetera. So it really can help non-technical people as well. Another thing you can do with these data model levels is map, or show the lineage, between them. Especially with things like regulations or a report, I have this concept of client on my report. Where did that come from? Well, in the logical model we might use the term customer for that. I know you guys say client, but in our world it's customer, while across the different tables, Oracle called it CUST, Teradata called it CUSTOMER, and in DB2 it's C_TABLE_16. And being able to see that lineage answers: where is all my customer data, and how is it used? 
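The client-to-customer-to-tables mapping can be sketched as a small Python structure. The table names CUST, CUSTOMER and C_TABLE_16 come from the example in the talk; the `ModelObject` class and the `lineage` and `physical_tables` helpers are hypothetical illustrations, not the API of any modeling tool.

```python
from dataclasses import dataclass, field


@dataclass
class ModelObject:
    """A term at one modeling level, linked to the objects that implement it."""
    name: str
    level: str          # "report", "logical" or "physical"
    platform: str = ""  # database platform, for physical objects only
    maps_to: list = field(default_factory=list)


def lineage(obj, depth=0):
    """Walk from a report term down to the physical tables behind it."""
    label = obj.level + (f" ({obj.platform})" if obj.platform else "")
    lines = ["  " * depth + f"{label}: {obj.name}"]
    for child in obj.maps_to:
        lines.extend(lineage(child, depth + 1))
    return lines


def physical_tables(obj):
    """Impact analysis: every physical table a term ultimately depends on."""
    if obj.level == "physical":
        return [f"{obj.platform}.{obj.name}"]
    return [t for child in obj.maps_to for t in physical_tables(child)]


# Table names from the example; the structure around them is illustrative.
customer = ModelObject("Customer", "logical", maps_to=[
    ModelObject("CUST", "physical", "Oracle"),
    ModelObject("CUSTOMER", "physical", "Teradata"),
    ModelObject("C_TABLE_16", "physical", "DB2"),
])
client = ModelObject("Client", "report", maps_to=[customer])

for line in lineage(client):
    print(line)
```

Read top down, this answers the auditor's "where did client come from?"; read bottom up, `physical_tables` tells you which reports are affected when one of those tables changes.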
This is just showing it at the very high entity level, but you can do the same thing, for someone who's thinking about reporting, down to the attribute level, the business term or core data element or whatever you call it, the metric, to really see what I mean and where this data is stored. The other part, on the physical level, is really your active inventory of your data assets, and this isn't insignificant. Think of that example, sort of teasing the naive business user: give me a list of customers by region by this afternoon, and we don't have anything in place. Customer data could be in thousands of sources across the organization. One of the nice things about data modeling is that most data modeling tools can point at these sources and create a picture like the one you see in the middle automatically. There have been some amazing aha moments: wow, I didn't know we had this data or this data structure. And once it's in a model repository, you can start to do some impact analysis: oh, these 16 tables all use the same data element, for example. Know what data you have. Once you have the physical data models, you can link them to the logical or conceptual model, top down, bottom up or some mix of the two, and the beauty of that is that your business requirements are linked to your actual technical infrastructure. It can also support data consistency. In this example, we might have reverse engineered 100 databases and seen that the date field is stored differently: it's a character field in some and a date field in others. You can start to create new standards or business rules, the technical rules; maybe dates should always be in a date field. Then, when you do new development or want to go back and clean up your databases, you can use that from the model. 
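As a rough sketch of that consistency check, the Python below uses an in-memory SQLite database as a stand-in for a reverse-engineered source: it reads each table's declared column types from the catalog and flags date columns that aren't stored as dates. The table definitions are hypothetical, and identifying date columns by a `_date` suffix is an assumed naming convention; a real modeling tool would apply its own rules across many platforms.

```python
import sqlite3

# Stand-in for a reverse-engineered database (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date DATE);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, ship_date VARCHAR(10));
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_date CHAR(8));
""")


def find_nonstandard_dates(conn):
    """Flag columns named like dates whose declared type is not DATE."""
    findings = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, declared_type, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            if column.endswith("_date") and declared_type.upper() != "DATE":
                findings.append((table, column, declared_type))
    return findings


for table, column, declared_type in find_nonstandard_dates(conn):
    print(f"{table}.{column} is {declared_type}; standard says DATE")
```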
So you have these core standards that make the data more consistent. Think back to my example of the open data set where they had a date field that looked like a numeric and didn't make any sense. If we can just get that stuff out of the way, you don't even think about it: a date should be a date. And when you're doing the reports, ideally the business person would never even know that was an issue; it would just be nice and clean. The model also adds context and definition. I've already mentioned that, but some folks might say, "Well, I have all that stuff in my database, I'm only using Oracle, and I know the data structures. I'm a DBA, I get that." But most data modeling tools can add a lot of metadata around it. As I talked about last year in the metadata session, they can almost be a sort of "metadata repository lite" if you're only using them for a very focused set, especially around things like BI. You can add a lot of tags or fields, or whatever the tool calls them: whether this is a required field, whether there are business rules around it, whether it's private or secret. You can add a lot of context around data that might not have been known. Or simple things, like city. What do I mean by city? That seems very obvious; everyone knows what a city is. But in this context, is it the city where the customer lives, or the city where the store they purchased from was located? Again, if I'm doing the report and I want to see sales by region, that is a big deal. It could be that I live in New York, but I was visiting London and bought this pair of shoes. So what do we mean by sales by region? I would think that would be a London sale, but again, that's something the business needs to define. And this is just another example of that.
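To picture the kind of context a modeling tool lets you attach to attributes (a definition, whether the field is required, whether it's private), here is a minimal sketch. The `AttributeMetadata` fields, the classifications, the role policy, and the chosen meaning of "city" are all hypothetical; as noted above, the business has to make that call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetadata:
    """Context a modeling tool might let you attach to an attribute (hypothetical fields)."""
    definition: str
    required: bool
    classification: str  # e.g. "general", "PII", "PCI"


# Hypothetical metadata for a few sales attributes.
METADATA = {
    "customer_name": AttributeMetadata("Full name of the customer.", True, "PII"),
    "credit_card_number": AttributeMetadata("Card used for the purchase.", False, "PCI"),
    "sale_city": AttributeMetadata(
        "City of the store where the sale took place (not where the customer lives).",
        True, "general"),
    "order_total": AttributeMetadata("Sale amount.", True, "general"),
}

# Assumed access policy: which classifications each role may see on individual records.
ROLE_ALLOWED = {"marketing": {"general"}, "payments": {"general", "PII", "PCI"}}


def visible(record: dict, role: str) -> dict:
    """Keep only attributes whose classification the role may see; unknown ones are hidden."""
    allowed = ROLE_ALLOWED.get(role, set())
    return {k: v for k, v in record.items()
            if k in METADATA and METADATA[k].classification in allowed}


def missing_required(record: dict) -> list[str]:
    """List required attributes that are absent or empty in a record."""
    return sorted(k for k, m in METADATA.items()
                  if m.required and record.get(k) in (None, ""))


sale = {"customer_name": "Customer A", "credit_card_number": "xxxx-0001",
        "sale_city": "London", "order_total": 120.0}
print(visible(sale, "marketing"))
print(missing_required({"customer_name": "Customer B", "order_total": 80.0}))
```

Keeping the definition next to the privacy tag means the same metadata can answer both "what does city mean on this report?" and "who is allowed to see this field?"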
So the technical metadata is almost your DDL there on the left; it actually would be your DDL, since a lot of the tools can forward engineer as well. I create my standards in the modeling tool and forward engineer them onto a platform. And the nice thing is that it's linked to the business meaning of your data: what does an employee mean? What does a customer mean? And then, as we know, that's separate from your actual data, but not really. Especially when we think of things like personal privacy, it's always good to remember there's actually a person named John Smith that we're reporting about or sending a marketing campaign to. You've got to keep that in mind. So here's another example. We showed it as the classic source-to-target mapping to generate a report, but I thought this part was interesting in that it really shows where data modeling hits just about every step of the way. If you start at the right, where a lot of folks focus, I'm looking at my sales report: I want to know total sales by region. I might be using some sort of BI tool or self-service BI. Often that's sourced from something like a data warehouse. To build that, hopefully, either in the modeling tool or separately, there's some sort of business glossary: what do I mean by customer? There's probably, and we'll talk a bit more about this, a dimensional data model to set it up into cubes for easy reporting, so that you can generate a nice report: total sales by region, by sales rep, by whatever. And then there are the physical tables behind that that generate the report. So that's a lot, and the nice thing about data modeling tools is that you can have all of that right there in the tool.
Jumping to the other side: that report that says total sales by region, to give a very simple summary, maybe came from three databases. One was on Oracle, one on SQL Server, one on DB2; one called it CUSTOMER, one called it the CUST table. I think I've hit that point, but yes, they're all named different things. So you'd have some physical data models, multiple ones, that capture that context. Then generally there's some sort of staging area that cleans this up, with ETL tools in between that do the source-to-target mappings. At each stage of the game, there's some sort of physical model describing these areas. And many of the tools in the market now can start to document some of these source-to-target mappings as well, the lines in between (I've rather messed up those lines in this picture). What's nice is that these tools can really show that lineage and link it to a sales report. And because this use case is fairly mature, most tools on all sides have some sort of common metadata exchange, so that, for example, if I have this structure in my warehouse and I want to import it into the BI tool, I can, along with the definitions or the lineage behind the integration, and the metadata flows between them very well. So you might store some of it in a separate metadata repository, some in the modeling tool, and some in the BI tool itself, but nowadays they're all fairly well integrated, so you can see that flow, which is nice. As for the lineage I just mentioned, this is actually a screenshot from IDERA that shows the source-to-target lineage, and you can put the business rules in there.
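The report-to-warehouse-to-staging-to-source chain described above is essentially a graph of source-to-target mappings. A minimal sketch of walking it backwards from a report follows; the object names are hypothetical, and real ETL or modeling tools store these mappings in their own formats.

```python
# Hypothetical source-to-target mappings: each target lists the sources that feed it.
MAPPINGS: dict[str, list[str]] = {
    "report.total_sales_by_region": ["warehouse.fact_sales", "warehouse.dim_region"],
    "warehouse.fact_sales": ["staging.sales"],
    "warehouse.dim_region": ["staging.customer"],
    "staging.customer": ["oracle.CUSTOMER", "sqlserver.CUST"],
    "staging.sales": ["db2.SALES"],
}


def upstream_sources(target: str, mappings: dict[str, list[str]]) -> set[str]:
    """Walk the mappings backwards from `target` and return the original
    source objects, i.e. nodes that nothing else feeds."""
    sources: set[str] = set()
    seen: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = mappings.get(node, [])
        if not parents and node != target:
            sources.add(node)
        stack.extend(parents)
    return sources


print(sorted(upstream_sources("report.total_sales_by_region", MAPPINGS)))
```

The `seen` set keeps the walk from revisiting shared intermediate objects, which matters once many reports draw on the same staging tables.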
What's nice is that this is more at the business level, to understand the meaning, because more and more people are using the modeling tool to see that view. This isn't replacing an ETL tool by any means, but it shows you what's happening as I move between the address table and the staging area for address: can I just visualize that? Most modelers, and I would argue most people, are visual creatures. You might have written the script, and you can see the script and get what it's doing, but for me, and I think for most people, seeing that movement as a flow, a visual line, helps, and that's what modeling is all about. So a lot of the tools now do have this capability. So why should we model the warehouse? I think I've hit home on the idea of the business rules behind it and understanding the lineage, but there's also just the ease of use and the speed of access. As I was trying to create an unpleasant reporting situation for myself to show an example in these tools, I tried not to think at all about how I constructed my query: I put in 17 rows and columns, mixed them together, and got something like "compute report: four days, 10 hours, 27 seconds." If these reports aren't queried correctly, or the SQL isn't written correctly, or the underlying structure can't be easily queried, they turn into a nightmare. I will say that a lot of these tools have come a long way. The one I was using wouldn't let me do it; it kept saying, "Warning, this is taking a long time," which is why I had to make up a fake little error message there. But I remember, and it was probably not the highlight of my career, writing bad SQL myself. I did some embarrassing things like that as I was learning about joins, what not to join, and how not to structure a table. But that is a beauty of a warehouse that I think has been forgotten.
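One classic way a carelessly built query blows up, in the spirit of the "what not to join" lesson above, is a missing join condition that turns a join into a Cartesian product. A minimal sketch using Python's built-in `sqlite3` module with made-up tables (100 customers, 1,000 orders):

```python
import sqlite3

# Hypothetical tables: 100 customers, 1,000 orders (10 per customer).
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")
demo.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(i, "East" if i % 2 else "West") for i in range(1, 101)],
)
demo.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100 + 1, 10.0) for i in range(1, 1001)],
)


def row_count(sql: str) -> int:
    return demo.execute(sql).fetchone()[0]


# Correct join: each order matches exactly one customer.
joined = row_count(
    "SELECT COUNT(*) FROM orders o JOIN customer c ON c.customer_id = o.customer_id"
)
# Missing join condition: every order pairs with every customer (a Cartesian product).
cartesian = row_count("SELECT COUNT(*) FROM orders o, customer c")

print(joined, cartesian)  # 1000 100000
```

With two small tables the damage is a 100x row count; chain a few more unconstrained joins across large tables and the multiplication is how a report ends up estimated in days.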
I don't have it in this presentation, but I've shown it in others: at a TDWI conference, someone actually stood up, and it was sort of like a big confession. A leader in the big data space and a lot of the new analytics was admitting that they had built a data warehouse. One of the reasons was understanding the meaning: they were trying to find out how many concurrent users were logged in at once, and what do we mean by a concurrent user? If I'm on Spotify, I might be using Facebook even though I'm logged in, that sort of thing. They had the same report on Hadoop and it was just taking days to get results back, and I think the same report took 40 minutes or so with the warehouse. So performance is something that I think often gets overlooked as we're thinking about some of these technologies for reporting. And again, if you really want high-quality data and an integrated set of data, that is part of the reason we model: as you look at these data sources, you can see the overlapping data types, the technical integration, and also the business meaning. So all this work we put in is for a reason: to make it easier on the reporting side. As you've noticed, this is not a how-to on data modeling for the warehouse. There's a lot of material out there already, but if this is absolutely new to you, I thought this might be helpful to set the stage a little bit. One common way, and I know not the only way, to model a data warehouse is the star schema, or dimensional modeling, which is the Kimball approach; there's the Kimball versus Inmon debate, and this is the Kimball way. One way I like to think about it, and apologies to any grammar purists on the call, is: what are you reporting by? I'm reporting sales by month, by product, by sales rep, for example. And those are really going to be your dimensions.
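To make the "reporting by" versus "reporting on" idea concrete, here is a minimal star schema sketch using Python's built-in `sqlite3` module. The sales fact table sits in the middle, and date, region, and product dimensions surround it; all table names, columns, and rows are hypothetical.

```python
import sqlite3


def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT);
        -- The fact table sits in the middle of the star: few columns, many rows.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            region_key  INTEGER REFERENCES dim_region(region_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, "2017-01"), (2, "2017-02")])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)", [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Shoes")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 1, 25.0), (2, 1, 1, 75.0)],
    )


def sales_by_region_by_month(conn: sqlite3.Connection):
    # Dimensions are what we report *by*; the fact is what we report *on*.
    return conn.execute("""
        SELECT r.region, d.month, SUM(f.amount) AS total_sales
        FROM fact_sales f
        JOIN dim_region r ON r.region_key = f.region_key
        JOIN dim_date d   ON d.date_key   = f.date_key
        GROUP BY r.region, d.month
        ORDER BY r.region, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
for row in sales_by_region_by_month(conn):
    print(row)
```

Each join line follows one arm of the star, which is the "navigation path" sense of the relationship lines in a dimensional model.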
So the thing in the middle, the fact, is what I'm reporting upon, and the things around the star are what I'm reporting by. That's probably the easiest way to think about it. And a little differently, the lines in a star schema, or that type of model, are really your navigation paths for reporting. It's similar to, but slightly different from, a logical model for a more relational OLTP system, where I'm asking whether a customer can have more than one account and creating business rules around that. Not so much here; it's really about getting that reporting structure. So, to summarize, the fact is the thing we're reporting about: what are my sales figures? Generally, fact tables are very deep tables: you'll have a lot of values but not as many attributes, while dimensions might have a lot of attributes but fewer values. That's a hand-wavy generalization, but it's a useful one. Really, the fact is: these are my sales figures for the year. How do I report sales by month? How do I report sales by region? In a lot of ways you can roll these up and do different aggregations. Here's a simple view of the star schema, as we call it, in a data modeling tool. Again, not everybody uses a data modeling tool for a star schema; people can just build it right in the database and then report on it. There are many reasons to use a data modeling tool, but one is this nice way to visualize it: you can literally see the star, I mean the fact in the middle and the dimensions around it, and all of the attributes can be defined, so you get all of the benefits of using a data modeling tool. So, to quickly summarize: this idea of the data-driven business has really increased the need for everything in the data world, which is great.
That's why I'm having a lot of fun lately doing a lot of the new things with data, but it's also increased the demand for BI reporting, which has been around for a while but I think is in more and more demand, particularly the self-service aspect. I hope we've shown that BI reporting really is only as good as the underlying metadata, structures, and quality. A lot of the reporting tools are great, but they can only be as good as the data underneath them. And data models specifically are critical for understanding the meaning of the data, making the reporting process easier, and improving performance. Really critically, when we're talking about something like a warehouse, understanding the source and target systems and the lineage, and getting that right, is, as you know, a big task, and having a model where you can rationalize and visualize that makes a big difference. And just as a reminder, we talked about modeling what matters. Modeling and metadata are great, but they're not the only way to deal with data by any means. Not all data needs to be modeled, so do allow for exploration, or just doing a quick report; not everything needs the rigor that we'd like to give the core data structures. Often where we start on a project is really identifying what those core things are. To use a well-worn phrase, you don't want to boil the ocean, right? If you can start with the stuff that matters, that's key. Here's my contact information if you want to reach me afterwards with any questions, and a little bit about my company: we do this for a living. A little call-out for DATAVERSITY Training: if you're interested in metadata, we have a course online about metadata management specifically, and other courses as well from some of my colleagues that are great on data quality and data governance.
So you might want to check that out for further education. A little call-out to the Lessons in Data Modeling series: the next one is going to be on conceptual data modeling, either how to get the attention of business users or, if you are a business user, what the heck is data modeling, how can I get started, and what makes sense for me to know about it. You'll see we have a variety of topics, as data modeling is becoming more and more popular; according to DATAVERSITY, it's one of their most requested topics. So we're trying to mix it up and hopefully keep it interesting. At this point I'd be happy to open it up to questions, thoughts, and ideas that you'd like to share, and I'll let you coordinate that, Shannon. Donna, thank you so much for another fantastic presentation; it's always such a pleasure. And just to answer the most common questions we receive: I will be sending a follow-up email to all registrants by end of day Monday with links to the slides, the recording of this session, and anything else requested throughout the webinar. We've got lots of great questions coming in, Donna, so I'm just going to dive right in here. What is your view on the impact of big data on data modeling?
I'd talk about the impact of big data on data modeling at several levels. There's a bit of the hype that anything new has to replace the old, which could be a full rant that I will refrain from. But as one of the famous speakers put it, and I'm going to mess it up, there's the tyranny of the "or" and the beauty of the "and." You don't need one or the other; I think they both work well together. I think big data fits in a lot with what we were saying about exploration mode. Maybe I just need all my sensor data, and I need a massively performant and less expensive way to store it, so I might put that in something like Hadoop or even a cloud source. I'm also seeing more customers, rather than having data in different places, doing some data warehousing on a big data platform. Big data means a lot of different things, but if you're thinking of, say, a Hadoop platform, there are Hive structures, there's a way to do relational-style tables, because what they're seeing is that for a type of report like a financial report, you still need that rigor. So it's really about the usage of big data. I guess I would summarize it by saying: model what you need to model, and not what you don't. If I'm really just getting sensor stream data and I need to get the statistics off that, I don't really need to model it, but if I'm doing my reports for BI, I certainly should. We had another question along the same lines, specifically as big data relates to the non-relational, NoSQL databases. So yes, again, it's the fit-for-purpose thinking. For those interested in big data, you might also want to catch the big data modeling webinar we did, I think last December, or was it October, which I think is still on
recording, so you might want to catch that as well. Particularly for data warehousing, I'm a fan of relational table structures, because I think you can do a lot with them and a lot of the rigor behind them is good. But if we're talking about performance, or I'm trying to do, I don't know, a real-time shopping cart for my online system, then some of these NoSQL databases could be excellent. To me it's a different use case, although at the conceptual level we still need to know what we mean by customer, right? So again, it's fit for purpose. I think modeling still stays, but modeling for some of those NoSQL platforms is still evolving, and NoSQL itself is as well. Some of the document databases, for example, are a little easier to model; they're set up for that. But something like key-value pairs is more of a "thing relates to thing" structure. So it's a broad topic, but I would point you first to that webinar, and we'll be talking more about that later this year as well. Sure, we do have that published in our on-demand webinar section on DATAVERSITY, and I'll include a link to it for everyone in the follow-up email. So, what about data privacy? If I'm a paying customer, why is my usage being manipulated? I already paid, and I don't want my personal characteristics to be used. Where do you opt out of being used for data modeling? I think privacy is a huge driver for a lot of this, and I think data modeling helps with that. Early on, in one of my examples, when someone said, "Show me all the data about my customer," one of the questions the data architect raised was that we need to consider PII. So just a clarification: when we're talking about modeling here, we're thinking of modeling the data structures; we're not talking about doing statistical, big-data-style modeling on purchasing patterns and that
kind of thing. That's a different kind of modeling. But I think that's exactly where data modeling can come in: I can say these are all the attributes I know about a customer. I might sign up for a product; they're going to know my name, and they're going to know my credit card information if I purchased a product. But I need to ensure that when I enter my credit card information, only the people who need to see it to make the transaction see it; it should not cascade across the organization. So I'm seeing that as a huge driver for data modeling. It also ties back to the question on data modeling for big data: I want to move a bunch of data to the cloud? Well, what is that data? I'm not going to put my PCI or my PII there unless I absolutely know that it's tokenized. A lot of my customers are spending a lot of time on that very question and using a data model to help track it: I have 100 attributes about a customer; it's fine for marketing to use these 17, but don't use these 20. A generic statement like "we have 10 customers who are interested in skiing" is fine, but we don't want to know that Donna Burbank is a skier; that goes too far. So yes, it's very relevant to PII. I'm not embarrassed to say I ski, but I don't want a marketer knowing that, or I'll get a bunch of ads. Well, data privacy of course ties right into data governance. Looking at the three levels of modeling, where do you see the data governance council or working groups as an audience? Well, it sort of depends on how you define your working groups, but I think all of them to a certain extent. Let me see if I can go back to that slide so everyone is looking at the same one. Often the data governance council is made up of business people, and it should be, so in that sense a conceptual model is excellent. A lot of the discussion I've found super helpful in
some of these governance councils is: what do I mean? It's always amazing to me when you go into a company. It sounds like it would be the most obvious thing: what do I mean by sales date, or account closure, or something that to the outside world would seem pretty obvious, like when you close the account? But there are so many subtleties in the business, and I think the conceptual model is where that can be very helpful for people to understand, because it's not always the definition of the data itself; it's often the relationships, how it relates to something else. That can often be an aha moment: oh, you're talking about the customer that relates to the support rep; I'm talking about the customer that relates to the customer care rep; we call those different things. So I think the conceptual level can be very helpful. As you get down to logical, and there's a business side of the logical as well, sometimes you need that level of detail. And the physical too: I think the council should include some technical folks, and often they'll raise their hands and say, "I know that's what you want for the business, but we don't store it that way," or "we can't store that data," or "that's PCI, we can't put it on this platform," et cetera. So a council that works well is a bit of both. A lot of the big aha moments are at the conceptual level, but you need someone who knows the physical to keep it real: have you thought of this, this is how we need to implement it, or it's not going to be performant. So all of the layers, but hopefully that clarifies. I have a couple of different questions which play off that. While we're on this slide specifically: can you speak to the difference between a data field, element, and entity with respect to the three levels of modeling? So that was field and entity, and what was the other one? So I'm going to
jump to another slide to answer that. What do they say, the cobbler's children have no shoes? I think as an industry we're terrible about using the same term for different things, or different terms for the same thing. What one person calls an element, someone else might call something else. Maybe this slide is a little better to show it. If you have a customer, and this picture to me is almost a relational table, the customer would be the entity, and then in the data modeling world the data fields would be first name, last name. A lot of folks call those elements. Some people have a more generic concept of a business data element: maybe my business data element is social security number, which is just a generic business term or element, it's a key element, and we need to know it's PII. That might link to one or more logical models, where it might be "social security number" in one and "SSN" in another, that kind of thing, and then it's actually implemented as a field in different database tables. So they're linked. The one that's very different in my mind is the entity, which I see as a superset of the things called elements, fields, and attributes; those are at roughly the same granularity, but they have a different use case in each tool, and each vendor calls them something different. Generally there's a business element or business term, a logical data attribute, and then a database field; that's a common breakdown. So, did that help? Indeed. And going back to the previous question, you had mentioned physical models. The question is: isn't a star schema model considered a physical model, since that's how data is stored in tables?
Yes, but let me make my case. Some of the vendors will even argue that there's only a physical model. So yes, one of the ideas of a star schema is performance, and this is how your database would look. But I'm a big fan of starting with a conceptual and/or logical model as well, because, to get back to that, if I'm trying to work with a business user, I'm saying: this is what your report's going to look like; you're going to do sales by month, by region, by product. And then you probably have those questions: what do I mean by product? So it's not a bad idea to do it at a conceptual level as well. But yes, at its core, most people think of the star schema as a physical implementation of what you're reporting upon and creating cubes from, but I think there's a place for the logical and conceptual levels too. There are so many great questions coming in. If we don't have time to get to them, I'll send them over to Donna and we'll see what we can do to get some additional answers for you. I just love this engagement from our attendees; it's awesome. Some might argue that with a dedicated metadata repository, the inventory of data assets is already available, but I have yet to see a metadata repository that can effectively give the visual presentation a modeling tool typically can. Can you comment on that?
I would agree. There is that overlap between what goes in the repository and what can be done in the data modeling tool, and some of it is valid overlap, but part of the value of a data model is the process, the iteration, and the visualization. I'm building a model and trying to understand: I have a customer and I have a payment; oh, we call a payment a voucher, so we change that, and we can visualize it. It's more active and used, and I like that representation of a model, showing the attributes. For the question earlier about elements and fields and how they relate, I often build a lot of that in a data model first and then, if a customer is using a metadata repository, import it there. I think a data model offers a unique perspective, not the only perspective: seeing those layers, linking them together, and getting the full view of "this is my conceptual view of what a customer is, we're going to define that, and we're going to create standards around it." I'm personally biased, but I like the view of a data model. I will say metadata repositories have come a long way too, and sometimes their lineage is going to be better, but that's what they're meant for. They're handling a superset, not only data models but, you could probably argue, any source system; a lot of these repositories can take in JavaScript and COBOL as well as data models. Because they handle everything, they're more generic; a Java programmer is probably going to prefer their own Java tooling's view, so by definition a repository is going to do things more generically. Hopefully that rant made some sense. We have time to slip in one more question here, but again, keep the questions coming in; we've got so many great ones, and I'll get those over to you. On the role of the subject area model and conceptual data
modeling. Oh, someone there must know that will set me off on a rant, but I won't. A lot of people call the conceptual model the same thing as the subject area model. My book, on the second page here, Data Modeling for the Business, is on conceptual modeling, and I'll say it, since I'm the only one of the three authors here: between the three of us, to that note of the cobbler's children having no shoes, we couldn't agree on what to call it. My co-author Steve Hoberman, who's a lovely gentleman and very, very intelligent, called it a subject area model. I called it a conceptual model, and Chris Bradley, my other co-author, called it a conceptual model. Steve's very persuasive and he's the publisher, so he won, and we just decided to call it a high-level model. Let's not argue about what we call it. We did do a survey, just because I must be competitive, and people proved me right: most people call it a conceptual model. The reason I don't like calling it a subject area model is partly that a lot of the tools have things called subject areas that are a different thing. I think of a conceptual model as concepts: I have a customer and a product. A subject area would be an area of the model, say my finance subject area, where you group many of those together. But I don't worry about it as long as someone does it. That's my color commentary on that: often people see them as the same thing; I see them as slightly different. I love it. Well, Donna, thank you so much for this amazing presentation today, and again thanks to all of our attendees for being so engaged in everything we do. We just love all the questions; keep them coming in, and I'll get any unanswered questions over to Donna. And just a reminder: I will send a follow-up email by end of day Monday to all registrants with links to the slides and the recording, and I'll get the additional big data recordings in there as well so you can go back and look at those in
addition. So thank you, everyone, for attending today. Donna, thank you so much for your time, and I hope everyone has a great day. Thanks to IDERA for sponsoring today and enabling us to make it all happen; we appreciate it. I'll just do one call-out, because there were so many questions: I did put up my contact details, so if someone's question doesn't get answered and they want to reach out to me personally, feel free to do that as well. Awesome, I love it, and that will likely be in the follow-up email. All righty, thanks everybody. Thanks, Donna. Thank you.