 Hello and welcome. My name is Shannon Kampen. I'm the Chief Digital Officer for Data Diversity. We'd like to thank you for joining the latest monthly webinar series of data architecture strategies with Donna Burbank. Today, Donna will discuss designing data for business intelligence and analytics where the star schema fits in a modern data architecture. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them by the Q&A panel. And if you'd like to chat with us or with each other, we certainly encourage you to do so. And just to note, the chat defaults to send to just the panelists, but you may absolutely change that to network with everyone. To open the chat and the Q&A panels, you'll find those icons in the bottom middle of your screen to enable those features. And as always, we will send a follow-up email within two business days containing links to the slides and the recording of the session and any additional information requested throughout the webinar. Now, let me introduce to our speaker for the monthly series, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the managing director of Global Data Strategy Limited where she assists organizations around the globe in driving value from their data. And with that, let me give the floor to Donna to get her presentation started. Hello and welcome. Thank you, Shannon. And thanks everyone for joining. Always always a pleasure to do these. If this is your first time joining us, this is a series. One of the great things about Data Diversity is they keep all of the previous recordings, I think in perpetuity on their website. So if any of these previous topics from earlier in the year of interest to you, they're all available on Data Diversity for Replay. And one of the most common questions, will you get these slides and will there be a recording and that's all available on the Data Diversity site? And we also posted a link to that on our website as well. This topic, as Shannon mentioned, is designing data for BI analytics and where the good old fresh and star schema fits in that next month will actually be live at the DGI Q conference in Washington, D.C. So everyone's got to join us there. And we'll talk into the business benefits of data modeling. We'll touch on data modeling a bit in this session, which is, if you know me is near and dear to my heart. But we'll do a whole session on that next month as well. So why are we here today? Unless you've been living under a rock. You've heard about data and analytics and BI and AI and all of the great aspects of data. That's often what kind of people think of when they think of data is the reports or the dashboards. We all know that there's a lot more around data. But what is the challenge is there's so much happening in the industry and what makes our industry super fun, especially if you're technical and love to play with the new architectures and new tools. But what is the best way to support dashboards and analytics? Is it still the star schema? Are there other other ways and how that fit into what we call a modern data architecture? And what does that mean? So we'll never one answer, but we'll kind of talk through that today. And that's really a topic of our exploration today on the webinar. A little bit data behind the data. Data diversity and global data strategy each year put together a survey on trends and data management overall. So this is everything from from AI to master data to metadata to everything. And there was nice to see is fairly consistently and is growing the data. You know, organizations see data as a strategic asset. They understand that the second bullet there. Typically, why people see that as a strategic asset is reporting and analytics. That's kind of the face of data. That's where, you know, a lot of, again, business folks think of data as their dashboard really. And that's been consistent. I guess Shannon might correct me, but I think it's been like seven or eight years we've done this survey. And I think every single year, the number one driver of data management is reporting and analytics like that's not going away. That's not a bad thing. There's other things as well. But I think we'll consistently see that one stat I like to see is that bottom one that folks proactively said that they felt that there was improved collaboration through a defined data architecture. We see that all the time in our practice. I think a lot of folks brains don't initially go there if they think of architecture as kind of a tech thing or. But we see things like data models or some of the tools we'll talk about of getting that way to kind of communicate to the business about how we do the architecture. I think a star schema is actually a great way to do that. And we can kind of talk about that. But I think, you know, it's a misperception that architecture is only a techie thing. It's both and right. There's definitely a business aspect to data architecture, which is why we have a whole webinar on it next month. But we will get a little bit more into both the techie side as well as a little bit of the business side as well today. Moving ahead. Again, this is just a little bit more meat behind that stat that I mentioned of when folks say what are your business goals and drivers for data management overall. Again, this wasn't necessarily a technical question. It could have been, you know, saving costs or reducing risk or that, you know, famous digital transformation, increasing revenue. One of the, which all has a business intelligence component where if I'm trying to, you know, understand my revenue growth, I probably want to dashboard to really understand that. But the number one answer was gaining insights through reporting and analytics. And again, that has been fairly consistent across every year. So, you know, that this there is definitely a business driver for things like. Like, dashboards and analytics. So, I think we all understand what a dashboard is. This is kind of, you know, nice generic dashboard for at me code that kind of sells widgets over time and you've got your night visualizations and, and more and more self service analytics is a big thing. But more business people can see the data. I guess my, if you've heard me present, I have my all my little rants that you get to hear them all. One of my rants is, you know, what, what about. What about the data, you know, the data behind the dashboard and I think more and more people are getting it. If they don't get that the data has to garbage in garbage out. They will very quickly. Once we start building dashboard. So we work with a lot of organizations. And a lot of range of maturity. A lot of folks come to us as a, you know, consulting company as, hey, we have these dashboards and I don't trust the numbers between total revenue by year. Or what do we even mean by year is that fiscal year calendar year. I mean, gosh, we're on the site today and that that came up in about 6 of the interviews just what is a year. You know, what was an account all of these things. Some companies don't yet see those problems because we're not even in a data driven culture where we're using fast boards. So, you know, even getting data, you know, there's always that balance of do we start to get the reports in front of people when we know the data quality. Isn't there. And there's a chicken and it's a both and right. You sometimes have to show the data and it's unclean state before you even need to clean it up. So really to make a dashboard sing needs a lot. The 1st 1 is even having that data driven culture as I just talked about do we even use dashboards in our sales meeting or is that classic. I don't need a dashboard up and doing this forever. I've got I know you tell me I don't know who my customers are. You don't know where I should sell my product and the frustrating thing is, you know, we're probably 80 90% right. But actually, I was talking just to customer yesterday and she's like, yeah, well, we actually did the analytics and we were about 90% right, but I got feel both that 10% was pretty down important. We were missing a huge 10% that we hadn't thought of and we needed the dashboards to look at and I, I think that's actually very accurate. It often is that, you know, 10 to 20% you might be missing. In today's business environment, you know, that 10% percent can mean being the leader or not right. The data governance aspect we talked about, you know, how do we define total revenue how do we find a year how do we find a product how do we find a customer all of that right. When we say South America what countries is Matt is Mexico South America is that North America right so all of these different questions of the data quality is the source data accurate right that all could each of these could be a webinar in and of itself. Another aspect and where we're focusing today is that data architecture piece. So say we have all of that above. We know what we want to slice and dice by we know we have the data quality and we know what these numbers mean. How do I as an architect on the architecture side, manage this, is it a data lake is a data warehouse is that a data vault is it do I do I even need any of that because the tools are so good nowadays I can just point it against the source systems and look at the nice pretty visualizations, you know, maybe data architecture itself is old fashioned we just have these viz tools. I don't believe that. In general, there's a place for that in some cases. But what is that right answer, which is really what we'll delve into today was frustrating and exciting depending on your mood in the day and your workload right is that there are so many options out there right there's the good data warehouse right or is that is that a, or is that a dinosaur now now everything's a data lake or is the hype cycle for a data lake over and now there's a data lake house or is it a data hub. And what does that mean, or is it really a master either MDM hub or is it a data mesh, or what's the difference between a data mesh and a data fabric and what about virtualization is that really a fabric or a message and maybe it's a catalog of data. But is it a data catalog or a metadata catalog or a data marketplace of data or is that the metadata marketplace that points to the data market while we just put it all in a knowledge graph. Or is that relational non relational star scheme is great and I'm talking quickly on purpose because this is how your brain might feel right which is probably why you are at diversity to try to help make sense of this and we'll talk a little bit about this so all these are really unique things. Many of them have their place. Many of them in my opinion or hype or different words for something we've been doing in the past. That's kind of what we'll talk about today because it can get very confusing. Again, and I stay diversity survey when we delve into a little bit in terms of you know what we talked before about what are people's business drivers for data. I think it was gaining insights. Well there's a lot of ways you could gain insight you could just point AI to everything and maybe that's your insights, but we kind of delved into this a little more. And what kind of data management aspects are you using to gain insights and the number one is still data warehousing to support business intelligence reporting I'm actually pleased by this graph and that data warehouse. There's still the number one above bi because again the temptation with some of these self service bi team tools. They're slick and you can do a lot you can do a lot of a spreadsheet you can you know you can do a lot but to have the maturity that you really for enterprise reporting needs some sort of warehouse or some sort of data store behind it to make that something. And there's a whole lot else that needs to go along with some of these aspects, but that is still the data warehouse still sort of leads the day in terms of what people are looking at. The other question is, and I feel old enough now to have lived through an entire hype cycle because I do remember the day and was skeptical at the time. So a lot of the vendors and I can think a lot of you probably felt this way when there was why are you, why are you spending time with a warehouse or a data mart or a ODS or anything else that's architect. We give a data lake now just dump everything in the lake. I mean, gosh, the lake is so powerful and, and these tools are so powerful you can just put stuff in there. Well, I think we all know this kind of like your file cabinet I could have a really big file cabinet and throw my papers in there. And, but how do I organize those papers right and so that said, not everything lives in a data warehouse. I'm doing sensor data streaming from smart meters for a water company right or I'm. I'm a shipping company and I want to see the sensor data for my trucks to see or my driver speeding right some of those are great use cases for data like that you cannot and should not do in a data warehouse right so it's not an either or it's a both. And my one of my other rants and frustrations in my heart. To comprehend the both and only ask the survey. I thought this felt pretty accurate in terms of what I'm seeing that people are using data lakes, but generally in conjunction with a data warehouse. Some people aren't using likes at all. Not everybody has a use case for a lake. I mean, in some ways. You know, a lake might be your kind of raw landing area for your structure data before you transform it into a. A warehouse or something else right so it isn't for everybody it is a thing I like to think and we'll talk more about this in terms of zones for your data right there's many storage patterns. One of which is the warehouse 1 of which is a lake 1 of this might be an operational data store or. Etc etc etc and to think of it that way sort of avoids that. You know, either or because I think almost every organization of every size. Has the both and or many, many and and figuring it that way really kind of helps go through what are what are my use cases for a lake. What are my use cases for a warehouse. What are my use cases for real time streaming etc etc kind of helps at least in my brain kind of put these different solutions together and we'll talk more about that. So, because the data lake does have a different architecture of purpose and there may be different use cases and audiences. So, this is 1 very super high level pattern, but we know think of a data lake. It could be real time streaming. It could be exploratory data from social media analytics. It could be sandbox data where we're just we're doing some data science. It could be your actual. You know, real operational enterprise data that you are doing data analytics and discovery on right so there's a lot of different options, but the storage pattern is different and you can have. You can have video, you can have, you can, you can have structured data, but there's also just a wider range of the type of data, the volume, you know, the classic, you know, v's right the ball. I'm going to trip over them right the volume velocity, all of you know variability all everyone has their different v's, but that's really your data lake, which is a different thing. Then if there's just some options, there are more your enterprise systems of record, which could be your master data. Do I have a single view of my customer product account patient client. Student, you know, whatever those are my reference data my locations my account codes my charts of account all of those types of thing. They don't, it's not either or right a lot of some of your, your, your biggest advocates for this system of record are your data scientists in the lake. If I'm trying to do I know social media pattern analysis for my customers. I don't have a single view of customer. It's going to hurt my analytics so so those and those lines go both ways right some of the stuff you might explore in a little like can maybe be something we want to start storing in a system of record or a warehouse right. Maybe your warehouse, maybe your, your trending over time, your march might be that for for a different individual business units or subject areas. Your operational data might be more of a relational data sort of to see more real time operational right so they are different things they should be stored in different ways they have different. One of the questions early, I think before the webinar even started. What do you mean by designing data. So some of it is designing the architecture pattern this more of a system architecture diagram where do I store it. And then how do I store it within each of these boxes so your warehouse maybe a star schema, it may be relational. Maybe vault pattern, your operational data may be relational it may be key value pair right depending. So that's the other way of how you model or design the data. I'm one of my other Donna Rance is I'm feeling that that's a bit of a rare art and I want to say lost art because a lot of the highly paid folks doing this are the architects that sort of understand that nuance. It's not about the platform. I guess some of my frustration with, you know, folks of a lower level maturity as well it didn't work on snowflake and I'm not any vendor is better or worse here I'm just using example. It didn't work on snowflake so we sent it to Azure and then that didn't work so we put data bricks strike so the platforms have their pros and cons. But it's how you architect and model the data behind it is a big part of this decision which is why we kind of have some of these different boxes. I would say the certain things that go across all of that your data governance and I like to say governance and collaboration because I think when adults work together. It really has a better result for governments. Because people understand the why behind things that should go across all of these platforms I my favorite real world story and I've protected the names for the innocent. But we were doing an architecture very much like this. And it was a customer with PCI and PII. It was a insurance company and we went through this and it was back to that the governance line which is the purple one and that security and privacy which is kind of the cousin of the governance and it was one of the new interns or new employees. And they raised their hands like so I shouldn't be putting the real credit card PCI data onto the sandbox. You know went white and said pretty we're talking after work we need to talk through just because it's a sandbox and you can play around with things even more so. You should be looking at what could be privacy security or or maybe you're like is your true operational data and absolutely governance should go across all of this right. You may include document management which is not the topic of this webinar but you know you could say that that isn't in scope but if someone stole my credit card information and when I complained to the company they saw that that was in PDF. I don't care how it was stored is still my credit card information right so I think you need to look across all of this. And then the reporting analytics can look across all of this right. I can do analytics off my systems of record I could also do it across the data lake you know some of these self service BI near standard BI reports can kind of also kind of coexist across these so that's kind of at a super high level in terms of the system architecture sort of level and they they are different things they're they do have a different level of governance however in the sense of. I would stress that if I am talking about master data or my enterprise data warehouse that I'm reporting our revenue figures to the street or something like that. There would be a high level of governance yes we have to decide as a as a committee or a group and agree on it and vetted and it doesn't get changed really nearly whereas you don't want to have that level of governance on your on your right you want to have certain security things but don't lock that down so much that people can't have ideas and explore right so kind of a lot of points there. So, if you've done my webinar if you've probably seen these before we'll talk a lot about the architecture and modeling aspects of kind of the warehouse and reporting but none of this lives in a vacuum right so how do we store this is it a database is it big data lake. How do we integrate what's the metadata, is it for bi or analytics or warehousing, do we have the right governance and then the why, why are we doing this to begin with so I won't go in depth and all of these as they've covered this a lot but I think when you look at analytics reporting you do have to look at all of these pieces and how they fit together in different aspects. So, let's talk a little bit about that design aspect of data architecture for bi and analytics there. I will start even though we have a whole webinar on it next month on that business aspect and I know this can seem especially if you're one of the more technical folks on the call. Or you're new to data management this can I remember when I was new one of my earlier data versus the conferences way back in the day, you know someone had this type of joke of, you know we're building the application and we're all through testing or about to roll out just one thing what's a customer, everyone laughed, and I didn't get the joke, you know what a customer is right oh how naive I was right. You know customers and elapsed customers and internal customers that you know they have anything that's important to the business. There's a whole lot of nuance to that, and getting those definitions right is key before you even do any of the bi and analytics what are we even reporting on is that classic if you don't have time to do it right you're going to have to do it again so getting some of these clear definitions up front and and and I'm not overstating this because gossip and doing this for almost 30 years now and I still am on site with clients who have had either really embarrassing or in the news or type incidents over something as simple as what is a customer and we work for a company and they actually sent renewal notices to prospects right of time to renew your product and because and then these people didn't have the product because they went to a database that was called customer but it was actually the prospecting database for sales and yes the word was called customer but you know and they are going to stop sales people from saying I'm going to visit a customer saying well no actually there are prospects right but you should know that when you're building a dashboard off it or or making a business decision often so do not skip the whole you know kind of business driver aspect of it. And there are different layers of data models or design, right one might be even it what are these even high level subject areas what do we even mean by a customer is that separate from prospects or is that separate from product at separate that are even at the conceptual model what do I mean by some of these core terms. On site today was something as simple as what do we mean by a year. And I know that can sound so silly but this took you know one teams effort of a week to consolidate reports because the data came in and different aspects of what a year was fiscal year versus calendar year versus, you know, so I think if you're used to doing data management you're always asking the why or what do you mean by that, you don't want to catch that later when your numbers are wrong. So that's really up at that high level logical levels where you're getting more rules around that more attributes more detail and then the physical, we'll talk a little bit more about that today. That's the actual if you're using a database those your tables right how do I actually structure that those might be my data types and all of that. Hang on I've seen horrible embarrassing things for companies at each level doing things wrong. Why was that a site that was a retail company. So they understand this one. Now that the DBA or the data engineer shorten the product code field from 10 characters to eight or 12 to 10 or something like that brought down their whole website and they couldn't sell product for two days. So we would think of shortening shortening something that important without checking for the impact analysis but again all of these models are crucially important, and they all kind of fit together. And neither one is inherently better nor worse than the other they have their different use cases right so I think we are yes I have weird data model correct. I think normal form right I think a lot of us in the call instead of understand that idea of third normal form, if normal form voice caught or whatever right. But really why do we do that that is really good and it's not going to go away has this use case it's not for everything. But for reducing redundancy increasing data quality ensuring consistency asset transactions across right so I want to know, I don't know. My customers my addresses right you don't want to store addresses in the customer table you want to separate that would make sure you buy clean addresses you can link together with address types and all of that like that's really good for things like master data. And some of your core dimensions and things like that for reporting doesn't go away. It's super valuable. It's core to a lot what we do but it is not everything. So, we also have the star schema, which is kind of your kind of looks like a star we'll talk more about that. It's really good for summarizing and slicing and dicing historical data over time, it can be a very way of doing that, but that isn't everything right it has a certain use case. Those are kind of some of the very common ones with enterprise reporting data that we use, but we also have. I know no sequel is very very broad and there's a lot of types of no sequel, but just think of say key value pairs. That's really great think of something for a website usage and you know websites hits by click or, you know, when I really am doing kind of that you know more operational real time or high volumes. Easily flexible for change, probably not great for a warehouse reporting over time. Doesn't mean it's a bad thing. Right. So, I think again I do hear a lot of folks saying, oh, I don't want to use this and we'll read something out. And I'm seeing this both ends right there's a lot of different use cases right and there's a whole lot more and they can't do all of them in this webinar XML has this place more of a hierarchical graph patterns. I'm a huge fan of what you know I want to see kind of connections between my customers can't do that very well until you do the relational one to really understand what a customer is right good old fast and cobalt copy books. I think the Wall Street Journal had a thing the day of, you know, so many companies are still running on things like cobalt. I wouldn't say doing too much net new and that you can't knock it still running right so, or S3 buckets right I'm just trying to kind of bucket data for storage and things like that data vault has its place right so there's a lot of ways to store it here some. And I think the main message of this is that no modeling technique is apparently better or worse than another. I think it's your use case and purpose and drivers in terms of what good looks like is how you choose either of them. And I would say, don't roll out quote good old fashioned stuff because it still works is still has this place is still running companies. It don't switch the modeling technique because it's hard or, you know, I want to get a quick win. But don't also get stuck in just one right. I am actually a fan of relational databases and I'm a fan of star skew is if you're only doing that in your organization I think you're also missing out on things right there is a case for a data link. There is a case for graph databases and can you do some proofs of content on on ways we working with one big. I can use my words it's a telco company over in the UK and they have all of these problems that they do need to clean up their data and the reporting, but they got some really great quick lens with using a graph database on their existing systems and getting some early insights. And then kind of, you know, moving in other directions as well, but graph did offer them some really great advice for their sales team. So, again, it's a combination just understand what you're using. For, right, because you don't necessarily want to use a graph for your operational accounting system. Right. It doesn't mean grab is bad. It's grab is use case. We're hand over. So is the star schema dead right you have so many options. Now, I mean some of the reason for a star schema was to help with with large volumes of data and you could be cynical and say well gosh we have data lakes and storage is so inexpensive and platforms can really scale and it's not like we can't get good performance on things. So, why would we even need a star schema I will I will put my foot down and say yes I do think there is still a use case for it and there's as many reasons for it. And I still see the use case so I do a lot of this I do this for a living. We work with dozens of companies each year and almost to a company. You're still saying a very calm almost the number one request from your your exactly your business people your operational leaders is I want to see historical data trending over time to slice and dice by year by region by product. From a single source of trusted data from all my relevant systems right and then I will hear that and whenever I'm hearing things like by year by region by product over time. And that same person will say but I don't want one of those old fashioned data warehouses I'm told I'm told by the vendors that those you know and uses those anymore. So sometimes when I'm cynical I say well we'll put a modern data warehouse out there right so yes there is modern technology we can go to the cloud there are performant ways. But there is still a place for that again when you're want to slice and dice by year by region by product from a single source of trusted data from all systems. That's kind of describing a data warehouse in a star scheme is a very good way to do that. So what is a star schema. Part of it even even performance aside even if you could flatten everything out and you could still get the same performance. That idea of kind of mental slicing and dicing we think of business users what they're probably living on now if they don't have a dashboard is a pivot table and Excel. And this is kind of a pivot table and steroids where you can just get that performance and ease of use. But if you're if you're new to a star schema and that's fine because that's why folks come to these webinars is to learn. Think of it in terms of facts and dimensions. So your facts are what they say. Those are really your facts are the things you're measuring the thing I like to think of on the things you're reporting on. What are we measuring it's it's your sales transactions it's visits for patients things you're counting. Often you don't have a lot of after descriptive attributes. It's just sort of your your numbers you know my my sales revenue and things like that and then links out the dimensions for you have there to tend to be have a lot of value. You know rose and not necessarily always a lot of columns and then your dimensions. That's where you're going to get the detail you're going to say I want to know revenue. So again when I hear a lot of those buys. To me those are your dimensions things you want to slice and dice, port by report on however you use that in your English. So by date region quarter month sales rep product right those are often your master data domains right when we're talking about those different patterns right because not everything lives in the stars schema right so you may have a master data in your customer or your product which kind of feeds the would say conform to dimension right that I when I have my customer dimension is conformed I know that this is my single view of my customers of my products that I can then slice and dice so often is your and master data that can feed those dimensions. So, that would be a good use case of relational and dimensional, fitting together in the same world. Because this would support back to our ACME co who's selling widgets. If this isn't obvious to you and this is new maybe this type of name visualization helped me when I was learning it. So your facts are the things you're reporting I want to know total revenue by year that total revenue are going to come from your, your kind of your facts and then the things you're slicing and dicing by by year by region by product are your points of that star that you can easily slice and dice by it's nice because when you have some of these bi tools that are super easier to use that becomes your cube maybe your your semantic layer. And it's a really nice way for self service and it's a really nice way for people who are used to something like a pivot table and a spreadsheet can take something like these bi tools with some sort of semantic layer where we know what the definition of customer is and product and sales rep that's all been nice put together in your buffet and people can slice and dice by it. So that is one of the positive use cases of a star schema and see a lot and they and you're going to have many stars. So the nice thing is as we build these over time, I've defined through MDM or reference data what my customer is or what my regions are. And then you can reuse I might might want to not only revenue by region but employees by region or customer visits by region or patients by region right so all of these things can be reused. And that's what we call can call them conformed dimension because that's the benefit of raising these over time. One of the tools again kind of a little bit goody we use all the time and this is a simplistic version of it is a bus matrix. Kind of based off a kind of a bus architectures not business matrix a lot of folks that I don't really care that's just a little nerd trivia. But these will be all the things again that I want to report. Oh, I have it backwards report on which would be your sales revenue and, and then report by which is your dimensions and I'll fix this and send it out good data quality is important. I want to report on revenue or number of returned items by region by sales or by product by customer. And when we build these out we also have a definition field and a calculation field and a grain. I want to report on total sales revenue in US dollars by day. And this is how I calculate revenues. Total sales revenues only retail and wholesale revenues a different calculation and we're agreeing that all the regions do it in a simple way we often send these out to business users it's a really great way to describe a star schema. To a business people because they get spreadsheets they understand okay I understand total sales revenue I understand how I'm going to calculate it. And then for the data architect can easily take this and kind of make that dimensional model. Okay, my facts are my sales revenue, my dimensions your region sales products like that. So, kind of an oldie but goodie and kind of a nice way for kind of your business analyst in your data architect and your business to collaborate because remember that quote earlier on the webinar. The best use cases of an architecture is collaboration. To me, it's just kind of fits together there's the why of why we want to do this. There's the high level what and how which is your bus matrix and then that turns into your dimensional model, which is going to help all of those together in your glossary in your governance and all that together that are going to build something like a nice clean dashboard where when I'm reporting to the CEO on total revenue by region, we are all confident in that. And the beauty of it at the end of the month we just press the button, or it's already there and we don't have to spend time cleaning it up or arguing it over and what do we even mean by a widget or a region or a revenue. I can't be overstated once it works you may be rolling your eyes and saying really we have to describe this, but I would say 80% of the companies. I know I'm biased because folks are bringing this in because they have a challenge, but this isn't just everybody's doing this and it all works. There's a lot of work to get here. And if you're lucky enough to be one of those companies to say gosh folks don't have this, just feel very lucky thank your data architect this would be a bumper sticker. Thank you data architect today when you can report on revenue by region easily. That's really probably your relational and dimensional models helping with that with your governance. But again, there's several design patterns even within a data warehouse that the battle still rages. Am I in my Kimball my relational am I dimensional. Yes, both and I think we had some good use cases of where relational model and where a dimensional model can fit. Is that a data warehouse and a data lake. I mean, I'm sorry data warehouse or a data mart. Yes. Right. This probably good case for a mart. Please do the design first so that when we have a mart and we have a region it's coming off a same design and table of what we mean by region. There's data vault, which again could be a whole webinar. A lot of folks a lot of folks in Europe in particular kind of having success on this kind of a flexible architecture of hubs links and satellites. Maybe I don't know all those defined business rules up front or want to have a flexible way to design those good way to do that. Column column column database with again that flexible material pattern where you kind of flip it up your columns and rows, or, you know, flatten everything. And that's not a bad thing you may want to flatten that from a relational database I had a kind of a previous customer kind of write to me and say we're having this big argument I'm asking folks to flatten things because I need it for my kind of SAS report from right here she was the data scientist and they're telling me I can't I should have it relational and I said it's a classic both and yet you should have your relational that to make sure. So again it depends on the use case I would not say flat and everything out to one big table. If you're trying to slice on sales by revenue by region right but once you've gotten that and you may want to flatten some of those dimensions or flatten things out of your your analysis that's a good thing to do it just has its use case right now and I mentioned to graph it's a great way to discover discover connections across these patterns and more and you'll probably some of the comments and I haven't looked at them yet here is that what about what about yep probably right there's a whole lot more this is just an example of ways to think about things and again I always think in zones or patterns based on the use case right I may have my relational zone to make sure the data is clean and consistent for master reference or even some warehouse data my slice and dice over time zone which might be my dimensional model I might have my flattened out everything zone for my data science exploration I may have my discovery zone with graph to see the connections etc etc again or I may have my raw streaming zone for my sensor data for my machines. Then alike right so they all can fit together just think of what goes where and this is a horrible messy I chart but probably is not too unrealistic from a real world so great I often break out the zones into these patterns as well so you have your operational usage. I might have an operational system my accounting system my people suffer my sdp right is running on a relational database for its operational use case excellent use case for that I would not put my operational system. I can use my words a dimensional model that wouldn't make sense right that but that is a perfect use case for that I might have a web app that wants more of a key value pair for session information etc so there's operational patterns. There's me I want to kind of move from left to right from operational to reporting. So on the right for consuming data for analytics I might want that in a nice to slice and dice for reporting, I might want to flattened out for analytics, I might want time sequence data for you know some of that longitudinal analysis. And then how do you get it there right so there's there's zones across I might want to transfer the data with Jason or XML or. You know this is this patterns for that I may want to have kind of a subzone at the bottom there to keep a certain golden set of data which can be your master data your hierarchies your data quality your your reference data what's the single list of countries or what am I. I might want to have a single view of customer development hierarchies right that then can feed either your operational data that if I have a single view of customer why aren't we using in every system. And then also your think of your slicing and dicing your conform dimensions and in your warehouse right so even your storage and analytics from reporting which is kind of that middle zone. I might have some relational models I can have my slicing and dicing model I can have my data mall model. I might want to put all of these ad hoc randomly here I what I didn't put here is real time streaming data which is another common use case that could then go to some real time reporting. You may do some reporting or against those operational systems out to your reporting right. None of these is necessarily inherently wrong, just do it mindfully right there is a use case I might want to drop a tableau or power BI onto my system for some use cases you may want to take that out and put it into an ODS for performance. But as long as you're there is a good use case for it, and you've thought it through and you understand all the understanding around that is kind of the point there. So, and summary, as we kind of walk through and I did want to leave some time for QA because there's always a lot and I'm always running short so I did not want to do that today. So, analytics and reporting your key priorities, you do need negative data architecture behind it. And there's a whole lot of choices out there but in those choices, just choose wisely isn't that a movie quote right so the core fundamentals still apply there's different storage patterns there's different architectural patterns and you really want to kind of think, and it's most likely thinking back to this a combination if you're only using one, I would think think more broadly. You're stuck in a rut, think more broadly but don't also just jump to the new shiny thing and throw away some of the fundamentals that, and I would say a star scheme is still a fundamental does it fit every use case absolutely I would not put my necessarily, I don't know my real time streaming data to feed a real time operational dashboard in the star schema. That wouldn't make sense to me but if I want to report some of that over time slicing and dicing patterns that I would. That widely do a lot of research before you put that through because each one of these as you folks know on the call many of you, each one of these takes a whole lot of time and effort. Please also do not just do platform shopping. And I think I mentioned that the beginning of a we did all this we built the star schema and you know platform a that didn't work was to a different platform that may well be the case there are pros and cons this pricing for different platforms. I just see too much had nothing to do with the platform to do with how the data was modeled on that platform and I know, often from a business perspective business users don't necessarily get that why would they I think tools like data models and bus matrices and glossaries and things like that can help with that, but they don't really need to know that that's what architects are for, but you should be thinking of that or the right pattern is for your use case. So before I open it up for questions. Do think of joining us online for the next one which will go more into data modeling but more from that business side. If you are at DJI cube both Shannon and I will be there so we'd like to shake your hand and say hello. And blatant plug we do this for a living so if you need help, think of us my emails on the second slide, and I will open it up for Shannon for q amp a. Anna, thank you so much for another amazing presentation such a great topic. And if you have questions for Donna feel free to put them in the q amp a panel and just a reminder I will send intense the most commonly asked questions I will send a follow up email by end of day Monday for this webinar to all registrants with links to the slides and the recording and anything else requested. I'm coming in here Donna. When modeling for relational or document databases, are there any modeling choices that make the new database more friendly towards eventual residency in a data warehouse. For relational or document databases, I did not mention document databases, but those are really great as well. Actually storing some of the documents in the data store. I think that great I do think. I really think of kind of a pattern of Ron landing raw and rock and be in a lot of different formats I do tend towards really I think document databases can be really great when there is kind of document linking some of those document stores with that but I do I kind of lend towards either a lake for some of the raw dating there's also a question there about the streaming. That's a great use case for a leg or I do think relational kind of wins out. When you're trying to get that consistency or kind of that single version of the truth. I have seen some. And maybe I'm wrong here I did see a customer do that with a graph, which really surprised me but even with the graph they use that for exploration and ended up putting it in a relational at the end. But yeah, I think maybe I answer that maybe I didn't. But yeah, there you go. Thank you to add any additional comments on that and there's a there's a comment that came in the chat earlier I just want to make sure and get to it and you know, if an organization already has data all over the place where do you start when trying to get things better organized. Oh, that's probably every organization they have it all over the place. Well, one is just creating an accurate. We often do a kind of that it was an ugly picture that I showed you can go back to it if I can move my own slides. I would correctly document kind of the before and after. This was is a hard don't ever show me if you're on my team this type of architecture diagram is terrible it's a cartoon version right but what would be the actual real world architecture and make it messy right I have 1000 source systems and I 40,000 spreadsheet sheets and then I have this six data warehouses that were built over time and whatever we really have the realistic world you know where I where we are now do a do a ideal state where we want to get to because you have to be realistic and then you also do the business prioritization which may maybe next month's webinar will help with that a little bit more of the why because you cannot boil the ocean. I think a data model is a great use case we're actually we're doing it on site this week of they actually are that environment I just described, maybe not thousands but hundreds of systems. And then what data across those can be that can often help a data model. G we have 1000 use cases and 800 systems and but we're all converting around product data and product by location. So we've got a data centric view, and maybe start small and build over time so I use the word quick when it's not a quick fix but knowing where I went ahead with a good data model or data architecture and then how do we build out the components wisely. There's a good way to think so think of and then who cares about it right so we could, I don't know. Chair storage by region and nobody cares about that but if we're picking revenue, you know product by region or something that's going to affect a lot of the different end users because you're also trying to get buy into this approach and and these things aren't cheap. So, so thinking of who why and then how and be realistic, because you also don't want to over promise this isn't just a, you know, you can do some pilots and things, but but this isn't a commitment and make sure you have the right research. So many great questions coming in and and you kind of touch that's the streaming question I should have brought that up earlier but you just mentioned that you not put real time streaming in star schema can you just go over that more slowly with pros and cons for better understanding. Yes, and I apologize I'm a fast talker and I will never be cured of it. I will try. Well, again, back to your use case so I have I'm trying to understand my, my, my trucks that are out on the field, delivering my widgets, right. I want to know real time where they are kind of build a ways right or I want to who's speeding that would be a real time streaming or maybe a dashboard off that that to me is not a star schema, because you really want to see it real time I want to know that Joe and Mary are both speeding and they're both in North, you know, North Chicago right now and like that's where my product is right that would be in terms of those patterns and that I will keep going back to the ugly architecture diagram but that would be a real time streaming pattern it was still go from I was kind of think left right from source which is your truck to some sort of data lake to maybe a power be a dashboard or a tableau or one or the other. That's because you really want that real time and a warehouse star scheme is not real time I might want to then say what are speeding patterns over time by driver by truck by city, we speed more in Chicago and then Boston, I made that up. And then if you're from Chicago Boston right, but then you might take some of that data and start it out and track it over time so maybe that's a good use case one really is real time where my truck is, and then over time, what kind of drivers over time speed more and where I went. That'll help me. So for dimensional modeling, if a source system has a number of dimensional attributes that are just a code and a description with no other value data points is your preference to put them into the fact create a dimension for each try to group them together in a junk dimension or something else. Oh, that's the classic question. Yeah, I wouldn't. I think sometimes we architects can kind of overdo it so that's why I think there's a nice mix of master and reference data where you kind of clean up that it makes for those are consistent that it can feed in your dimension so I would group. I would kind of lend towards grouping it together in this performance right so you kind of do flatten out a little bit more or it might just. Oh, so the use case might define it maybe I'm I'm. I'm rambling here. I'm trying to say, I don't know. Oh, I'm back to my trucking example they have a call station that they have to go back to in their region but all call stations always in one region you just put it in the call state right or there's a, you know, a manager for that call station. Yeah, maybe correctly you would have a separate, you know, table for manager in this case it really is just an attribute of that call station or something that that is kind of that design decision and there's no one answer but whoever asked it is thinking along the right lines. Right, if I'm really do need to snowflake it out or how many attributes are on that thing. It's almost your, your, your relational model just isn't described is. Is it describing the thing, or do I need another table for it that really is sometimes based on the use case and I think you can be a little more flexible in the dimensional it doesn't have to be as that's maybe, maybe the correct is in the reference data set but when I'm dimensioning it out for reporting. I might must those together a little bit because that's the reporting requirement. Right and that's where it's a little bit of an art in the science just make sure you're not doing it to be lazy you're doing it to really get the requirement and don't. I tend to overdo it sometimes well technically the manager of that station is a different thing, but it might be, you know, it's always only one manager of the station and it's really just an attribute of the station. Okay, let's not overdo it. Right, let's motion together. Okay, sorry, hopefully that helped. I love it. And so, I'm using an old system that is very row oriented. We need to govern column or parquet files having multiple instances differentiated by dates and their names. Older repositories and people don't think that way is their way to shoehorn parquet files into an older metadata repository like this. Yeah, I'm thinking about that one. I don't think I can say something intelligent quickly on that I would say if that person must email me I could maybe I feel like I need to think that one through a little bit I don't have a quick answer. Sorry. That's a hard one. You are not alone. I don't want to just make up something and ramble because I could do that so but yeah I would that's a good question I'd like to think that through so get me offline. Yeah. So, moving on to the next question here is Donna is data mesh is a new pattern emerging and being implemented in different systems. What are your thoughts on how that fits into this design pattern. We did do a data mesh earlier in the year. If you want to catch that so I think is mesh and this fabric. I mean a mesh could be if we let's go back to my ugly diagram. I would say mesh tends to be for more analytics. Right. And think of an owner for my quote domain that might be your data march your data warehouse right a good use case. For example, we work with a university. And universities tend to be siloed in some good ways right the physics department is going to have their data marked. Totally different than the finance department right so but there's a use case to maybe go so they're kind of the. There may be a mark for each node of the mesh right and they are sort of the owners of my. Mart might be one if we're kind of thinking dimensional right I do think there is a place in the mass that is centralized and that might be your master so thinking that university. We should have a single view of our students Jane Smith, who's a physics major is still Jane fix Smith who's a physics major. When I'm in my note of the mess that's my mark or my whatever you know, however they're storing that to know what am I grades over time for physics. That's kind of the domain or the ownership of the governance for physics, but have I paid my tuition and do I need to get alone and can I be helped out by finance. That's their domain. Right so this kind of the central hub of standards, which is your master what is it what is a student is she a part time student is she a returning student is she a full this year master's degree student. That's kind of your mat your hub in the center and then each of the nodes of that mesh or the governance and the ownership would kind of be so it's almost like this. Think of that middle there that analytics were actually any of these could each be a note of the mess when you think about it so the physics department would still have their warehouse and their cubes that own it and then maybe that master data and the bottom would be the things that we have to functionally manage. We've got six minutes left I'm going to try and get to as many questions as possible here. So don't know what methodology would you recommend to design model a technology agnostic data architecture, once we've identified the business use cases, or what resources would you recommend. So I would say starting, starting with a full on what are the different use case patterns, right, and then even something like this for, you know, you use case to method, right, I'm going to historical data over time. That would be my warehouse real time data streaming from my sensors for my equipment. That's going to be a real time pattern, right and then each one of those would go if we think back to this is almost like the, the what is it like what what zone do I go into, and then back to that pyramid of, you know, logical. I can't get there quickly. I think everyone dizzy this one. You know what then then there's the design of that zone itself. So far more warehouse than I have to design my star schema. If I'm streaming I need to design how that lands in the lake and how I segment the different like patterns or if it's a, it's a master date around that I would need to have a relational model of the what and the why and what pattern that relates to how that fits together in the system architecture. And then within each of those zones, how do I model the data itself, like, is that a, is that XML is it graph is it. All of that. There's my answer. Hopefully that helped. I like it. So, I'm done and can you please talk to more about normalizing data from different data sources and tools. So that would be in one way just a common way to do that is with your normal form, you know, relational model. I think going back to my ugly architecture diagram, which I should make prettier if I'm actually going to use it. So you do have all of these systems right across this and that's where that conceptual talk about a little bit that next month, even a conceptual model of what is a location. Right was a region so location might be why several locations one is my store location one of my region and one is my GPS coordinates right so that once you even understand that a business look what those are. Often at this and I didn't put this isn't a true architecture right you often have kind of that. So either in your, your, what's this your master reference data, you could say this is my common set of locations or regions, sometimes some of that's done in your landing or staging. Sorry staging where you kind of get together even just to sell the data types are the same for location. So that is kind of either kind of the light version of that is I stage it so at least all the data types are the same. And especially for master and reference it kind of lives in its own zone, which is this is my hierarchy of locations and this is my common list of codes and these are my common list of customers. I, I tend to think of that's more relational pattern, but I mean to me it's more, are we getting the rules around it hierarchies can also kind of be done there XML could be done there, but I tend to do relational. I think that's what some of the hardest part of all of this and that's where the governance and everything. Once you get those nice building blocks if you use to throw those into a dimensional or into a graph or it flatten them out but getting those core definitions I think that's a lot of the hard stuff. People that people in discussions right not just detecting. Yeah, that is most most often the hardest thing. For sure. So don't know we've got an operational system and analytic reporting systems. If we use a data lake as a staging area for accumulating data in source system form. How can I get create a catalog that makes it easy to discover data in the lake. Say no to data swamps. Yeah, no, I think that's a valid. I kind of do. I would call that maybe a landing, but whatever semantics staging I think you kind of manipulate a little bit, but say you want it's raw form. We're actually doing a project right now and that that can have it's just use case I just want the people's off data and the people's off format right. So I think what you can do is that it's almost your data dictionary. It's this is what the data is these are the tables that make sense. And this is the definition this is because there are use cases that people just want to go back to source I don't want to go to the system itself. So I think creating a catalog of where those are stored and what the different data types and the meaning is I think is a great way just to make that accessible and that everyone knows what that data means and how it's been updated and things. So kind of a roadmap to it so it doesn't become a swamp. If that's how you're using it just be. Is it just raw your dumping stuff or I actually am putting it here so that people can find it later and often the data catalog can help with that. Perfect. Well, Donna, I'm afraid that is all the time that we have for this session. So many great questions. Thanks to all of our 10 days who have joined us throughout and for all the engagement. I just love that. And just a reminder, I will send a follow up email by end of day Monday with links to the slides and links to the recording to everyone. Donna, thank you so much. Thank you always enjoy these. Thank you.