Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Officer for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Modeling Fundamentals, the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them through the Q&A panel. If you'd like to chat with us or with each other, we certainly encourage you to do so; just note that the Zoom chat defaults to sending only to the panelists, but you may absolutely change that to network with everyone. To open the Q&A or the chat panel, you'll find icons for those features at the bottom of your screen. To share highlights on your favorite social media platform, you may use the hashtag #DataEd. And to answer the most commonly asked question: as always, we will send a follow-up email to all registrants within two business days with links to the slides. Yes, we are recording, and we will likewise send a link to the recording of this session, as well as any additional information requested throughout the webinar. Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an acknowledged data management authority, an associate professor at Virginia Commonwealth University, president of DAMA International, and associate director of the MIT International Society of Chief Data Officers. For more than 35 years, Peter has learned from working with hundreds of data management practices in 30 countries, including some of the world's most important organizations. Among his 12 books are many firsts, published before Google, before data was big, and before data science. Peter has founded several organizations that have helped more than 200 organizations leverage data, with specific savings measured at more than 1.5 billion US dollars.
His latest is, as always, awesome. And with that, I will turn it over to Peter to get his presentation started. Hello and welcome. Welcome to you, Shannon, and welcome to everybody else. Happy Valentine's Day, in case you have forgotten about that; it's just one of those things you don't want to forget, so keep it on your radar. It's a beautiful sunny day here in Richmond, Virginia, and I hope it's nice wherever you all are. But let's dive in and talk about data modeling fundamentals. I'm going to use these two icons at the top, the data model on the right and the architecture on the left, to illustrate a couple of points; those will be our themes. The real purpose of data models is to facilitate understanding, and we want to do this among three particular groups: business users, technical personnel, and the systems themselves, all speaking the same language. If they don't, we end up with nothing but confusion. The only way we can achieve this goal is by using something we call a trusted catalog. You've probably heard it called a business glossary, a data dictionary, or any number of other things. The idea is to have everybody speaking the same vocabulary. The reason that is critically important is that most people don't know much about, or care much about, data. I'm going to tell a quick little joke here just to show you how disdainful most people are about this. The setup for this next slide is that I like to pretend that I gave Seth Meyers my business card at one point in time and he made a joke out of it. So here's the way it works. Oh, there we go; that's the problem there. Business cards. Really? We're still doing this? I hope your business is waste management, because this is going right in the garbage.
Giving someone a business card in 2021 is basically steampunk. Great, I'll give you a call when I need my cotton gin repaired. Thanks for the business card; it's a great way to be sure I'll remember you in six months when I'm cleaning out my wallet. Dinner receipt, dinner receipt... ah, this one. "Call me if you ever need data solutions." And what are those? You have to call me to find out. The only thing business cards are good for is to put in that fishbowl at the diner to see if you can win a free Reuben. Hey, business cards: ya burnt! Now, aside from the Seth Meyers aspects of it: data solutions, what is that, right? Most people really don't have a clue. Well, modeling fundamentals is where we have to start. So we're going to do this in three big parts. The first is what data modeling is required for; we'll get into the specifics. Then we'll move on to why data modeling is required for understanding, and of course the answer is that if you don't have a data model, you truly do not understand the data the way it needs to be understood. Then, how to use data models effectively. And finally we get to the fun part, where Shannon and I parse your questions and see if you can come up with some other topics we should be looking at. So let's just jump right in. What is data modeling required for? We're going to talk about increasing understanding between systems and humans, as I've already referenced; also precisely defining the data, achieving simplification goals, and getting to focused points of agreement. Finally, understanding, building, and deploying stable business models is a part of strategy. So let's talk about what we mean by data, information, and knowledge. You'll see this traditionally set up as a pyramid, but I'm just going to throw out a number, 42, to help illustrate the concept.
Now, if you've ever read The Hitchhiker's Guide to the Galaxy, you know that 42 is the answer to life, the universe, and everything. If you don't happen to be a fan of science fiction, you might think that 42 is Jackie Robinson's jersey number, and the title of a wonderful movie about his life. Or, if you really want to get picky, you can ask whether Peter is old enough to purchase adult beverages in the Commonwealth of Virginia. Well, it turns out that 42 was my age 22 years ago, so yes, you can do the math and figure out how old I am. What I've done, of course, is to associate three different meanings, and probably many more, with the number 42. That combination of a fact and a meaning is what we mean by data. Technically we call it a datum, but nobody uses that terminology anymore, so I'll just allude to it. In addition, we don't want all data, we want only useful data, so that we can understand the things we need to understand and avoid the information or data overload that happens in many instances. If we want to distinguish between data and information, the difference is this: data simply exists, while information exists when somebody asks for it, whether in the form of a report or any number of other configurations you manage to put together. This is how data is distinguished from information: somebody has asked for it. Now, you can all get this point pretty easily: you can have data without information, but you cannot have information without data. And the wonderful thing is that this diagram and explanation basically shuts down every instance of people saying, well, I'd like to manage my information separately from my data. I show them something like this and they say, well, there's not much point, is there? And the answer, of course, is no, there is not. At the highest level of the pyramid is either knowledge, wisdom, or intelligence, depending on which era you see these frameworks from.
And the question is how that information, which has been requested from the data, is put to strategic use; strategic use is what differentiates information from intelligence, so that we can start to understand these things in very precise terms. I'm showing you here a loosely constructed data model. It's not a rigorous data model, but it at least gives you the information in terms of what the things are and how they're connected to each other, and it defines data, information, and intelligence for you. Now I'm going to show you, unfortunately, how most organizations evolve, which is over time and ad hoc. Your payroll may have its own data set; your finance and your manufacturing each have their own clusters of data. This looks pretty good and made good sense as we were starting out, and by the way, we still build systems this way. Think of what Salesforce would look like if they had architected all those pieces into the platform in the first place, instead of putting them in as a bunch of add-ons around the whole thing, for those of you who know the Salesforce platform. So this works great until somebody says, well, I need to integrate some data between the payroll system and the manufacturing data. It very quickly becomes a Gordian knot, where it's just very difficult to get information from one place to another to integrate it. If we want to fix this on an organizational basis, we actually have to take a step back, understand what it is that we have, and then integrate moving forward, so that we can start moving data into a generalized, more useful facility, which we'll call organizational data at this point, and then start to re-architect around all of that. The data model is, of course, an essential component in order to do this.
On the other hand, unfortunately, many organizations attempt to do this without using formal methods, and I don't mean advanced mathematics; I'm just talking about making a map of what the data actually represents. The reason this is important: think back to before I re-architected this data, when everything was connected to everything else. That's the situation most organizations are facing. Suppose your organization has only six applications but you want everything tied to everything else; no reason you shouldn't want that, by the way, if it's a business goal of your organization. The question is, what is the upward theoretical complexity? The answer is 15: fifteen interfaces will permit us to connect everything to everything else, and that's for just six applications. Of course that is a challenge, and I'll give you a comparison point: the Royal Bank of Canada, many moons ago, told me that I could use their numbers. They had 200 major applications at the time, which led them to close to 5,000 interfaces. That sounds like a lot of complexity, so let's see how that actually works out. If I plot the potential upward theoretical complexity, the number of interconnections, on the vertical y axis, and the number of systems on the x axis across the bottom, you'll notice in the blue line a huge increase in the rate of increase; complexity is growing at an increasing rate, and that is the key point. And if an organization has 600 applications, and I've worked with some that have far more, the number of potential interfaces approaches 180,000.
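Those interface counts fall straight out of counting the pairs of applications. A quick sketch (Python; this is an editor's illustration of the arithmetic, not something shown in the webinar):

```python
def max_interfaces(n_apps: int) -> int:
    """Upper bound on point-to-point interfaces: every distinct pair
    of applications can be connected, i.e. n * (n - 1) / 2."""
    return n_apps * (n_apps - 1) // 2

print(max_interfaces(6))    # 15     -- the six-application example
print(max_interfaces(200))  # 19900  -- roughly 20,000 for 200 applications
print(max_interfaces(600))  # 179700 -- close to the 180,000 figure
```

The quadratic growth is exactly why the blue line bends upward: doubling the number of applications roughly quadruples the worst-case integration work.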
Well, let's plot the Royal Bank of Canada's numbers, and you can see that they actually compare somewhat favorably to what could have happened. Again, I mentioned 200 applications; that would be roughly 20,000 potential interfaces, and they've only got 5,000 to deal with. While that may be some solace, what we're looking at is the need to transform this point-to-point application integration into something that looks more like a hub-and-spoke model. And the thing at the center of the hub is literally a data model. A quick definition: data models typically capture physical system requirements, but we can also incorporate conceptual and logical requirements. So we have a definition here: an organized, purposeful structure, regarded as a whole and consisting of interrelated, interdependent elements, that represents the lowest level of decomposition available. When we're looking at systems, recall that systems by definition consist of people, process, hardware, software, and data. Of course, data is the only one that's increasing at an increasing rate, and the data components are the most stable pieces over time. If you didn't catch that little visual transition: most organizations tend to focus on the software, which we now call apps, and the data becomes something that serves the software. That, of course, is exactly backwards from the way things should be. These data models can incorporate organizational business rules. For example, there may be a question of whether, in some context, a project can be owned by more than one department. The answer, not at all material to our discussion today, might have severe implications for the organization, and if it isn't correct in the data model representation, then all other interpretations must also be suspect.
What we need to do with data models is minimize the number of connections and have as few hubs as possible. If we have one hub, for example, we have a single point of failure; that may be an unacceptable risk for an organization, because things are easier to steal, but on the other hand they're easier to manage. This is not a course in risk analysis, so you don't have to do that calculation, but somebody will at some point. The data model, as I've said before, is the heart of every hub. It's the only reliable means of conveying the enormous amount of information required to run these hubs for your various organizations and to objectively use standards within and across organizations. This data model can always be inferred or determined objectively, so somebody might say, I don't have it anymore and therefore we can't use it. No: any data set that's out there, we can reverse engineer, and with the right set of technologies we can actually come up with a logical third normal form view of it. But the challenge is that data modeling is not considered a necessary IT skill, which means that both business and IT decision makers are unknowledgeable about these decisions. IT tends to assign responsibility for the data to the business; I know I would, if I were in charge of IT, I'd push it back to the business and say, it's your problem, right? On the other hand, the business looks around and says, you've got a job title over there called Chief Information Officer; what else would they be doing except looking after my information? The answer is that CIOs are talented individuals who have an enormous range of responsibilities, of which data is only one. We've seen, over the last couple of years, an increase in the number of chief data officers out there. Unfortunately, though, because of this confusion, we have an enormous amount of data debt that has resulted from all of these things.
When we take a look at today's modern IT environment, everybody is in some form of moving to the cloud. Now, let's set aside the fact that not a single corporation or organization on planet Earth has ever saved money by moving to the cloud. There are good reasons to move to the cloud; saving money is absolutely not one of them, and we see this happen over and over again. We see organizations where they've got some applications here, maybe two or more cloud platforms, oh my goodness, and still some stuff on-prem. The data models map across and between software packages, in and out of the cloud, and often the data model is the golden source of information about whatever integration you have. At this level it's really an information architecture component, and it is the only thing that ties these various on-prem and cloud-based apps together. So let's be a little more formal about this. A model is a representation of something that exists; it's a pattern for something to be made. If I give you a model and say, build me one of these, you can often do it. It can contain one or more diagrams, and it also has to have, of course, definitions, using standard symbols so that one can understand the content. You'll see in a second that this is not rocket science, but it is a standard, blueprint-type method. Data modeling, then, is the analysis and design method used to define and analyze data requirements. If it's not in the data model, it's not going to be in the database, and if it's not in the database, it's not in the system. The data model is an integrated collection of specifications and diagrams that represent both the requirements and the design at the same time, in the same way that the architecture is the thing that results from the process of architecting.
Again, as I mentioned, it employs standard symbology and the like. Data modeling, then, is the process of discovering and analyzing the various scopes we have; scoping the data model requirements is used to design the data structures that support those specifications. Thus, shareable data represents the things we want to agree on about sharing our information. Whether we're going to share it internally or externally, the data model is required, using mutually understood definitions. As I travel around the world, I see data models in many, many different languages, but I know what they all mean, because we use standard definitions. This also gives us the opportunity to start introducing the concept of standard enterprise data in the organization. These models are the skeleton of the business architecture; for those of you who are gardeners, these would be the bones of the garden. The data itself stays stable in organizations, and that's been the wonderful thing about seeing this happen over and over again in a 40-plus-year career. These same companies are still dealing with the same basic data elements that they were 40 years ago, and I am absolutely convinced that the data will stay the same. This should be required as a condition of deployment. I can't tell you how many projects I've gone to where they're spending money customizing software but they haven't yet got a data model, and the people doing the software conversion are just dying to have some guidance. We've got to use these models as basic components of the system pieces, and most importantly, they're often missing from these projects, so this is a very big hole in our practice. Data models, as I mentioned, are very purposeful, and one of the purposes is to support strategy.
So if you have a choice of building something more rigid or more flexible, the data modeler would tend to go for the more flexible and adaptable data structure, because that results in cleaner, less complex code and helps ensure that the strategy meets its stated objectives; we can build in future capabilities. A brief example: if you're doing a data model and somebody asks, is it possible that you're going to offer this in a language other than your native language, for whatever country you happen to be in? If the answer is yes, you want to build that into your data model at the core of the product, as opposed to building your product and then trying to add it afterwards. Just as with adding security afterwards, or adding quality afterwards, it just doesn't work. One final piece on strategy: if you're looking at merger and acquisition strategies, the data model can be one of the most helpful diagrams in diagnosing the potential efficacy of a merger or acquisition. A brief example: this is from a company that I worked for at one point in time, where I was a manager. You can see there's a manager and a manager-type entity up there; I could be a line manager or a staff manager. Again, not a very good data model, but the important point, which I've noted in blue, is that they made managers separate from salespeople. And then we went through one of these periodic recessions, or belt-tightenings.
One of those things, it seems, never ends; as you get older there's always the next one around the corner. And they said, all managers must sell, and we said, great, no problem, we want to keep our jobs and we'll be happy to sell, but we're not set up like salespeople, we're set up like managers. This was a very poorly thought out data model for this particular situation, and the decision haunted the organization, because people were always able to say, well, if you had built the system correctly, we could report our managerial sales. So, my first chunk, real quick: what is a data model required for? Again, we want to increase understanding; we want to precisely define the data, because the data at the most granular level has to be precisely defined, and if it is squishy at that level, big problem. It can help you with simplification, it can focus specific attention on points that you need to agree on in order to move forward, and it can help you understand, build, and deploy modern business strategy. On to the next section: why is it required for data understanding? Well, let's dive in. What is a data model? There's a wonderful one from Ellen Gottesdiener, who did a great job of illustrating just how varied models can be; literally anything in that diagram can be used as a model and is useful. But here's a more specific example. If I were going to try to explain to you the behavior of the model that you're about to see, it's not going to be the easiest thing to understand. I'll put a little music in the background, hopefully not too loud. Models can represent expertise; they can store and formalize information; they can filter out extra detail; they define an essential set of information; they help us understand complex behavior; they monitor and predict system conditions.
We can communicate more effectively between novices and experts, business and technical people, humans and software; the list goes on. We can streamline the documentation, monitor and predict the systems. The process of interacting with the model can itself help us understand what we're trying to do in the larger term: evaluate various scenarios indicated by the model, understand behaviors, illustrate the complex patterns and meta-patterns involved. And of course, you've seen by looking at this picture that those things go through a repetitive pattern. Here we go, we're getting ready to do it again; well, we won't worry about the model the second time through, but you can see models are extremely useful ways of conveying this information. If somebody asks you the question, why would you build a model? Well, would you build a house without at least looking at a model of it? Would you use the model to figure out how much it was going to cost? If somebody told me they were going to make a house out of snow and I live at the equator, that's probably not going to work. If I hired contractors from all over the world, would you like them to have a common language? Again, that is the model. Would you like to verify proposals? You can test the model before you spend thousands of hours or dollars building something physical. If it was a good model, would you like to do it again? Absolutely; it becomes the documentation for going forward. And finally, if you're going to do any diagnostics, the data model is the piece that makes it easier to support and maintain all of this. So: data models exist whether you like them or not. All systems have data models. The question is, if you don't understand the data model, you can't make use of it. The documentation is absolutely key in this context.
Another thing to think about from a data modeling perspective is the old story of The Princess and the Pea by Hans Christian Andersen. The peas are right down there at the bottom, and the princess is sleepless at the top. The reason is that flaws in data models lock in imperfections for the life of the application. It is unbelievable how many bad data models are implemented in production that people just know things about: oh yeah, when you do this, you have to do that. It's a bit like the Oracle site that lists all the errors in the Oracle manuals; you can go find the errata in one specific place. A poor job on the data model also restricts the additional investment benefits and leverage you can get from the data. Bad data modeling accounts for 20 to 40 percent of IT budgets, spent migrating, converting, and improving data, and if you know anything about the size of your IT budget, you know it's constantly under pressure to shrink instead of grow. So this is an area where you can actually make friends with people. Lack of good data capabilities causes everything else to take longer, cost more, deliver less, and present greater risk, and we always thank Tom DeMarco for that wonderful articulation. I mentioned data debt before: it's the time and effort it will take you to undo the bad practices that have accumulated. We call it getting back to zero. It doesn't actually buy you anything in the business, but it helps remove things that are uncomfortable; think of it as a dissatisfier. You're going to have to learn some new skills as an organization, and a from-scratch start means there's got to be an annual proof of value, and now you get to do both of these things simultaneously.
There's very little guidance around this, and even less on getting back to zero, but again, we need to pay attention: addressing data debt proactively helps us eliminate the slowness, improve the quality, decrease the cost, and decrease the risk associated with it. But data debt isn't as visible as everything else; you can't visualize the cost of data debt the way you can other costs. Now I will give you a very specific example of data debt. This was a group we were working with at one point in time, and they had a query. You can see the query is pretty complex; the details don't matter, but this group, for whatever reason, had never learned about query optimization. The same query can be refactored into something far smaller. Again, I'm not showing you the details, but I will tell you that this particular query ran a billion times a day at this organization. People like to say, well, that's death by a thousand cuts; technically it's not really, that's the wrong phrase. What we should say is that it's unnecessary discomfort from lots of small paper cuts all over the place. So: data modeling helps us discover, analyze, and scope requirements, and we're going to go through these in a little more detail. It helps us represent and communicate those requirements in a precise model, and it's an iterative process that may include conceptual, logical, and physical models. Let's take the first part: discovering, analyzing, and scoping data requirements for organizational persons, places, or things, the nouns of our organization. We need to identify the persons, places, or things whose information needs to be created, read, updated, or deleted, and potentially archived; we call it CRUD, or CRUDA when we add archiving. The ways in which we individually characterize these things are called attributes.
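The create/read/update/delete (plus archive) life cycle just described can be sketched as a tiny in-memory store. Everything here, the class name, the dictionary layout, is an editor's illustration under assumed names, not code from the webinar:

```python
class ThingStore:
    """Minimal CRUD(A) store keyed by thing ID; purely illustrative."""

    def __init__(self):
        self._active = {}
        self._archived = {}

    def create(self, thing_id, attributes):
        # C: a new instance with its characterizing attributes
        self._active[thing_id] = dict(attributes)

    def read(self, thing_id):
        # R: look up one instance (None if absent or archived)
        return self._active.get(thing_id)

    def update(self, thing_id, **changes):
        # U: change attribute values on an existing instance
        self._active[thing_id].update(changes)

    def delete(self, thing_id):
        # D: remove the instance entirely
        self._active.pop(thing_id, None)

    def archive(self, thing_id):
        # A: the optional archive step, moving it out of active use
        self._archived[thing_id] = self._active.pop(thing_id)

store = ThingStore()
store.create(1, {"description": "a thing", "status": "active"})
store.update(1, status="reserved")
store.archive(1)
```

The dictionary values in each record stand in for the attributes just mentioned: the individual characteristics that distinguish one instance of a thing from another.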
Attributes are the parts of the data model that give it detail and differentiate one instance of a thing from another instance of a thing. Let's get a little more specific. An organization might decide to characterize the parts of a thing as: thing ID, description, status, sex to be assigned, and thing reservation reason. While that's interesting, it also gives us important characteristics. For example, if we were asking whether we can have a female thing versus a male thing, then without the thing's sex-to-be-assigned attribute I wouldn't be able to tell the difference between female things and male things. These characteristics can be shared, and all of these things have a status, so we can look and say that many things can be assigned as female, given the specifics of the example. These characteristics also help us identify a unique instance, so that this thing is different from every other thing we have. The description, however, is likely to be a characteristic of things in general. Now, when we talk about representing and communicating these requirements, here's our attribute collection and an instance of it. A club ID tells us a lot about this data collection: the existence of that attribute tells us that clubs need to be separately identified, one versus the other. Why might that be important? That's going to be a business reason that you, as the experts, will be able to figure out. And if we add a club ID to the club entity, we know that some level of organization exists above the club level. Going a little further: attributes describe entities, and attribute values are characteristics of the instances of things, so a club might have something for a current promotion, a maximum period of obligation,
a number cancelled year to date, a number of members, and a number of total units sold for the club. Okay, interesting, depending on what the business problem is. Now let's look at this in the context of relationships: I've got club, club member, and cluster connector. Now again, maybe that's the wrong term, maybe it should be club connector, but it is what it is here. A relationship is a natural association between two or more of these entities; notice I've not used any notation, I've simply said that they exist. We do need to get a little more precise. In fact, there are four variants of modeling notation, and this is one of the things that is improperly explained to young people as they take our classes, because there's a lot of confusion. We have four styles: a Chen style, a Bachman style, and a Martin style, those would be Peter Chen, Charles Bachman, and James Martin, who put these together, but the one most people use is called information engineering, created by Clive Finkelstein. Almost the entire world has moved to the information engineering notation, but just pick one and stop arguing about it. I see too many arguments here, and nobody who's outside of data, like Seth Meyers, is going to give a hoot about any of these. So let's look at what we've got: exactly one; one or many; eventually one, which was an innovation Clive introduced; zero, one, or many (optional); and eventually zero, one, or many. Those are the five ways of describing how entities can be related to each other. These specify the ordinality and the cardinality, that is, the mandatory and optional relationships, using minimum and maximum occurrences. So here, for example, in this short little data model, is a rule that says a bed is placed in one and only one room. All right, seems reasonable. And a room can contain zero or more beds.
So again, you can look and see how this notation makes sense intuitively: a bed is occupied by zero or one patients, and a patient occupies one or more beds. Again, you may say, why are we having one patient occupying multiple beds? That's something that will have to be resolved in the context of what you're trying to figure out. So it may or may not be a problem, and by the way, this becomes a huge issue if we now discover that the beds are portable, that they can be moved on their own wheels. So let's see what happens as we make things a little more complex. Here's a thing one and a thing two. Here we're representing and communicating these two, and what it says is that each thing two must be accompanied by a thing one, whatever that happens to be. This gives us the ability to dive into a little more detail with the specifics of the data model we're looking at. And here's another component: when we look at this, we can say a bed is related to a room. That's great, and we want to start with that, but we might want a little more precision. How many beds can fit in the room? Unfortunately, the answer is going to be many. So we're going to move up to a similarly glossy term, and maybe many beds in many rooms is not really helpful. But if you think about it, you really can't physically implement a many-to-many relationship in the real world, because you very quickly run into trouble with it. So we eventually resolve these things into one-to-many relationships, and it says many beds can be contained in each room, and each room may contain many beds. Now that's a very good specification. On the other hand, what if the beds can be moved? All right, well, now we need to introduce a dimension of time: the bed was in the room from period one to period two, whatever that happens to be.
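Introducing the dimension of time, as Peter describes for portable beds, usually means moving the placement into an associative table with a validity period. A sketch under my own naming assumptions (the table name, column names, and date format are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE room (room_id INTEGER PRIMARY KEY);
CREATE TABLE bed  (bed_id  INTEGER PRIMARY KEY);
-- The many-to-many over time resolves into one row per placement period.
CREATE TABLE bed_placement (
    bed_id     INTEGER REFERENCES bed(bed_id),
    room_id    INTEGER REFERENCES room(room_id),
    from_date  TEXT NOT NULL,
    to_date    TEXT,                       -- NULL means "still in this room"
    PRIMARY KEY (bed_id, from_date)
);
""")
con.executemany("INSERT INTO room VALUES (?)", [(101,), (102,)])
con.execute("INSERT INTO bed VALUES (1)")
con.executemany("INSERT INTO bed_placement VALUES (?, ?, ?, ?)",
                [(1, 101, "2024-01-01", "2024-02-01"),   # the bed was in room 101...
                 (1, 102, "2024-02-01", None)])          # ...then wheeled to room 102
row = con.execute("""SELECT room_id FROM bed_placement
                     WHERE bed_id = 1 AND to_date IS NULL""").fetchone()
print(row)  # (102,) -- where the bed is now
```

The history rows preserve every past placement, which is exactly what the simple bed-to-room foreign key cannot express once beds start moving.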
There's a series of organizational needs that organizations all have, and those needs need to be instantiated and integrated into data models; these data models then authorize and articulate specific system requirements. So people who are spending money on development activities and ignoring the data piece of this do so only at their peril. This is also, of course, an iterative process, because we do need to go back and get some feedback, and the data models are developed in response to those specific requirements. Again, our trusted catalog: we need to make sure that it is in the process, and like I said, it iterates over and over again, so we're going to do a second version and a third version as we're looking to improve the effectiveness with which we do this data modeling stuff. In fact, if it doesn't make a lot of sense, just put it up on your wall and leave it, and we can come back to it. We won't cover it in this particular session, but there has been a standard around this for many years describing the three types of models that I mentioned earlier. Starting at the bottom, the physical model is the model of the actual database that is in place. However, that database that is running, let's just say Oracle again because I've said Oracle once already, may do things differently from how other types of platforms do it, whether it's in the cloud or on-prem. We need to understand this from a logical perspective, because the business people are not going to be able to relate to the physical part, but they will be able to relate to the logical. The logical model is the physical model stripped of all the technology-specific details and aggregated for use in the business world. The conceptual model is what you want to show to the people at the top level of the Zachman framework, if you will. That allows people to relate on a very managerial level to all of these models.
An arrangement of data is called a data structure, and you can see the computer science definition there on the right: an organization of information, usually in memory. I won't read it for you, but you get the sense. You'd never want to have that conversation with somebody who's not interested in having it, but what it does is describe the characteristics: these data structures represent a grammar for the objects that they constrain, things about the objects that may be unique or a requirement for uniqueness. We may have an order that is hierarchical or network. We need to have balance; we need to have optimality in order to do that. All of these components are expressed in a data model by taking the details and organizing them into larger components. And that's where things get intricate. It's important to make sure we've captured the intricacy of the business, because your business may have a significant advantage over another organization's business, but you don't know exactly what that is, and it may be the way that you handle your data. So that intricacy becomes a very important part. The larger components are then organized into data models, and this introduces dependencies. For example, I can't sell a product if it's not in the catalog in the first place, so clearly we would have a process that adds an item to the catalog before we can attempt to sell it. I know that sounds quite obvious to us, but we have seen many instances where organizations don't actually have a really good idea of what they're looking at. And finally, the data models are themselves organized into architectures. And these architectures have to be driven by a purposefulness. If we don't have the purposefulness, it's going to be extremely difficult for us to figure out why we are doing this. The purposefulness relates directly to strategy, so again, motivation, strategy, et cetera.
All of these things start to come into play. And once we have purposefulness, it's kind of interesting: data models end up becoming architectures. So again: the attributes, the entities, the models, and finally we get to the architecture. Now, there aren't a lot of architecture pictures out there on the internet for people to use, and this is better than most. And if you look at it carefully, you'll see it's not only upside down but also backwards, which I know is completely unhelpful to you, but the idea is you're never going to put this up and start to use it as a model. One of the first really comprehensive, value-added data modeling efforts that I did was when I had the title of US Department of Defense reverse engineering program manager for a number of years, when I worked for the Defense Department. We put together something called the DoD integrated process and data model, and believe it or not, even though I did that thirty-some years ago, they're still using that model today, because we got it right. I had a great team that we were working with; I was part of the team, and I shouldn't at all claim the credit for it. We had a really phenomenal team working within these areas. So when you look at a big architecture like that, the next thing that happens is people say, okay, so data modeling is a part of the DMBOK. Now, if this is the first time you're seeing the DMBOK, blame that entirely on me, because as data management professionals we're not the most articulate, certainly not the best with marketing, but the DMBOK is our body of knowledge in this area. So we're all at the Data Management Body of Knowledge, put out by DAMA International, of which I'm the president at the moment and have been for a little bit, and we're getting ready to start the work towards our third version, so it's a very, very popular product.
It's being used in university curricula around the world, and you can see here that data modeling is a critically important piece of it. Now, what I'm giving to you all in the way of guidance here is very different from what you get when you're taught this in the university context. In the university context it's much more narrowly defined, for the purpose of developing a brand new database, and we shouldn't be surprised, therefore, since we've only been teaching people how to build new databases for the past however many decades, that there are too many databases out there. So we really, in some ways, made our bed, and we have to own the amount of data debt that we've created by doing this in a, I will say, haphazard fashion, certainly not in the fashion that we should. So we've talked about models, because models help to simplify and create understanding. Models help us get precise agreement at a certain level; again, the data is the most granular piece that you can come up with for these. So we need to make sure those pieces are right; if not, it just becomes sand in the gears. Again, perfection is something that we'd like to have, particularly when we're building new products, but if we can't get perfection, we can at least get to flexibility and adaptability. The most important part, and this is the part I have the most trouble explaining to the students in school, is that we have to iterate on this development, and the reason for that is because we're unlikely to get it perfect the first time. We need to put it in place, we need to try it, poke it and prod it, and expect that iteration will be part of the process as we're documenting and understanding these data structures. So let's spend the last twenty minutes here looking at how to use data models effectively, and the first question everybody will ask you, of course, is: where are your blueprints? These are what the data models are for and how you should use them.
There are correct ways to organize data. And yet, if you think about it, most people are not taught them, and certainly most people aren't good at practicing them. So, again, given any shape of data, I can optimize it for flexibility, for retrievability, for risk reduction, lots of different ways of going about it, and I'm going to implement a couple of different pieces here. I'm going to go back on that, because I want to hit those components. The techniques include data integrity, smart codes, and architectural join tables, and that's as far as we're going to go today, because we only have a limited amount of time. But let's start off with: smart codes bad, dumb codes good. That may or may not make sense to you, but think about my area code, 804, in Richmond, Virginia, in the USA. We ran out of 804 numbers a long time ago. It used to be, though, back when the Bell System created the first phone system, that we used a zero in the center of those three numbers to signal long distance. All telephone equipment, everything that we built, all those pieces you're looking at in the little video there on the right, would look through numbers, and any time they saw a zero as the middle number of the three, they would know that it was long distance. Here's another example that was hilarious but nevertheless true. This is a set of courses that we offer at Virginia Commonwealth University: business 361, 362, 363. We were doing intelligent coding here: it was business, and the 300-level numbers were the ones for the information systems courses, and I had a dean tell me at one point, we can't do more courses, because we've run out of numbers. Now, of course, that's a really bad way to think about courses and numbers, but this is the way in which some people badly put these things together. Finally, I mentioned before the theoretical complexity that we were talking about.
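Stepping back to the smart-code point for a moment, the trap can be made concrete with a small sketch. The course numbers are the VCU example from the talk; the function and dictionary names are my own, purely illustrative. Encoding meaning into the identifier means the identifier breaks when the meaning changes; a dumb code plus an explicit attribute does not:

```python
# Smart code: meaning packed into the identifier itself.
# Assumed rule from the talk: 300-level numbers mean "information systems".
def is_info_systems_smart(course_number: int) -> bool:
    return 300 <= course_number <= 399   # breaks the day you run out of 3xx numbers

# Dumb code: the number merely identifies; a separate attribute carries the meaning.
courses = {
    361: {"subject": "information systems"},
    362: {"subject": "information systems"},
    499: {"subject": "information systems"},  # no renumbering needed to add a course
}

def is_info_systems_dumb(course_number: int) -> bool:
    return courses[course_number]["subject"] == "information systems"

# The smart code misclassifies the new course; the dumb code does not.
print(is_info_systems_smart(499))  # False -- the encoded rule is now wrong
print(is_info_systems_dumb(499))   # True
```

This is the same failure mode as the 804 area code: once the rule baked into the code stops holding, every system that parsed the code has to change.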
This is an organization that is going to have to change something: they have a primary key that's going to have to be expanded throughout all of their systems. This is going to require changes to upward of 100,000 systems. I bet you'd like to know which company that is so that you can go out and sell their stock short, because they're clearly going to have a lot of incurred liabilities around that particular piece. All right, let's move on to table handling. Again, a little example here: here's a table, a collection of things that have the same characteristics; people might look at counts, monthly averages. Again, a table name and table elements; this is the simplistic component of it. If we don't do this right, we can introduce confusion and risk. And here's a table that talks a little bit about an album that I got recently; by the way, I recommend it highly if you're a fan of Oscar Peterson or just good jazz in general, wonderful concerts that they unearthed from Helsinki many, many moons ago. But you can look and see what's here: a song and an album. What about the length? Do they keep the length? And the answer is no. It's a really bad idea to keep the length in there, because you can store a start and a stop for the song and come up with the difference between the two. It's a more flexible and less risky approach to the overall portion of table handling. I mentioned iTunes too; actually they've changed the app to Music now, but you know, you'll have it on your Windows platform or your Macintosh platform. So I've got four records, or maybe there's four million records, and I've got a purchaser ID so they can tell my purchases from somebody else's purchases, and then we've got individual songs and prices. Okay, well, that seems reasonable at first, but let's take a look at what happens if I delete the record that is marked record number one.
What happens when I delete record number one is that we lose two facts: we lose the fact that purchaser number one purchased Cool Walk, which is the name of the song, and we also lose the fact that Cool Walk costs 99 cents, because this is the only place in our system where we stored pricing data. This is usually undesirable and unintended; those are the words we're going to use to describe all of these. That, by the way, is called a deletion anomaly. This next one is an insertion anomaly: I now want to add a fifth record, or a five millionth record. I want to add the new song Cakewalk, and I want to record that it costs $1.29. Well, that's a fact, but in order to use this table properly I also need a second fact, that somebody has purchased this particular piece, so I need the purchaser ID. Storing two facts in one row this way is not good: we can't insert a full row until we have an additional fact about that row. This is always unintended and usually undesirable. Finally, we have update anomalies as well. Maybe I put Cool Walk in there incorrectly at $1.99, and it should have been $1.29 instead. If I go into this type of table, I've got to examine every instance of the song to see whether I find all the errors, so that I can change every instance of $1.99 to $1.29. This is, of course, undesirable and unintended as well, and we will also miss any misspellings that happen to be there, which is one way that you insert a lot of data errors into your data. So how should it be done? Again, store as much as possible one fact per row. There's our original table in the upper part of the diagram, while the bottom part of the diagram shows the two distinct facts: a pricing table and a purchasing table.
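A minimal sketch of that split, using SQLite. The songs, prices, and purchaser IDs follow the talk's example; the table and column names are my own. With one fact per row, deleting a purchase no longer deletes the price, and a new price can be recorded before anyone has bought the song:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pricing  (song TEXT PRIMARY KEY, price_cents INTEGER NOT NULL);
CREATE TABLE purchase (purchaser_id INTEGER, song TEXT REFERENCES pricing(song));
""")
# Insertion anomaly gone: Cakewalk's price can exist with no purchaser at all.
con.executemany("INSERT INTO pricing VALUES (?, ?)",
                [("Cool Walk", 99), ("Cakewalk", 129)])
con.executemany("INSERT INTO purchase VALUES (?, ?)",
                [(1, "Cool Walk"), (3, "Cool Walk")])

# Deletion anomaly gone: removing purchaser 1's row leaves the pricing fact intact.
con.execute("DELETE FROM purchase WHERE purchaser_id = 1")
price = con.execute(
    "SELECT price_cents FROM pricing WHERE song = 'Cool Walk'").fetchone()
print(price)  # (99,) -- the price survives the deletion

# And a join answers "who bought Cool Walk?" from the two tables.
buyers = con.execute("""SELECT purchaser_id FROM purchase
                        JOIN pricing USING (song)
                        WHERE song = 'Cool Walk'""").fetchall()
print(buyers)  # [(3,)] after the delete above
```

An update anomaly disappears the same way: correcting $1.99 to $1.29 is now a single-row update in the pricing table rather than a hunt through every purchase row.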
Now I can join these two tables to provide information to say who has bought, in this case, Cool Walk, and I can look and see that purchaser one and purchaser three did. By the way, this works with 10 records, 10,000 records, 10 billion records, all the way around. Another component on this: for the words, use not definitions but instead something called a purpose statement. If I asked you all what the definition of a bed was, you would go, hey, don't ask me dumb questions, right? There's a bed, right? Well, let's go back to the little hospital example I was showing you a few minutes ago. If I say a bed is something you can sleep in, that's kind of interesting. Notice here my purpose statements actually describe the bed, and in this case, a room contains zero or many beds; by the way, even though that's not the notation that I showed you before, it still follows. You can see the associations and the red line pointing to them. Now, suppose I discover at some point that we're going to put an ID on the bed, a Bluetooth descriptor, or an RFID if it's an older system, and I start to move the beds around. Everybody immediately goes, oh great, we'll be able to stop losing patients in hospitals. Yes, that happens an awful lot of the time, and it's sort of sad, a different story entirely. But what room is the hallway? What room is the elevator? This was an actual situation that we came up against, where the contractor was proposing to do this, and you can see already from those two simple questions that it's a really bad idea. One last thing on these data models: you should also have a status associated with them. You should always list all of your data models as being in draft mode until they have been validated. By the way, if I'm doing this properly and trying to validate the model,
I also may want to look and ask, where's the bed transponder, in order to take a look at where this all goes to work. Here's a great pioneer in this area, Fred Brooks. I knew him very briefly; he's the author of something called The Mythical Man-Month. While that is an interesting concept and not something we have to cover here, something that shows the non-fungibility of many things, he also had a couple of great sayings, and one of them is that data representation is the essence of programming. Show me your flowchart and conceal your tables and I'll be mystified, was his comment, but show me your tables and I won't need your flowchart; it'll be obvious what your business is. And that is absolutely true. The process that we use to define and understand data requirements is the same process we use to document them, so we might as well do both at the same time. The needs are constrained by specifically supporting business processing and the information systems that we need to have. Many times the intent of this is to focus on a specific question: I may need to do a data model just because a specific question comes up, and I'll give you an example of that in just a bit. But there are also several variations that permit different efficiencies in the organization. Once you've created some of these, they can actually become documentation for your organization. So here's an insurance company that has said, these are the things that we will use when we're talking about accounts, subscribers, charges, and bills, and they make copies of this and give it to everybody in the entire world, and they all use it and they all agree and they all speak the same language and everything is easier. So here's an example. This is a rental car situation, and I just want to point out one thing, which is that nothing in this data model ties the automobile to the rental customer. There's nothing that prevents a car from going out to multiple places.
So if you've ever run into a situation where the rental car company says, well, I know I had your car, but I don't have it, it could be they were using this data model. Another quick example: again, price is not included in the catalog here. Now that's sort of weird. What does that mean? Well, it may be good if we want to have the salespeople set the price, or maybe really bad if the salespeople are setting the price and they're not supposed to do that. Whatever the system allows is what people will find out. Here's another point on this one. We look at the invoice here and see that it's not possible to determine what part of an order the invoice pertains to; the invoice is simply recorded at a very coarse level. If we need more granular detail than that, it cannot come from the system, so we have to involve people in order to figure out what's actually going on. Here's a quick example from a data model, one from the DoD that we used many years ago, but it actually came up. Here's a one-to-one relationship between an admission and a discharge. What that means, of course, is that every admission must have a discharge and every discharge must have an admission. Okay, that's very nice. What does that mean? Well, it turned out in this case, excuse me, the key here was that they didn't want to call death a disposition code at this particular hospital, but it turned out that if that's the way the system is built, death must be a valid disposition code. Ouch. Kind of a little harsh there; we won't talk too much more about death. Instead, let's talk about how things are mainly done, and that is that we've taught our students to do forward engineering, building new stuff. On the left-hand side is a requirement, typically a set of specifications; then we make a model to show how we'd like the implementation to be built, and then we actually build the system.
In reverse engineering, we look at it the other way: now we're coming from the existing system and trying to figure out what the design was and what the requirements were. The reason you want to do that, of course, is because everything has some good stuff and some bad stuff in it. So let me show you the whole model here. When we look at our environment, there is our reverse engineering, if I'm going to go back and understand how the system worked. Again, every system is going to have some strengths and some weaknesses. We need to understand what they are so that we can build on the strengths and hopefully correct the weaknesses in the next version of whatever it is we're doing. If we're not changing the strengths, then I can follow the purple path and just go straight from one design aspect to another before we implement. But most organizations don't do it this way. Again, this is sort of not how to do it, but how it's typically done: we take the forklift from this place right here, and we move that data down into the new system. Hey, it's just a series of spreadsheets, what could be hard about this? Again, a very bad way of doing it, because what prevents us from bringing across the bad things that we didn't want to bring, and what helps us recreate the useful things that the current system does? What we really need to do, of course, is the exercise that everybody goes through in school, which is to say: I start out with my piece, I go to a logical as-is, then I move to a logical to-be design, and then I put it back into the new database. The problem is they forget to tell them there's just one other thing that happens there: the reason I'm doing this in the first place is because I'm adding or changing some requirements. And now, with those changed requirements, I can move it into there, and that's where the oopsies come about in a really problematic fashion.
Here's the entire model of data model possibilities: the as-is model versus the to-be model, what I have versus what I'd like to have. All right, now I'm going to change the color scheme on you and say, remember, everything can be valid or invalid. Well, if it hasn't been validated, it is questionable at this point, and that's a real question as to whether you should use it. Finally, we have our logical, physical, and conceptual models to take a look at, and everything that we do in data modeling falls into one of these categories; every single piece can be mapped to some sort of transformation in here. Here's the overall transformation. I am not going to walk you through it, but you can see it, and I've put it together there on the next page for you. The key is that every modeling cycle has a specific purpose. We're doing this for a specific reason, so let's focus on that model purpose. Are we in this room to understand the relationship between the soda and the customer? Well, okay, not terribly interesting, but nevertheless we've modeled it. Understand the difference between our hospital beds and the primary ways of tracking patients? Or, could we change our systems to handle this business rule tomorrow? Is it permissible to do job sharing? Well, without the data model, it's almost impossible to have any sort of confirmation. One of the more important aspects of all this is to not tell people a whole lot about modeling. It's just not interesting for them. So don't tell them you're modeling; just write some stuff down, arrange it on a piece of paper, and then make some appropriate connections between your objects, and they won't mind at all. They'll just go, oh, you're taking some notes, cool, right? As for how we actually build these: we identify the entities that are there. We identify the key for each entity so we can identify an instance of each of the individual pieces, and we draw a rough model of how those relationships work.
We then put down all of the attributes that we have and allocate the attributes to the various entities we're looking at, and then, as I said before, we iterate. Model evolution is good at first. If we kept iterating to the very end, that would become really problematic, but at first we should be making some additional improvements, and when your improvements start to top off, we've kind of got everything; you'll see how it works. The way I like to describe it here is the relative use of time during the various tasks. So, one of the activities that we do in data modeling is collect evidence and do analysis. Over time, the collection should go down and the analysis should go up. Similarly, the project coordination requirements should also decline over time. The amount of target system analysis, what we're trying to do to get onto the new system, should increase. Our modeling cycle focus should alternate between validation and refinement: once we have got the model correct, we need to go back and refine it in a way that will make perfectly good sense overall. One of the nice things about what we've been doing here: just as YouTube shows you lots of ways of fixing things and all sorts of stuff, and I remember my parents were so thrilled when they figured out they could go onto YouTube and ask it a question and it would answer for them, we have a lot of basic data models that have already been put out there, and really good books that show data model patterns and other types of things. I have had people call me up and say, do you have a data model that I can look at for a healthcare pharmacy? And the answer was yes, it's on page whatever of one of the books that we can take a look at. I'm close to the top of the hour here, and I'm just going to do a couple of takeaways as we get ready for our Q&A. Goal-oriented data modeling is really quick.
If you have no disagreements going on while you're doing your data modeling, they're not paying attention. Sometimes I put small errors into the data model just to make sure that people are paying attention; if they're not vested in it and it all looks fine, you're not making the progress that you want. Data exchange is automated, and therefore we are highly dependent on good quality architecture and integration, which means a solid foundation of data modeling on top of which these techniques are built. Similarly, the model characteristics are going to evolve, and I showed you an example just a second ago, but there are also different analytical methods; you may combine this with some exercises around process modeling, for example. Keep in mind here, everybody says definitions. That's not sufficient; if you incorporate a purpose statement that speaks to the motivation, that is so helpful in your data modeling. So data modeling is a problem-defining as well as a problem-solving activity. And the use of modeling is much more important than the selection of a specific method, so it's really not critical that you have a good argument or an extended study to decide which of these things to do, but use models. People will understand why you use them, and they will start to access them and, more importantly, use them in the future as well. These models are living documents, which means they can't exist on paper alone; they've got to be in some form of a CASE tool, and we don't even teach students what a CASE tool is right now, much less how all that works together. The key to all of this, of course, is that utility is absolutely paramount. I can add color, diagrams, objects. I had a DoD exec throw something at me one day because I was explaining in a very academic way how the data model was correct.
And the official said, I want to see the darn battleships. Yes, I got that. Okay, we will absolutely move out with that. Thank you for the hour here, and time to turn it back over to Shannon. We've got some upcoming events here; again, next month will be data stewards, and of course a couple of other events that we're getting ready to have, but let's go back to Shannon and see what sort of questions you guys have for us. Peter, thank you so much, as always, for another amazing webinar; always appreciate it. If you have questions for Peter, feel free to submit them in the Q&A portion. And just a reminder: to answer the most commonly asked questions, I will send a follow-up email by end of day Thursday to all registrants with links to the slides, links to the recording, and anything else requested throughout. So, diving in: in today's so-called agile process, where management uses it wrongly and just wants products delivered ASAP, and where data modelers don't get enough time to think big picture or flexible and future-proof, what is the best approach in that situation? In other words, is there an agile approach for data modeling? There are a lot of people who have labeled things that they do agile. I'm going to respond to this, and I've actually written on this a bit. Agile is a movement that was based around four initial premises, where the, I forget what they used to call them, three amigos got together and said, what do we really need to do to create higher quality software faster? And nothing besides agile has produced the ability to create higher quality software faster. Most organizations are not creating their own software now; they are instead moving to packages. And so there are a lot of organizations and vendors that have moved to this sort of agile delivery mode to say that we can do things in a much more flexible and adaptable fashion. Great question, that. If you're going to do data models, though, you cannot take shortcuts.
And this is something hopefully you got out of the presentation: just as the princess and the pea is something that we understand, if I build out that situation where I've got a very nice system, but the data model unfortunately is the pea under the mattresses, it's going to be a very sad situation, and of course if I've built all of those layers of application programs on top of it, there's nothing agile that I can do to correct that. That is data debt that we have in there, and we're going to have to find some way to ameliorate that particular debt. So be wary of people who are trying to sell you snake oil and say, I've got an agile data modeling piece. There are some very good techniques that you can use to help the sessions. There are some really clever techniques that you can use to reverse engineer from existing systems. Many people call just those pieces agile; I don't know that I would, because there's really nothing that relates them to agility, but they are faster and less risky, so it makes a lot of sense. So, there is no real agile data modeling piece. If you do the data badly, you will have that pea at the bottom of all of those mattresses, and the princess will be unhappy about the process. We need to make sure that people understand that there are certain things you can skimp on and certain things you can't. If you're creating a data model that is going to be the basis for software development or other types of activities, the model has to be correct, because we're now going to potentially execute it a billion times a day, which means if I'm doing this particular model a billion times a day, I've got a billion peas out there. I'm not sure I wanted to go there, but that's where it ended up anyway. Great question; I do appreciate it, and hopefully that's helpful. Very helpful. I'll just add to that: you guys know that Shannon's in charge, right? Okay.
So Peter, how should data domains be derived in an organization?

Data domains are a good way of dividing your data into smaller pieces. I'm not going to say this is a good architecture or even a good data model here, but you can see at least that there is some orange and some green and some light blue and some dark purple and other things. These may be ways you can divide up candidate domains. Now, let's be very clear about what we mean by a domain. In a data modeling context, a domain is the set of allowable values you can have, and I'll give you a very simple example that unfortunately has become more complex recently. In the old days, back when I was growing up, you had two values for gender: one was male and one was female. I'm not going to comment on the correctness or incorrectness of that; it was probably wrong then, and it's probably not right now. But if I build a system that has those two allowable values in it, then I'm going to go build other things from that; for example, in most buildings you then have a female bathroom and a male bathroom. Hopefully we've gotten smarter as a society in the long run, and I believe Facebook now counts 63 different attributes of gender that they track. So that's maybe a very different data model than allowable domains of M and F; in fact, M and F are probably going to get you in trouble if you limit your domains to that specific piece. Another way we use the term domain, besides the allowable values in a data model, is the areas where these data pieces are primarily stewarded from. Now, most people like to use the word "owned." I don't allow people to own data when I'm working with them. I know that sounds like a crazy idea, but if you allow somebody to own the data, it means somebody else does not own it, and therefore they feel like they don't have access. It's not about hurt feelings.
It's about making sure that your projects can move on, and I have stories I can tell you where, at one particular organization, it was "if you don't like my data, you can't have access to it anymore," and they closed a curtain across their data and wouldn't let anybody have it; people had to wait three years until that individual retired. These things sound crazy, but they do happen. It is absolutely, completely nuts. So with these domains, we might talk about the orange representing purchasing, the green representing supply chain, and the pink over on the right-hand side representing HR. That's another way to do domains, and what you're looking at goes back to, I think, the chart I showed you earlier, just so you guys can get there pretty quickly, because it's a fairly easy process: what are the nouns you're talking about pertaining to persons, places, or things? That's too high a level of abstraction, but we can absolutely get down and say what types of persons, what types of places, what types of things are we going to deal with. Excuse me. And I should also note this relates to master data management, because some of our data we want to treat extra specially carefully. So the domains are going to be subsets of these persons, places, or things that hang together and have some sort of collective stewardship responsibilities. But again, if a company says "we let people own their data," I just won't even work for them. It's just that silly of a proposition. I have seen so many problems that can be solved by simply not letting anybody within the organization own the data. And again, within this context, what are the domains they're going to have? Well, certainly if somebody wants to own something, they can own the data requirements; we'll give them that. But they can't own the data themselves, because the data always belongs to the organization as a whole. Again, great question. Thank you for asking that.
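The allowable-values sense of "domain" described above is easy to see in code. A minimal sketch, assuming a SQLite-style design (table and column names are illustrative): hard-coding two values bakes the domain into the schema, while a reference table turns extending the domain into a data change rather than a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rigid design: the domain M/F is baked into the column itself.
# Adding a value later means altering the schema and redeploying.
conn.execute("""
    CREATE TABLE person_rigid (
        name TEXT,
        gender TEXT CHECK (gender IN ('M', 'F'))
    )
""")

# Flexible design: the allowable values live in a reference table.
conn.execute("CREATE TABLE gender_ref (code TEXT PRIMARY KEY, label TEXT)")
conn.execute("""
    CREATE TABLE person_flexible (
        name TEXT,
        gender_code TEXT REFERENCES gender_ref(code)
    )
""")

# Extending the domain is now just inserting rows.
conn.executemany("INSERT INTO gender_ref VALUES (?, ?)",
                 [("M", "Male"), ("F", "Female"), ("NB", "Non-binary")])
conn.execute("INSERT INTO gender_ref VALUES (?, ?)", ("U", "Prefer not to say"))

count = conn.execute("SELECT COUNT(*) FROM gender_ref").fetchone()[0]
print(count)  # 4
```

The reference-table version is the one that survives the "M and F will get you in trouble" scenario without a rebuild.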
Peter, is it good to have summary aggregate tables along with fine-grained flat tables in a data warehouse, or just the flat-grain tables, knowing that today we can create business logic in a semantic layer? For example, flat tables in an Oracle data warehouse, and a semantic layer in Power BI or Tableau creating business logic, which may have aggregations.

Right. So, and by the way, you need a data model for your data warehouse, of course, if that's not obvious. The data warehouse is designed to help people get to the data they're trying to get to faster. We've taken it from the disparate systems I showed you before, where they're all kind of connected to everybody, and everything's tripping over everything else, and it's many-to-many, and instead what you're saying is: I'm going to put it in a data warehouse. So the question that comes up is like the earlier question on the timing of the songs: we can derive this data very easily. And so, in that sense, there's no need to devote horsepower cycles, engine cycles, database cycles, warehouse cycles to calculating things that already exist, unless people look at them a lot. So, one of the examples we had of this, a similarly structured table, is a table that showed what the salespeople's bonuses were going to be at the end of the month. And you can imagine, if you're a salesperson and you understand a little bit about data, you want to know what your bonus is going to be at the end of the month, or your commission check, or whatever it is we're looking at. And so you go out there and you learn a little bit of SQL, and you start diving into the data warehouse and running SQL against it so you can see what your commission is.
I've seen examples where that will hang up production, because the salespeople are really interested in what they're going to get paid, as they should be, by golly, and messing with somebody's pay is not a good idea at all. So then what you say is: okay, great, let's just make it so we have a daily commission figure for each individual, and we'll put that in a little table. In the same way, the aggregate tables the questioner was describing can also be maintained, in spite of the fact that they're potentially redundant data. So my mindset here is: how can we deliver information to the most people, from the shared resource, most effectively? And if some types of queries are going to get in the way of other types of queries, then it does make sense to break out additional tables. I'm not giving you the entire picture; that's a much longer diatribe, and unfortunately the answer is "it depends." But I don't think a data warehouse is hurt by having redundant tables that make the overall warehouse respond more quickly and correctly to users' requests. Thank you.

So, do you recommend, my question just moved, do you recommend one model covering all reporting requirements of an enterprise, or do you recommend subject-area-based data models?

Integrated subject-based ones are definitely the way to go. I showed you this sort of silly model of everything being up there, and can you imagine working with something like that? I have worked with them; I told you guys I built the DoD data model as part of a team effort, and we used it in exactly this fashion. But when we're really working with the DoD data model, I'll just use this domains idea again a little bit.
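Going back to the commission example for a second, the fix described there, a small redundant summary table refreshed on a schedule so curious readers never touch the detail tables, might be sketched like this (table names and the 5% rate are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (rep TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("ann", 100.0), ("ann", 250.0), ("bob", 400.0)])

# Redundant but cheap-to-read summary table, refreshed on a schedule
# (say nightly) instead of recomputed by every ad-hoc SQL query.
conn.execute("CREATE TABLE commission_daily (rep TEXT PRIMARY KEY, commission REAL)")

def refresh_commission(rate: float = 0.05) -> None:
    """Recompute the summary once; readers then hit this small table only."""
    conn.execute("DELETE FROM commission_daily")
    conn.execute("""
        INSERT INTO commission_daily
        SELECT rep, SUM(amount) * ? FROM sale GROUP BY rep
    """, (rate,))

refresh_commission()
rows = dict(conn.execute("SELECT rep, commission FROM commission_daily"))
print(rows)  # ann: 17.5, bob: 20.0
```

The data is redundant, as the answer concedes, but the expensive query runs once per refresh instead of once per salesperson.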
So, one of the groups I know that uses the data model on a regular basis is USTRANSCOM, the US Transportation Command, and we're going to pretend on this model that the pink or the green stuff is for US Transportation Command. They would only look at the portion of the model that relates to them, and if they find anything that needs to be changed, they know they have to go up a level, to the DoD level, and create, in this case, a change initiative around those other activities, whatever they happen to be. So I'm very much not in favor of trying to subdivide things too much, but at the same time, have a general set of categories where people know what they're responsible for and know where to go look for it. "Oh yeah, I remember, you were in charge of that stuff over there. Have you ever heard of this thing?" Yes, I can answer that question; or no, I can't in that particular context. So, lots and lots of things can be done with them, but I would, again, look back at your specific business operational requirements, and more importantly: are your business operational requirements the same ones now that they were when you created the system? "Well, yeah, when we created the system we had 100 customers; now we've got 100,000." Okay. Call me a little crazy, but I think you should probably relook at some of those things, right? Again, great question. Thanks very much.

So, Peter, should data models continue to use the cardinality construct to define relationships, or switch to the multiplicity construct?

Well, I have to admit I am not familiar with the multiplicity construct. Is that something from somewhere else, maybe? So we can put that note in and illuminate all of us. If I didn't have my screen shared right now, I'd look it up real quick and pretend I know it, but I don't know the multiplicity concept. Sorry. Good.
You know, if the questioner wants to add that into the chat, I will come back to it. So, moving on: can data sets be reverse engineered into a unique solution data model? Kind of an awkward question. It is possible to reverse engineer any existing data set. That data set can be in the form of a database, or in the form of things we used to call databases, like ISAM and VSAM files, or even tables or Excel sheets. Everybody understands Excel: we store multiple facts in each row in Excel, which is one of the reasons it's not a good basis for data model designs. And so we end up with a situation where you've got too many things going on at once; it becomes very, very difficult to look at, very difficult for somebody to comprehend, and really difficult to make use of in the way these things are intended. That said, when you look at how the rest of the world is evolving, people are starting to have expectations about data and data sets. And when you start to mess with those expectations, people get disappointed very quickly.
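The Excel point above, multiple facts packed into each row, is exactly what reverse engineering has to untangle. A minimal sketch of pulling the separate nouns back out of flat rows (column names are illustrative):

```python
# One spreadsheet-style row mixing facts about an order, a customer, and a product.
flat_rows = [
    {"order_id": 1, "customer": "Acme", "customer_city": "Richmond",
     "product": "Widget", "product_price": 9.99, "qty": 3},
    {"order_id": 2, "customer": "Acme", "customer_city": "Richmond",
     "product": "Gadget", "product_price": 19.99, "qty": 1},
]

# Reverse engineering: pull each repeated noun out into its own structure,
# so one fact lives in one place instead of being repeated per row.
customers = {r["customer"]: {"city": r["customer_city"]} for r in flat_rows}
products = {r["product"]: {"price": r["product_price"]} for r in flat_rows}
orders = [{"order_id": r["order_id"], "customer": r["customer"],
           "product": r["product"], "qty": r["qty"]} for r in flat_rows]

print(len(customers), len(products), len(orders))  # 1 2 2
```

Two flat rows become one customer, two products, and two orders; the duplicate "Richmond" fact now exists exactly once.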
So, you'll see an awful lot of generic reporting-type constructs. Again, just like the previous question about reporting, whether all the reporting goes in one big data model or whether we should have a series of little ones, the key is they can't be little and standalone; they've got to be integrated. You can't just take a piece and say, "hey, here's this piece of it, and here's this piece," trying to solve a problem by only looking at the individual pieces and not, in fact, at the overall totality, because there could be something over in orange that creates a problem for something in light blue later on, and if we don't have that knowledge, we certainly won't be able to anticipate, much less try to head off, whatever bad thing is going to happen.

Yeah, and I got a long way off track on that one. You did. It was good. And I don't see anything clarifying the previous question, so moving on: is there a contradiction between the data model being static, the bones, versus the data structures being adaptable?

That's a really excellent question. All right, so let me go back to the princess and the pea image while I talk about this. The key is that flexibility is an architectural characteristic. One of the more fun things I do at the university is tell students, "did you know that buildings can be flexible?" and they go, "what does that mean?" In my case, it's a very large structure of multiple parts, and we'll go out into the hallway and I'll show them a joint where two parts of the building are attached together.
So the question was: should we build the data model in a way that anticipates all the exact needs, or can we instead do something that allows the organization overall to gain from potential future pieces? I used the example of languages earlier: if you're going to build a language into your system, it's a good idea to build the ability to change languages into your website or whatever system or app you're building, in order to come up with a more flexible structure. Let me give you a different example, one that comes up fairly often these days, because we keep seeing more and more levels of specificity about people's abilities. One of the things we have at the university is something called an accommodation, where a differently abled individual simply needs more time on a test. Okay, perfectly reasonable. Somebody else makes the judgment; they hand us, as professors, a note that says: everybody in the class gets 10 minutes on this particular test, but this person gets two times that, so that'd be 20 minutes. Okay, interesting, right? Let me make the model concrete with an instance. In this case, Canvas is the learning management system we use at VCU, a fairly fine learning management system. If Canvas had made the time for a quiz an attribute of the quiz, then I would only be able to set one time for the entire quiz. Of course, you can see already that we've got differently abled people taking classes, so it makes absolute sense to make sure they also have flexibility on the time. Now, did Canvas know in advance that they were going to have differently abled people taking their quizzes?
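A minimal sketch of the placement difference being described here, deriving the time limit from the person rather than fixing it on the quiz (class and field names are illustrative, not Canvas's actual model):

```python
from dataclasses import dataclass

@dataclass
class Quiz:
    title: str
    base_minutes: int  # the time everyone gets by default

@dataclass
class Student:
    name: str
    # The accommodation lives with the person, not with any one quiz,
    # so it automatically applies to every quiz the student ever takes.
    time_multiplier: float = 1.0

def allowed_minutes(quiz: Quiz, student: Student) -> float:
    return quiz.base_minutes * student.time_multiplier

quiz = Quiz("Midterm", base_minutes=10)
print(allowed_minutes(quiz, Student("Pat")))                       # 10.0
print(allowed_minutes(quiz, Student("Sam", time_multiplier=2.0)))  # 20.0
```

If `base_minutes` were the only time attribute, supporting accommodations would mean reworking the model and every program built on it; placing the multiplier on the person buys the flexibility in advance.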
Maybe, maybe not; I'm not that familiar with Canvas and don't have the ability to go back a couple of years to look at how it evolved. But because Canvas correctly places the time attribute in relation to a person instead of relating it to a quiz, we now have more flexibility in that system: if the organization wants to implement accommodations, it can; if Canvas hadn't done that, it couldn't, simple as that. So if you think something is a potential requirement for your system, and it doesn't completely disrupt your data model's intent and purpose, and that is a key thing, data models are done for a specific intent, and we want to make sure that intent is known and understood by everybody doing the design work, then we can build in flexibility for things that may come down the road. I would much rather have a system designed with lots of flexibility around the whole system than have it be perfect and rigid, if that makes sense. I have one more point on this question, Shannon, because it's a really, really good question. I worked on Deutsche Bank's main back-office trading system for about five years. Their back-office trading system is called DB Trader, Deutsche Bank Trader, so a basic trading application, but the architecture of the system was absolutely phenomenal from a design perspective. They had the system that everybody else on Wall Street wanted to have; literally, the code was secured the way the US Constitution is secured at the National Archives, where they make sure other people can't get to it, or Nicolas Cage can't go in and look at the back of it, or whatever. There were three architectural characteristics of the DB Trader system that made it unique. The first one: it was a real-time system. All the other companies on Wall Street at the time were using batch systems.
The fact that it was real time meant they could tell exactly what was going on at any point during the day, which turns out to be a major advantage when you're trying to handle some of the seasonal things that happen in the investment banking world. The second characteristic was that it was a multi-currency system: they could put in pesos on one side and get out euros on the other side as they went through the transactions, and even now they can convert easily between Chinese and Japanese currencies, euros, and dollars, back and forth, which makes it hugely easy for the various trading partners to actually use as a platform. The third architectural characteristic was that it was all table-driven. If that doesn't mean anything to you: it means they configured their financial products at Deutsche Bank by creating entries in tables. They didn't have to rewrite the system for new products; they could create them more or less on the fly. And that is one of the reasons they had the best system on the marketplace. So again, the question was: is it better to build this stuff in in the first place? And the answer is yes, it's better to build flexibility in. Otherwise, you end up with the princess on top of all those mattresses 20 years later, unhappy about something you can't do anything about, because the pea is buried under tons and tons of application development that is not going to be easily undone and redone. Again, great question. Thanks for asking.

And we've got time for a few more questions here that I'm going to slip in. Peter, can you talk about the relationship between the data model and data strategy?

So, for data strategy, well, let's go to architecture in the first place, which is the idea that these systems should have a purpose.
If we don't have a purpose for the system, if we don't build that intentionality into the system, then any answer looks correct, and that's probably never going to actually be correct. So again: intentionality, very, very key. When you look at organizations trying to do this, typically what they'll do is take a software package and say, "well, this software package is how we're going to implement our strategy." A very simple example that happens a lot is that banks want to get to know their customers better. Last time I looked, back in the '90s, banks were already trying to know their customers better; that was the whole purpose of "come inside and we'll give you a toaster." We're trying to find information about people. The key is whether the data model is set up for it. Let's just take strategy at the highest level: strategy at the highest level is faster, better, cheaper, right, and there's a component of risk that goes into it as well, but all strategic goals are going to sum up to those four things. If I've optimized the system from a data modeling perspective to give me the most granular data possible, so I can always do the most flexible and adaptable analysis, and yet I have a query that runs a billion times a day that's never been optimized, I'm going to be unable to achieve all of my goals. The goals of the organization may change over time. We may start out by focusing on quality and then, as Ticketmaster has done, switch to speed. Right? Ticketmaster is getting ready to put Beyoncé tickets on sale next week, and you'd better believe there's going to be a hullabaloo over how they handle it, because I know they haven't made any changes to their system since the last oopsie they had. If Ticketmaster had the ability to do so, I can guarantee you they would be completely reverse engineering and reengineering their system from bottom to top.
That way they wouldn't get Taylor Swift fans running up to Congress and saying Ticketmaster sucks. There's a great example of strategy not working out well for an organization. And I can also tell you that since the goal of that organization was to save money, not for us but for Ticketmaster, they have not upgraded their system since they first put it in, and that really does speak to the point: there is room where competition should come in and allow others to build additional components around the whole thing, but because everybody's optimizing for speed, or in this case minimizing costs, there's no way their system reflects the strategy of that particular organization. So absolutely, your data model should be very intentional in supporting the strategy. I could do it this way or I could do it that way; I need it to run fast; okay, then I'm going to do it that way, is basically the answer you have to get from all of these pieces. Again, great question. I think there are more slides I've got to add for the next time through.

Indeed. So we've got just five minutes left, so I'm going to throw one more question out here: in a data ecosystem, what role does a data model have to play from a data governance perspective?

The language of data governance should be metadata. Now, I know that's a little bit of an esoteric statement. But if the people who are making decisions about data aren't using the terminology the systems people are using, they're never going to be able to fully understand the problem dimensions they're working on. I guarantee you, in every organization I go into, and I've been in well over a thousand of them, there's somebody who really understands the data; that's the data person there. Ideally that person probably should be working for the chief data officer at some point, but we won't get to that too soon.
So when you look at how organizations are trying to figure out how to make use of these data assets and employ them strategically, it's not going to be a one-and-done thing. You're going to try some things over here, and then you're going to do some other things, and we need to have that communication in place in order to talk about them; so the language of data governance is metadata. If a data governance group is sitting around and they've never looked at a data model, how do they know what they're making decisions about? And the answer is, typically, they don't. Although I will say I've seen a lot of groups working in a regulated environment that can do it pretty much without the data models; but everybody else trying to better use their data in support of strategy, which is the whole purpose of data governance, is going to have to understand what the data is, and if I don't have that model showing me what the data is, I will have no ability to tell the governance group what they should be doing and when they should be doing it. And, you know, again, let's go to the whole Zachman framework: who, what, when, where, why, and how, right? As far as all of those go.

Oh, you've left us with a few minutes left, Peter. I think I may be able to slip in one more question, if you think we can. Did we ever get an answer on that multiplicity thing, or whatever it was? I did not, no, but we'll Google it after we're done. So, let's see, one more here: in a data warehouse, the data is collected from multiple source systems. In this scenario, is each source system classified as a data domain?

Not necessarily. And we have to talk about the difference between the domain as allowable values in the database and the data domains we're talking about at the high level. Again, the Navy has ships, the Air Force has planes that shoot down balloons, you know, blah, blah, blah.
I wonder if any of those balloons were Valentines; that would really be sort of an oopsie thing, wouldn't it, Shannon? The key there is that we want to have people with expertise; we use the term subject matter expert, SME. Those are the people who will know the data better than the IT people, because they're the business people; they know how to apply the data in support of their, perhaps locally optimized, perhaps organizationally optimized, process. So keep the focus on that. This gets to another related question, I believe, and that is whether our data steward or governance roles should be full time or part time. I would always opt for full-time individuals working on a smaller set of problems over trying to get the entire organization covered and assigning all of our data stewards at once. Things change too rapidly to have a master plan for governance, which means that governance has to be partly reactive to what's going on and partly proactive, and the only way to reconcile those two pieces is the data model. Again, go back to the diagram I showed on reverse engineering: those are the types of discussions, and we need to make sure that what they're describing is not just the marketing data but very specific attributes of the PII that we've got, from consenting individuals who are sharing their data with us and allowing us to better serve them because they've given us better knowledge. I'm making that last part up, but it's certainly what a lot of people are striving for, for sure.

It sounds good. Well, Peter, thank you so much for another amazing webinar, and thanks to all of our attendees for being so engaged in everything we do; we really appreciate it. Again, just a reminder: I will send a follow-up email by end of day Thursday for this webinar with links to the slides and links to the recording. Peter, thank you so much.
Shannon, thank you, and thank you to everybody for participating. We'll see you next month, when we're going to talk about stewards, right? All right, cheers, everybody, happy Valentine's Day. Happy Valentine's Day, y'all.