Hello and welcome. My name is Shannon Kempe, and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Modeling Fundamentals. It is the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken.

Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel in the bottom right-hand corner of your screen, or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. If you'd like to chat with us or with each other, we certainly encourage you to do so; just note that the Zoom chat defaults to sending only to the panelists, but you may absolutely switch that to network with everyone. The icons to open the Q&A and chat panels are in the bottom middle of your screen. And to answer the most commonly asked question: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. Yes, we are recording, and we will likewise send a link to the recording of the session, as well as any additional information requested throughout the webinar.

Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader; many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience, has received many awards for his outstanding contributions to the profession, and has written dozens of articles and 12 books. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. The most important and largest organizations in the world have sought out his expertise, and he has spent multi-year immersions with groups as diverse as the US Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started.

Hello and welcome, and welcome to you, Shannon. Thank you as always for a great introduction, for getting us started, and for hosting all of this, and welcome to everybody. Today is Data Modeling Fundamentals, and I'm going to use this icon in the upper right-hand corner to represent data models in general. It's not a good one, and let's not get into the details of it; just take it as an icon, because lots of those icons together make up something larger called a data architecture. That's something organizations are constantly attempting to get to, but they have a lot of trouble making use of it in a productive way.

We're going to start out by talking about Seth Meyers for a quick second. Seth of course has nothing to do with this, but I love his show, and he did this bit the other night that I thought would be really relevant, because, outside of the people on this webinar, not many people care about data. The setup is that I like to pretend I ran into Seth Meyers and gave him my business card, and he turned it into this joke: "Business cards? Really? We're still doing this? I hope your business is waste management, because this is going right in the garbage. Giving someone a business card in 2021 is basically steampunk. Great.
I'll give you a call when I need my cotton gin repaired. Thanks for the business card; it's a great way to be sure I'll remember you in six months when I'm cleaning out my wallet. Dinner receipt. Dinner receipt. Ah, this one: 'Call me if you ever need data solutions.' And what are those? You have to call me to find out. The only thing business cards are good for is to put in that fishbowl at the diner to see if you can win a free Reuben. Hey, business cards: get bent."

Aside from the fact that the bit is about business cards, the subject of it being data solutions, and of course nobody having any idea what those are, is really the message here. There are just so many opportunities to do better, and what we're going to do today is talk through a whole series of the data modeling fundamentals that come from observing this. We'll talk in three parts: first, what is data modeling good for; then, why model data; and then, how to use data models effectively. Then we'll get to the last part, which is the Q&A, and I look forward to your questions and interaction as always.

Let's jump in. The first thing we're going to do is precisely define data, and I'm going to do that by tossing out a not-so-random fact, although perhaps to you the number 42 is a random number. I'm going to assign three different meanings to that number. The first is the most popular: it is the meaning of life, the universe, and everything, according to Douglas Adams's book The Hitchhiker's Guide to the Galaxy. In 40 years of doing this, I've never once failed to walk into a room anywhere in the world and have at least one person who has read that book, and when I ask "what is the number 42?" they say "the meaning of life, the universe, and everything." Here's another really important fact: 42 is the number on Jackie Robinson's jersey, retired for that fabulous career. And 42 could be my age, 21 years ago, if I were trying to determine whether I was old enough to purchase alcoholic beverages in the Commonwealth of Virginia.

So there are three random facts assigned to the number 42, and each is, as I said, just a random fact until you pair it with a specific meaning. That combination of a fact and a meaning is a datum. I gave you three datums, and the collection of them together is called data. (It's very difficult to get into the plurals and the rest of it here.) But we're not really interested in all data; we're interested in what data can be useful to us. Given useful data, we can then define the difference between data and information by saying that information is useful data that has been requested; the nature of those requests turns that data into information. That's an objective test, it's a really wonderful way to do this, and it makes obvious the statement that you can have data without information, but you cannot have information without data.
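As a toy illustration (mine, not from the slides), the distinction can be sketched in a few lines of Python: a datum is a fact paired with a meaning, and a request is what turns data into information.

```python
# A datum is a fact paired with a specific meaning; together they are data.
datums = [
    (42, "the meaning of life, the universe, and everything"),
    (42, "the number on Jackie Robinson's jersey"),
    (42, "my age, 21 years ago"),
]

def request(data, wanted):
    """Useful data that has been requested becomes information."""
    return [(fact, meaning) for fact, meaning in data if wanted(meaning)]

# The same fact yields different information depending on the request.
print(request(datums, lambda meaning: "jersey" in meaning))
```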
Finally, the way most three-part charts work, everybody wants to use the word intelligence at the top; over the years we've used the words wisdom and knowledge somewhat synonymously up there, but the differentiation, as you can see, is strategic use. So those are the three levels: data is a combination of a fact with a specific meaning; information is data that has been requested; and the subset of that requested data that is strategically used is what becomes our intelligence. This is a very stable architecture that has held up a long time; these definitions have been around since 1883. That is simply to say it's old, and people have been at this for a long time.

Utilizing these, let's see how this model has to be implemented in systems, and how the history of it came about. Many of these systems started out as siloed applications, which meant the payroll data was connected to the payroll group, the marketing data was connected to the marketing group, and so on, and you can see these piles of data still exist even in today's environment. There are some components that have helped get us past this, and we'll talk about those in a little bit, but we generally used to have a lot of these silos, and when we try to connect them, of course, it becomes very difficult. I'm reminded of the game of Twister, where you spin the dial and whoever doesn't fall down wins; not falling down is not a very high standard, and this is a very complex environment for most organizations to maintain.

Over time, we try to integrate these things into fewer piles and define some specific enterprise data so that we can pull all of it together, and eventually, if we can re-architect the process, we come up with a nice hub-and-spoke model, whether it's around an ERP or an ODS or whatever, depending on your application. That is the goal, but point-to-point is more the reality organizations are facing. And there's a number we can put on the upward theoretical complexity of this, which is n times (n minus 1) divided by 2; don't worry, that's the only equation we're going to do today. Plug in an n and you come up with the number for the worst-case scenario, should you literally have to connect everything to everything else, which is obviously the most expensive solution. Just to give you a reality number: I was at the World Bank of Canada some decades ago, and they had 200 major applications (they told me I could use their numbers), which resulted for them in about 4,900 batch interfaces connecting all of those systems together. That is a fair amount of complexity, but look at what the formula says: take the number of applications and add a zero to it, say 600 of them, and the count increases rapidly; the number of potential interconnections is nearly 180,000. So the World Bank of Canada actually compares very favorably to the worst case, and I'm giving them clapping hands for that.
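That worst-case count is easy to check for yourself. Here is a minimal sketch recomputing the figures from the talk:

```python
def worst_case_interfaces(n: int) -> int:
    """Worst-case point-to-point connections among n systems: n(n-1)/2."""
    return n * (n - 1) // 2

print(worst_case_interfaces(200))  # 19,900 potential vs. ~4,900 actual interfaces
print(worst_case_interfaces(600))  # 179,700 -- "180,000 practically"
```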
The way out of the worst-case scenario, of course, is to have data go through hubs of one form or another. Hub is not a technical term, it's a logical one: a connection, a place where things are at. And each of these hubs must have an accurate, up-to-date, correct data model to represent the data that's going through it. You use these data models to capture and maintain formal system data requirements. A data model is an organized, purposeful structure, regarded as a whole and consisting of interrelated, dependent data elements; I'll show you some examples as we go. It represents the lowest level of decomposition available when we speak of systems. Systems are always talked about as people, process, hardware, software, and data, but of course the data part is increasing at an increasing rate, and the other parts are not.

Data models are, by necessity, the most stable system component over time. Because of their role at the center of the hub, there is all kinds of pressure on them not to change, because everything that goes through the hub must change when the hub changes. These hubs also tend to incorporate useful organizational business rules, such as the answer to the question "can a project be owned by more than one department?" Catch me offline for the rest of that story. The bottom line, of course, is that if it isn't correct at the data model representation, all other interpretations must therefore be suspect. So the goal for our organizations is to minimize the interconnections, getting to as few hubs as possible while explicitly balancing various points of risk; then again, many organizations have gotten in trouble for having too few hubs, so the question is where the right balance lies. But regardless of how many you have, a data model at the heart of every single hub is the only reliable means of communicating the enormous amount of information that hubs are required to carry to run organizations. It is also an objective use of standards within and across organizations, perhaps even extending to our business partners, and it can always be inferred: even if the documentation has disappeared, it can be retrieved by doing what is called reverse engineering.

And yet data modeling is not considered a necessary IT skill. Business and IT decision makers are generally not knowledgeable about these topics; business and IT are generally pointing fingers at each other, each saying "you're responsible for the data," and this confusion has reigned for decades, which has meant an enormous amount of data debt has accumulated. Today's IT environment is likely to look a little different: we're going to have applications more and more in the cloud, which is probably appropriate. (If you're really fortunate, raise your hand if you have multiple clouds; that's always lots of fun.) But we never really get rid of all the legacy stuff, and even in an environment where, to anybody's knowledge, there are no databases at all, the data mapping across and between these software packages, and in and out of the cloud, is even more crucial. It's a shared point of understanding, where you're documenting and blueprinting the set of commonalities and interconnections that go back and forth between all of these things, making sure we agree: between humans, whether business and IT or business and the data people, but also between humans and systems in general. So there are multiple goals of understanding here.

Let's take a deeper dive. Data modeling, then, is the process of discovering data requirements and, once you have that first step done, representing and communicating those data requirements in a precise form called a data model. We also use data modeling as a process to design the data structures that support those specifications. A model, of course, is a representation of something that exists, or a pattern for something that can be made. It can be one or more diagrams, standard symbols or charts, blueprints.
These are all models that we use in daily life, and a data model, then, is an analysis and design method that we use to define data requirements. We hope to develop not just the picture but also the data dictionary definitions that go with it, integrating the collection of specifications and pulling together standardized text so that it can be read across various disciplines, generations, et cetera. One of the more important reasons for having a data model is that it is one of the best ways to represent not just your data but also your business model, because data models represent stable, shared data. That is, if you're going to share these things across business partners or within the organization, they are required, and they make sure you have mutually agreed-upon definitions, which become standard data candidates that the organization can decide whether to support across the whole organization or just within limited parts.

Data models are the skeleton of the business architecture. I've seen so many organizations do this, and we'll look at another example later on where we can see how these things fall into place. If you're a gardener, you would say they form the bones of the garden, and that's a term most gardeners would understand. The data portion of businesses has been, and will continue to be, the most stable part, because of that dependency I spoke about a few minutes ago, and I've observed this over my 40-year career. One of the things I did early on, in the early 1990s, was create the US Department of Defense integrated process and data model for the entire department. Literally 30 years later, that model is still in use in relatively unchanged form; we did a good job with the model, and it has been useful as a result.

The models are also a prerequisite to deployment: you can't launch a system without them. Many organizations are not able to understand this. They have an idea of what their architecture and their data models look like, but they're not able to do as much with them as they potentially could, and that represents opportunity costs they're missing. That said, a data model is considered a basic part of any system documentation, and nevertheless they are often missing.

So let's talk about how data modeling supports strategy. First of all, you can create a flexible, as opposed to an inflexible, data structure. That means that over time, as the business evolves, your data model may not need to change as often as the rest. You can build flexibility in as a conscious design choice, which can result in cleaner, less complex code accessing all of these things. You can manage what you can measure, and you can build in future capabilities, much like Tesla unlocking features that were built in ahead of time. This is particularly useful in a merger-and-acquisition context; I've seen it happen several times. Here's just one example. This particular organization had this data model, but you can see in the blue portion that I've highlighted, down there below employee, that an employee had to be either a salesperson or a manager. That meant that when they went through one of their periodic retrenchments and told managers to sell, there was no support for managers to sell, because they had built it out of the system. A more flexible design would allow multiple designations, as sketched below.
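Here is a minimal sketch of that flexibility point, with invented names rather than the organization's actual schema: instead of hard-coding the salesperson-or-manager choice into the structure, an employee carries zero or many designations, so a manager who is told to sell simply gains one.

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    employee_id: int
    name: str
    # Zero-or-many designations instead of a hard-coded either/or choice.
    designations: set[str] = field(default_factory=set)

alex = Employee(employee_id=1, name="Alex")
alex.designations.add("MANAGER")
alex.designations.add("SALESPERSON")  # impossible in the rigid design
```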
Alright, so we've talked a bit about what data modeling is for. I precisely defined data, and I said that data models represent places where the organization agrees to exchange data, and that we need to simplify the number of point-to-point connections to the degree that makes sense from a risk and effectiveness perspective, because these focal points of agreement have a specific goal of making sure everybody gets on the same page, singing from the same sheet of music. All of these concepts are not just useful in building new systems and talking about organizational strategy; they're necessary, particularly for the physical implementation of the data components of whatever it is you're going to field.

So let's dive a little further and talk about why model data, by way of why model anything. Data requires the most specific definitions; it is the lowest level of granularity that you have in systems, and it needs to be perfect. Suboptimal data practices lead to an accumulation of data debt. Data modeling describes details that everybody has agreed must be perfect, so it's not just that you agree the details are there; you agree on making them perfect. We have to look at data modeling as an iterative process from start to finish, and that's a little tougher to think about, because most people want to get it done now and fix it later. And finally, we model data to document the organization of the data in the organization. That's the direction we're headed.

First of all, what is a model? There's a wonderful picture that I'm building up here, from Ellen Gottesdiener, who showed it many, many years ago, and it lists many things that a model can be used for; a very useful set of concepts. For this next piece, if you're on a slow internet connection (I seem to be getting a pretty good one today), I'm going to go ahead and start the video, but if it's not coming across well on your screen, you can click on the YouTube link and see the same thing. I did not put the Stairway to Heaven music on this, but since they had gone to the trouble of leaving it there, we'll let it go. What you're observing, of course, is a model. Let's watch it and at the same time pay attention to the bottom half of the screen, which lists what models can do. Models can represent expertise; they can store and formalize information; I could use the model showing in front of you to represent the timing of whatever it is I'm trying to do. A model filters out extraneous detail: do I need lots and lots of things, for example? If you watched the beginning of the clip, you saw that the individual who started this put his foot on the bar; that was certainly part of the scene, but it was not part of the model. A model gives you the essential set of information and allows you to easily understand complex behavior; imagine if I were trying to describe to you in words what these billiard balls are doing right at this moment. It allows you to predict system responses to challenges, et cetera; it gives you the ability to communicate, whether with a novice or with your business person; it streamlines the documentation; it models and predicts responses; you gain information from the process of just interacting with the model; you evaluate various scenarios; you understand behaviors.
You can also see patterns and meta-patterns. What I mean by meta-patterns is this: as you start to observe the world through these data models, you start to see the same patterns occurring in other parts of the world, and realize all of a sudden that the scheduling system for a dental office bears a striking resemblance to the queuing structure required to get cargo ships unloaded at one particular port. I made that up as an example, but those are the kinds of meta-patterns you can get to. Okay, we've made the point with this particular example; it's a wonderful way of describing this. A fun old friend wrote me on LinkedIn and said she could just stare at that for hours; sorry, Margie, but shout out to you, I'm glad you liked it.

So let's keep on with this "why model anything" idea. Why would you build a house without an architect? The model, of course, is the sketch. Would you like an estimate? Yes: it gives you an idea of how long it's going to take and what it's going to cost. If you hired people from all over the world, would you like them to speak a common language? You will see this at construction sites today, where communication through the blueprints is all that is required for different groups to work together. You can verify what goes on: the models can be reviewed before you build the thing. If it was really good, would you like to do it again? Yes, absolutely. And would you make a change to it without understanding where the various important components are? These models document all of this, making life easier for everybody.

Data models, of course, exist whether you like them or not; all systems have data models. The question is whether your data model is understood. It cannot be understood if it is not documented, and if it is not documented, it cannot be useful to you. Doing a poor job with data, unfortunately, means we lock in imperfections, generally for the life of the application; you'll hear "that's just the way it is" as a common response to things that happen around that. Failure to model both proposed and existing systems restricts the opportunity you can gain from long-term data because of bad design decisions; it reduces the value of your data collections, decreasing the leverage in them; and it truly accounts for between 20 and 40 percent of all IT budgets, spent migrating, converting, or improving bad data. Bad data in general just causes everything to take longer, cost more, deliver less, and present greater risk. Thank you to Tom DeMarco for that one.

And I mentioned data debt. This is the idea that, where people haven't been doing this well, the data models have caused data debt, and the time and effort it's going to take you to get back to zero is a very challenging piece. Getting back to zero means you get to undo a bunch of existing stuff, which might or might not be in the existing skill set of your organization; and then, once you're at zero, you are still typically starting from scratch, and this requires an annual proof of value. So now you get to become good at both of those things. The nice thing is that data challenges are focused on some sort of exchange: I want this, you want that; some sort of exchange is involved.
And that does give you a value component that you can incorporate into your prioritization. But there's very little guidance on optimizing data management practices; we're mostly in the territory of trying to convince people to get started with these things. The optimization part is a wonderful challenge, and it can be done, but we don't have much guidance in that area at this point, and very little guidance on getting back to zero. So it's a challenging area that, just like everything else, slows progress, decreases quality, and increases costs in a step-by-step, unfortunate fashion.

Now, you may think, well, how bad could it be? Here is a query from a random customer. There's nothing wrong with the query, it's a good query, but this particular organization had never heard of the opportunity to do what's called query optimization, which is finding out whether there are things that are repetitive. Was this put together perhaps sloppily in the first place? The answer seems to be yes in this case, because here is the restructured query. And while it didn't save but a quarter of a second every time it ran, it turned out that it ran a lot. When you add these things up, they do become large; they do become countable. Most people use the phrase "death by a thousand cuts," but the problem is nobody is dying in this case, so we call it unnecessary bleeding from lots of cuts; and until these cuts become more tangible, where we can put some dollar signs on them and allow people to really understand what's going on, people will not be able to relate to what it is we're describing.

I'll give you a very quick example, from Forbes last year. American Airlines' market value was 6 billion dollars, and their loyalty program was valued at between 19 and 31 billion. Same thing for United: their market value was 9 billion dollars, and their MileagePlus component was valued at 22 billion. The yellow and blue statements cannot both be simultaneously true; it's just impossible. They have accumulated that much data debt, is my theory, as to why they can't unlock their own value.

This little section is going to go into three sub-pieces. We're going to look at the process of discovering, analyzing, and scoping data requirements: figuring out what the data things are, what they're doing, and how they interact, as we try to describe them in the data model. Then we'll look at a quick section on representing and communicating the precise form, the critical aspects of it. And then we'll talk about the differences between iterative approaches and the various modeling types.

So, this first piece. The idea is to look for organizational persons, places, or things (nouns) whose information needs to be created, read, updated, or archived; we capture this in what's called a CRUD matrix. The attributes that go with these things are of critical importance. An organization might say, "here's a thing," and give some description of it. In this case, we're assuming that an attribute like "sex to be assigned" refers to male and female things, that all things may have a status, and that some characteristics may be unique; an ID, for example, permits identification of each thing as distinct from the others.
And the description is likely to be unique for each thing, because it's not commonly shared across thing types in this case.

Now we get to the next part: looking at attributes precisely is really key. So what is an attribute? It is a characteristic of an instance of a collection of business things about which we create, read, update, and delete information. For example, if I give you the attribute "club ID," that tells us a little bit just for starters: clubs need to be identified separately from each other, and each club in theory would have its own unique ID. The pound sign there is a mark we use to indicate a primary key, and the fact that club ID is the key tells us the club-specific information is likely maintained here, and that the rest of the information on that entity is likely to be about clubs. And finally, some concept probably exists above the club level: if we've got clubs, there are probably organizations they roll up to at some level. There's probably more we could infer, and that's just from the one attribute we've looked at. Attributes describe an entity (here's a literal one, perhaps not the best one), and attribute values are the characteristics of the instance of the business thing: for club ID number one, the current promotion, whatever that happens to be, is in there, and another club may hold different information. Hold on to this.

Then, once we've defined the attributes of each entity, we want to relate the entities to each other. The one shown here is poorly designed, by the way. I'm not going to teach you how to do this well; I'm going to give you the fundamentals, though doing it well is much more straightforward than you might think. These are natural associations that occur between these entities, linking club and club member, and I've got something over here on the bottom right called a "cluster connector" that seems to be some way of connecting them. This is not a good way to design the data, and if people were using this system, they would likely be expressing frustration about the entire process.

Let's look at a family of notation variants real quick. We have our good friend and colleague Peter Chen and the Chen notation; Charlie Bachman invented a style; James Martin invented a style; and Clive Finkelstein invented a style. Peter is actually the last person standing at this point; Clive passed away this last fall, unfortunately. There has been lots of argument and disagreement about these notations, but it's not hard to learn one from the other; moving between them is like a different accent rather than a different language. For goodness' sake, just pick one.

Now let's talk specifically about the cardinality variants. You can have exactly one; one or many; eventually one, which puts a little dimension of time in there; zero or one; zero or many; and finally, eventually one or many. Those are the possibilities you have in terms of how entities relate to each other. Here's an example using the little icon I showed earlier: a bed is placed in one and only one room; reading the other end, a room contains zero or more beds; a bed is occupied by zero or one patient; and a patient occupies one or more beds. There are lots of flaws here. How does a patient occupy more than one bed? It turns out the answer is a time dimension, which is not captured anywhere in this data model at this level of detail. What if the bed is moved? That becomes an interesting question too. And just out of curiosity, let's ask: what is a room? This, first of all, is a really good reason why every data model is incomplete unless you accompany it with a proper data dictionary that defines what room is as an entity, and then what the attributes of room, bed, and patient are; everything in the data model. A sketch of these cardinalities, with the missing time dimension added, follows.
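This sketch uses invented names and assumes one particular way of adding the time dimension; the point is only that "a patient occupies more than one bed" becomes representable once each stay is its own time-bounded fact.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Room:
    room_id: str

@dataclass
class Bed:
    bed_id: str
    room: Room  # a bed is placed in one and only one room

@dataclass
class Occupancy:
    """One stay of one patient in one bed: a time-bounded fact, so a
    patient may occupy many beds over time without contradiction."""
    patient_id: str
    bed: Bed
    start: datetime
    end: Optional[datetime]  # None while the patient is still in the bed
```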
Here's another, fairly straightforward example of how things can be related to each other: each thing-two must be accompanied by a thing-one, and each thing-one may be accompanied by a thing-two. That's a possibility, not a definite, but it is a specification, and these do occur. There's some specification in "bed is related to room"; that's nice and interesting, but we can be more precise about it: many beds are related to many rooms. That's a little more precise, and we can go further still and say many beds may be contained in each room, and each room may contain many beds. So we're looking at a subset of the overall process in different forms of specification, depending on exactly what goes on; and again, note the nagging little question: what if beds can be moved?

So we're diving in a little bit; let's go one step further. The process is iterative, and I've watched this occur over literally generations: organizational requirements become instantiated, they are integrated into data models, and those models authorize and articulate specific information system requirements. This is the way the process must work. If you're not involved, somebody else is doing it for you, or somebody's making assumptions, because again, you cannot build a system without a data model; the question is whether anybody is paying attention in the process, and whether it is understood within the larger needs of the organization, existing as a program rather than at a project level. Of course, we need feedback around this as well, so that we in fact satisfy the need we were attempting to satisfy. Some data models change only a couple of times over a decade; as I mentioned before, it's a very complex process, because so many things depend on the data model, hence it is critical to get them correct in the first place. Data models are developed in response to a specific requirement, and when I say requirement, I mean a process where you're trying to answer a specific question.

So let's do the last piece of this little mini-section: what do we have in the way of modeling types? Conceptual, logical, and physical. A really wonderful way of describing this: conceptual means everybody has a picture, a concept of what, but no real concept of how; logical eventually gets you into the concept of how; and physical takes you all the way to the implementation. We'll repeat this framing in a little bit: conceptual is sort of the what, logical is a very nice how, and physical is the how you actually do it. There are more precise definitions on the screen in front of you, but the idea is that depending on what type of question you're asking, you should adopt different modeling postures, because you're addressing different questions.
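One way to see the three postures side by side is to take a single concept down all three levels. The sketch below is illustrative only (the names and the choice of SQLite are mine, not the webinar's): the conceptual and logical levels live in the comments, and the physical level is what one particular product actually executes.

```python
# Conceptual (what): we track Patients and the Beds they occupy.
# Logical (technology-independent how): Patient(patient_id PK, name);
#   Bed(bed_id PK, room_id); Occupancy(patient_id, bed_id, start, end).
# Physical (product-specific how), here rendered for SQLite:
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patient (patient_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bed     (bed_id TEXT PRIMARY KEY, room_id TEXT NOT NULL);
CREATE TABLE occupancy (
    patient_id TEXT NOT NULL REFERENCES patient(patient_id),
    bed_id     TEXT NOT NULL REFERENCES bed(bed_id),
    start_time TEXT NOT NULL,
    end_time   TEXT
);
""")
```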
Again, this whole process is focused on understanding data structures, and I've given you, in yellow up there, a definition of organizational information structures; very computer science, but actually quite useful, and we'll talk about it. My bottom line is that the fewer of these things you have, the better off you are, and if you can create incentives to reuse them rather than invent new ones, it will simplify your life in the long run. These data structures are the grammar, if you will: they constrain the data object, whether it must be unique, whether it needs to be ordered in one form or another, whether it needs to be balanced, whether it needs to be optimized in one form or another. Each of these is a characteristic one could describe, and here's "customer" in a definitive information engineering representation of what we were talking about on the previous screen.

As you can see, the components are details, and they are organized into areas that give us a sense of intricacy; the larger components are then organized into data models, which gives us a sense of dependencies, because now you are starting to put things together and say this cannot exist without that, and whether these attribute values are mandatory or optional, et cetera. We then assemble these data models into architectures, and architectures really get at purposefulness. So in data modeling terms: attributes are organized into entities, which are intricate in nature (and there are lots of examples we could talk about); entities are organized into models (and again, badly structured data constrains the ability of the organization to deliver; lots of examples around that as well); and finally, the models are organized into architectures, giving us the largest component we have.

If you're seeing this for the first time, this is the DAMA International Data Management Body of Knowledge, and I'd be remiss not to say how proud I am of the teams that have put it together, again over generations. Data modeling, as you can see highlighted in yellow, is very clearly called out in it: the analysis of data systems, the design of the data stores, the implementation of those designs, and the additional data effort, usually around preparing for and coordinating integration.

So, we've talked here about why model anything: the idea is that we've got to get down to a set that can be managed, because we can't actually manipulate reality itself. Hopefully by now you have the idea that good data requires precise agreements; that if you don't do this well, you have likely already accumulated data debt in your organization; that your data models must represent agreements that have to be perfect, in the sense that you're going to be looking at these things over and over again, and iterative development is the best way to think about them, which is why it's so important that they are maintained as correct documentation; and finally, that they are basic system documentation, so if nothing else you can go to any textbook, open it up, and point to how this should be documented.

Okay, our last section: how to use data models effectively. The first question people are going to ask when they walk into your organization is: where are your data blueprints? Where are your data models? Where are the things we need to have?
You need to be able to read them correctly, carefully, and precisely, and communicate this information back and forth. Not everybody is going to ask for them; some may be able to read them, some may not, and some may have others they can pass them to. But these are really very important collections of metadata that you must have, because everybody asks for the same thing. It's a universal language; for some reason it's just not universally created, which is unfortunate.

There are correct ways to organize the data, and the idea is that we can determine whether flexibility is more important than adaptability, or whether retrievability is the more important. Lots of different optimizations can occur, and the techniques allow us to include different components; we're just going to go through them quickly.

The first idea: as much as you can, any time you can, remember and repeat the phrase "smart codes bad, dumb codes good." Let's all do it together: smart codes bad, dumb codes good. Why is that the case? Well, my collection of telephone numbers in Richmond, Virginia, this fall, November of 2021, lost the ability to dial within the 804 area code with just the local number, and why is that? We simply ran out of numbers. We never thought we would create a situation where we would run out of the ability to signal long distance by dialing an extra digit in front of the other numbers. Again: smart codes bad, dumb codes good. All the telephone equipment in the United States had to be switched over; it was a complete hardware transformation, very much like the Y2K thing, not nearly the same size, since the Bell System absorbed it all, but it was still a problem. Another example: I had a dean, a very nice gentleman, but we had these business computing courses on our university's list already, and he kept saying to us, "I cannot add another course for you, because you've used up all of your numbers." That's a really bad way to run education, that you can only have ten courses because that's the numbering scheme you used. Or take a large organization involved in logistics: like the Y2K problem again, they're going to have to expand their customer number from x digits to x-plus digits, and they've already discovered that more than 100,000 systems in their existing environment are going to have to be changed. That is an enormous amount of complexity. So, again: smart codes bad, dumb codes good. A sketch of the difference follows.
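As a sketch of the difference (the course numbers are invented): a smart key embeds meaning that the world eventually outgrows, while a dumb surrogate key carries no meaning, never has to change, and leaves the meaning to ordinary attributes.

```python
import itertools

# A "smart" code: the key itself encodes department and a single digit,
# so the scheme caps the catalog at ten courses per department.
smart_course_number = "BUS-9"

# A "dumb" surrogate key: opaque, stable, inexhaustible.
next_id = itertools.count(1)
course = {
    "course_id": next(next_id),   # no meaning baked into the key
    "department": "Business",     # meaning lives in attributes instead
    "title": "Business Computing",
}
```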
When you look at tables, you're generally looking at relational kinds of management. I've got one in the upper right-hand corner there that would be typical of what somebody might build, perhaps unknowingly; again, if we understand what database characteristics are being expressed, we're less likely to introduce risk into the organization. In this first example, the table just consists of song and album (I'm picking a really wonderful album that came out recently). You might say, doesn't the length count? No, it's not used, because of the way iTunes works; and yes, I know Apple calls it Music now, they switched recently, but I persist in calling it iTunes because I'm old and stuck in my ways. Anyway, they don't use length; they use start time and stop time. Why? It's more flexible and less risky than storing the actual length, because the length might require conversion, whereas start and stop times at a constant pace will not.

So somebody might create this database around the music app, perhaps unwittingly. The first question to ask is: could this be bad? The answer is yes. What information would be lost if I deleted record number one? If this is the only database there is, I have what's called a deletion anomaly: if I delete record number one, I lose the fact that purchaser number one had purchased Cool Walk (Live), but we would also lose the fact that Cool Walk (Live) costs 99 cents, and that second effect is undesirable and unintended. Badly designed data structures are also prone to insertion anomalies. Say I'm trying to insert the number-five record down there at the bottom, shown faded because it doesn't work: I want to record the fact that a new song, Cake Walk (Live), costs 1.29. That's fact one. The problem is, I can't insert the full row until I also have fact two, the purchaser ID; and if I just put in purchaser number 9999, that's going to cause bad results, again undesirable and unintended. The update anomaly is mostly a problem if I want to change the price because I put it in wrong the first time, as 1.99 when it should be 1.29: I have to go through and examine every instance of the song, and I'll catch the first one, but I won't catch the second one down there, because it's spelled differently. That's a really bad way to make those kinds of changes.

So how should it be done? Go back to the original, up there in the right-hand corner, and store, as much as possible, one fact per row. The fact being stored in row two is a good example: purchaser number two has purchased a song, and that song has a specific price.
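Here is a hedged reconstruction of that example in code (IDs and layout invented): the wide table invites all three anomalies, and splitting it into one fact per row removes them.

```python
# One wide table: two facts (who bought what, and what it costs) per row.
wide = [
    ("purchaser-1", "Cool Walk (Live)", 0.99),
    ("purchaser-2", "Cake Walk (Live)", 1.99),  # price typo: should be 1.29
]
# Deletion anomaly: removing purchaser-1's row also loses the song's price.
# Insertion anomaly: a new song can't be added without inventing a purchaser.
# Update anomaly: fixing 1.99 means hunting every (possibly misspelled) copy.

# One fact per row: song prices and purchases are separate facts.
songs = {"S1": ("Cool Walk (Live)", 0.99), "S2": ("Cake Walk (Live)", 1.99)}
purchases = [("purchaser-1", "S1"), ("purchaser-2", "S2")]

# Now a price correction touches exactly one place:
title, _ = songs["S2"]
songs["S2"] = (title, 1.29)
```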
Let's move on and talk about definitions, which are what people ask for all the time. If you ask people for the definition of an entity or an attribute, they'll give you the dictionary definition. We want to do more than that with our data models: instead of saying a bed is "something that you would sleep in," we want to get to the point where the definition becomes useful. So we write a purpose statement that describes why the organization should maintain information about this business concept, talks about the sources, and gives at least a partial list of the attributes associated with it. We're looking right here at an association that says one room contains zero or many beds; that's the piece I'm pointing at, and that's a way to describe it. Also, in your data models you should assign a status to each attribute, entity, and relationship, and all of those statuses should initially be draft, because putting them in as draft gives you the opportunity to go back and validate them. Somebody can say, "your data model is done, is it not?" and you can answer, "no, it's not done until I've changed the drafts into validated pieces." That's a way of giving yourself not just more time but a better approach to the process, because you're unlikely to be perfect on your first attempt anyway.

Hang on, I hit that too fast; there we go, I wouldn't want to get off that slide just yet, because the other part was to tell you the story. This was a piece we were doing for the Department of Defense, for public health or the Veterans Administration, I forget exactly which system; I really do have the data models in my attic, because nobody else wanted them. Silly, silly stuff. What we found out, interestingly, was that the beds were going to be the way they kept track of the patients: they were going to put transponders on the beds, and the transponders were going to tell us what room each bed was in. So the first question is: are we missing bed transponder ID as an attribute? Likely, yes. But second, we found out this probably wasn't going to be a good idea, because, for example, we asked the question: what room is the hallway? "Oh, I hadn't thought about that." So can we incorporate the concept of "near a room" rather than just "in a room"? And that gets you into GPS, with lots and lots of components around it, whether technically feasible or not. Again: if it's not in the data model, it's not in the specifications; if it's not in the specifications, it's not in the system; and if it's not in the system, people are generally going to be unhappy about missing the functionality they were looking forward to getting.

Let's picture Fred Brooks. If you're not familiar with him, he's The Mythical Man-Month guy, the one who said you can't take one month of work and create a baby faster by working in parallel; it doesn't work that way. There are lots of his sayings around here, but he also said that data representation is the essence of programming: "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

So the process is used to define, analyze, and understand data requirements; it gives some sense of the purpose of what's happening; and you identify constraints that are going to be supported, or not supported, by the information systems your data model interacts with, whether internal or between organizations. By the way, JSON is often considered to be a representation of the data model. It really just gives you an agreed-upon data structure; it may or may not constitute a data model, depending on what you're doing with it.
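A short illustration of that caveat (the document itself is invented): the JSON below conveys a data structure, but none of the modeling facts, such as cardinality, optionality, or definitions, travel with it.

```python
import json

doc = json.loads('{"bed_id": "B-12", "room": "301", "patient": null}')

# The structure parses fine, but nothing here says whether every bed must
# have a room, whether "patient" can recur over time, or what "room" even
# means; those facts belong to the data model and its data dictionary.
print(doc["bed_id"])
```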
The idea is to go in with a specific question in your data modeling; don't data model for data modeling's sake. Put at the top of your paper: "The purpose of this model is to answer the following question." There are a lot of specific variations that will permit efficiencies around this; let me give you a couple of examples really quickly. Here's a data model that clearly describes the relationships between account, charge, bill, and subscriber, and, more importantly, the model purpose statement says: this model codifies the official vocabulary to be used when describing aspects of any of the following organizational concepts: subscriber, account, charge, and bill.

To be very clear: this organization, using data governance properly and data modeling as a tool, has specified the controlled vocabulary to be used throughout the entire organization. I'm not suggesting that this is a good idea for your organization, but it was a good idea for this one. Let's dive a little further with a second example. Looking at the purpose in the upper left-hand corner in blue, I'm just going to call out one of many interpretations that come out of this: there's nothing in the data model that prevents an automobile from being rented to multiple customers at once. There's no checkout service, no "this automobile is not available," no status in that set of concepts. Could that potentially be a conflict later on? It depends on your business model. In example number three, two things. First, notice that price is not part of catalog (catalog is the table on the bottom row, described at the high level of the entity), which means variable pricing is probably going on; different people are going to get different prices. Second, the database can't tell what part of the invoice a payment pertains to, because it doesn't have any section for it; it just gets an amount paid, without knowing where that fits in the larger structure. And one last one, a model for a hospital system, again with the official vocabulary defined. It says there must be a one-to-one correspondence between admission and discharge; and, in this case it's kind of sad, but "discharge by death" must be a disposition code. That was not something they wanted to have in their documentation, but it turned out to be necessary and proper.

This next slide is very busy, in the sense that I'm going to show you nine different modeling options you have across this space. Don't try to memorize these, please; I just want to give an overview. Remember, this is all taped; you can come back and check it out later. We start where the academic world starts most of the time, doing what's called forward engineering: building new systems. Remember, only 20 percent of our dollars are invested in building new systems today; 80 percent of our dollars stay with existing systems. Here are the three parts we saw before: the what we need to do, the how we're going to do it in blue, and the implementation itself in gray on the right-hand side; conceptual at one end, a data model sitting in the middle, and implementation at the other end. Since we spend 80 percent of our time working on existing systems, we should get better at reverse engineering; we are getting better, but we should get better still, because it is a necessary but not widely understood skill. So our first way of data modeling is to recreate the initial data implementation; our second is to recreate the original design; our third is to recreate a data model of the requirements. It turns out that data requirements are the most objective and most testable form of requirements, therefore the most manageable and least subjective, which makes them enticing to specify and to use properly.
Our fourth way is to reconstitute the data design by looking at the existing physical implementation; our fifth is to reconstitute the requirements by looking at the existing design. We then cross the line between new and existing: popping down here, if we're not going to make any changes to the requirements, we redesign the existing data and re-implement; if we are going to change the requirements, we need to come down that far-left loop, changing the requirements, then redesigning the data, then re-implementing. So there are nine different ways, with lots of metadata going back and forth; oh my goodness, how confusing, so let me take it piece by piece. (Sorry, once again I hit that too fast; there we go.)

Forward engineering is the process of going from what, to how, to the physical implementation. This is the only thing we teach young people, which I believe is very problematic from an education standpoint, since we spend 80 percent of our time working on existing systems; but that's a different argument for another day. Forward engineering is building new. Now let's look at reverse engineering in a little more detail: going backwards. There's a formal definition: a structured technique aimed at recovering rigorous knowledge of the existing system to leverage enhancement efforts. Depending on the situation, we may need to go back all the way to the requirements; many times you can simply stop at the design. So I'll stretch the picture out a little further and put in our line dividing new from existing. Again, we pull to the left there if we're going to change our requirements: we need to reverse engineer the existing system to understand its strengths and its weaknesses. Every system does something well and something poorly, and we need to understand the difference between the two, or we're not going to be able to utilize this information going forward. Then, of course, it's incumbent on us to use this information in designing the new system. I've seen people reverse engineer and then not use their knowledge, going straight ahead with something else; it just boggles the mind. So when you have this much information, you pull it all together.

Model evolution is an interesting component here, too. Most of the time, when people do a data implementation, they do what's called forklifting: not how to do it, just what is done. They bring the data into the new system and map it in with a spreadsheet. The first thing you need to do is get rid of that forklift as a concept, and understand that this technology-dependent move from A to B needs to go back through the design: first reverse engineer the existing system to understand its strengths and weaknesses, particularly when you recognize that the data model you get out of reverse engineering is going to be the foundation of things going forward for perhaps a decade, and then use this to inform the design of the new system.
The way this is typically taught is: you start out with your physical as-is, then you go to your logical as-is, then your logical to-be, and then your physical to-be. That sounds very nice, but nobody understands why, and the reason the bare recipe is insufficient is this: you need to go backwards, pulling the physical as-is up to the logical as-is, and as you start incorporating different components of data, this is where your logical model changes in conjunction with the business; you formulate the as-is and go forward from there. All of this modeling takes place within a system that looks kind of like this: you have your conceptual, logical, and physical, just what we've been talking about all along, each either validated or unvalidated, and every modeling change can be mapped to a transformation on this network. Each modeling cycle has its own specific, articulated purpose, and the key is to keep each cycle focused on the data model's purpose. "Locked in this room" is the way you want to say it: we are locked in this room until we understand the relationship between soda and customer, and we have to make sure it is correct, because we need to walk out the door with the data model of that relationship. In a different situation: we don't walk out the door until we can talk about the top three characteristics required to manage our hospital beds. Or: our mission is to determine whether our systems could handle job sharing if we had to implement it tomorrow, in response to a court order, for example; here's our existing system, and we're not exactly sure, but we're going to have to implement this very, very rapidly.

Another key to this: don't tell people that you're data modeling. Just write some stuff down and arrange it a little; making the appropriate connections between your objects then becomes fairly straightforward. You identify the entities from your notes, you identify a key for each entity, you draw rough relationships between the things that seem to be connected, and you identify the attributes that belong on each specific entity as you map them back and forth. Don't stop there, though; that iterative process is important, because with refinement you will discover new ways of organizing, and hopefully your model gets more stable over time. The important thing to watch for is not just whether it's done or not done; you may discover other associations that you need to add to get it there.

All right, finishing up this part: there are times and types for the modeling you're going to be doing. You're going to be collecting primarily in the early cycles of your modeling, so you should be done with collection and into analysis fairly early. You're going to have some coordination requirements, you almost always do, but they should decline over time. You should be able to do increasing amounts of target system analysis, which gives you the data model in the context in which it's actually doing the work. And your cycling changes from validation to refinement, to the point where you have time to go through and validate the entire model and look at it as a whole.
There are lots of places you can go for pre-built data models, what I call the pattern data model books; there are four of them on the slide. Buy Len Silverston's book, for example, in the bottom right-hand corner. It comes with a CD-ROM, if you remember how to use those things anymore; you put the CD-ROM in and you get the data models. Fantastic, what an amazing thing. All right, we're finishing up here. The goal is shared understanding. Data exchange is automated and highly dependent on successful architecture and engineering, and it has to go fast, which means the data model has to be as close to perfect as you can make it, because any imperfection stays with it for the life of the system. The modeling characteristics evolve. The model is a problem-defining as well as a problem-solving activity, and the use of modeling is more important than the specific method. Models are living documents, they need to be available in a searchable manner, and utility is paramount: add color and pictures if that's what it takes to get all the bits and pieces across. So again, a quick note that the books are on sale, and we've got some upcoming events; hopefully we'll see you next month for data stewards. I'll give you a minute to think up some questions while we get to the Q&A. And there we go, made it to the finish. Yay, 101 slides, Shannon. Thank you so much, Peter; as always it's been a great webinar. There have been lots of questions about the slides, so just a reminder: I will send a follow-up email to all registrants by end of day Thursday with links to the slides and the recording, along with anything else requested throughout the webinar. Now diving in: custom models require a lot of prep work. What tools can you use to rapidly prep the data so it can easily be flipped into a data model? I think the questioner is asking about acquiring a pile of data without having a model of it, and how you get there. That is sometimes a lot of prep work and sometimes very easy. You'll notice, for example, that many programs allow you to import a CSV file, or pull from web services, or something along those lines, so there are certain predefined paths that work reasonably well. Reverse engineering becomes important again here: there's a data set, I don't know exactly what it means, but perhaps I can reverse engineer it. And big data technologies, or whatever we're calling them this year, can dive in with schema-on-read approaches, depending on what your shop has. I've also helped many people read their Oracle catalogs, because sometimes there's no definition left anywhere else, and the Oracle catalog is a wonderful metadata repository if you know how to read it correctly. I think that answers the question, Shannon; I'm not positive, though, so let us know.
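On that "pile of data with no model" situation, here is a minimal sketch of taking a first guess at a schema from a CSV file using only Python's standard library; the file name is hypothetical, and the type inference is deliberately crude, a starting point for modeling rather than a finished design.

```python
import csv

def infer_rough_schema(path, sample_rows=100):
    """Guess a rough column type (int, float, or text) from a CSV sample."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: set() for name in reader.fieldnames}
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            for name, value in row.items():
                columns[name].add(classify(value))
    # A column's type is the most permissive classification seen in the sample.
    return {name: pick_type(kinds) for name, kinds in columns.items()}

def classify(value):
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"

def pick_type(kinds):
    if kinds <= {"int"}:
        return "int"
    if kinds <= {"int", "float"}:
        return "float"
    return "text"

# Usage with a hypothetical extract file:
# print(infer_rough_schema("unknown_extract.csv"))
```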
Yeah, absolutely. And we are a vendor-neutral company, we do try to stay vendor neutral, but what modeling tools are out there that can help? I reach for my handy erwin; that's the one I go to for most of the reverse engineering, because they incorporated that capability early. There are probably others you know of, because I think you know the space better than I do. There are lots of tools and lots of really cool things; I think it just depends on what your needs are, what other products you have out there that you need to connect with, and all that good stuff. You know, people often confuse the terms data model and database model. Can you shed some light on a clear cut between the two? Great question, Shannon, and I had to smile because you indulged our little geekiness there. Let me go back to a slide and take the database model first. A database model is the actual model of a database, and it exists as-built: if it isn't built, it can't be running, and if it's running, it was built. Virtually anything that is as-built can be reverse engineered to some degree, so we can pull that information back and understand the data design of the as-built model. That can occur for a database, for a component of the data architecture, or for a software system without a database, since those also use data models. Data models in general, by contrast, exist irrespective of any physical implementation. Then there's the question of logical versus physical. Call the middle column logical, the left red column conceptual, and the gray right-hand column physical. The physical is how you implemented it in database X or database Y, or in cloud offering X or Y. The logical layer says: the cloud may implement it one way with one physical design, and another platform another way, but what is my simplest, most effective design for what I'm trying to do? That can only be uncovered not by looking at the physical implementation, but through a logical, technology-independent representation. So that's our definition of the difference: the physical should have a one-to-one correspondence with the components actually deployed out there, and the logical should have a correspondence to the business concepts we need to speak about. Great question, thank you. I love it. Oh, I get this question a lot, Peter: should a data model be the responsibility of the business, or of IT? Yes. Next question. Seriously, the question gets asked like that all the time, and what I say is that IT would have a very difficult time doing it all by themselves, so the best answer is cooperation between the business and IT, with IT going to the business and saying, "this is what we know and understand about your business." By the way, that's very different from asking them to tell you about their business, because while people are happy to tell you about the business if they have time, if they don't have time, they don't want to. On the other hand, if you show them a design of some sort and say, "this is how I understand your business right now, am I correct or not?", they're very happy to edit it and tell you what's not right about it, and that's great. So again, great question. Thank you.
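As a small, hedged illustration of that logical-versus-physical split, here is one logical customer concept alongside two hypothetical physical renderings; every name here is invented for the example.

```python
from dataclasses import dataclass

# Logical: a technology-independent statement of the business concept.
@dataclass
class Customer:
    customer_id: int
    name: str
    email: str

# Physical rendering 1: relational DDL for one hypothetical platform.
RELATIONAL_DDL = """
CREATE TABLE customer (
    customer_id BIGINT PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    email       VARCHAR(320)
);
"""

# Physical rendering 2: a document shape for a hypothetical document store.
CUSTOMER_DOCUMENT = {
    "_id": 42,
    "name": "Example Customer",
    "contact": {"email": "customer@example.com"},
}

# Same logical requirement, two physical designs. The logical model is what
# stays stable when the platform changes.
```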
Oh, sorry, my questions have moved on me. So: will the golden record definition be the input for the master data model, the logical entity relationship, in a master data solution? Well, it depends on how you're going about it, but in general, yes, that's the concept you're looking for. I've read so many data strategies over the last year that say "we're going to have our data input once and reused all the time." Those are wonderful aspirations, but making all your systems actually get to that point is very difficult. In most contexts, when you're looking at how all of these are organized, the formality is really the key: if it isn't written down somewhere, you have people guessing, and guessing is not the path that leads to rapid success, let's just say that. And I might have gotten off track on that one; did I get the question? Yes, the golden record definition as input to the master data model in a master data solution. So, MDM is a strategy, not a technology, and if you're working toward that strategy, one of its components is that you need golden records, and the data model will help, or can be a help, in identifying those golden records. I'll tell a funny little story along the way. One of the things I was involved in was the DoD Y2K effort, where I created a data model of how data flowed through the business side of the Defense Department. One day there's a knock on the door: "Hi, we're from a place we can't talk about, but we understand you built this." I guess I did. "We'd like copies." I said I'd be glad to make them a copy, and they said, "No, no, we're taking your copies as well," because it turned out that putting all of that together was classified higher than my own clearance at the time. Long story short, the models are very important to doing this; they're key, absolutely key. So: what is the effective structure of a data architecture diagram, conceptual, LDM, PDM, plus a data cataloging tool? It sounds like a specific question about a specific tool, but I think we can still get to some value, so let me go to the logical and physical model slide. There we go. Again: conceptual, logical, physical. As I mentioned before, all of these should be linked by an integrated data dictionary, or whatever you're going to call it, using a common controlled vocabulary for the things the data models describe. That by itself is just as important as the first data model example I showed you, where the company said, "this is the way we are going to do things." They weren't that vociferous, but they were quite clear about it, and they wanted to make sure everybody understood it, because they realized that the lack of a standard vocabulary had been one of the major sources of that bleeding-unnecessarily-from-too-many-cuts problem we discussed a little while ago. I think I answered the question, Shannon; if not, somebody will clarify, for sure. Indeed. So what's the difference between DCAM and DMBOK as data management frameworks? That one deserves a whole webinar of its own, Shannon. A great question, but it's a very complex piece, and we're not going to get to it here.
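Circling back to the golden record answer: as a hedged, minimal sketch of what "identifying a golden record" can mean mechanically, here is a toy survivorship rule in Python, where duplicates are grouped on an invented match key and the most recently updated non-empty value wins. Real MDM survivorship rules are far richer; this only illustrates the concept.

```python
from collections import defaultdict

# Toy source records for the same real-world customer, from three systems.
records = [
    {"match_key": "ACME-001", "email": "old@acme.com", "phone": "",         "updated": "2019-04-01"},
    {"match_key": "ACME-001", "email": "",             "phone": "555-0100", "updated": "2020-11-12"},
    {"match_key": "ACME-001", "email": "new@acme.com", "phone": "",         "updated": "2021-02-03"},
]

def build_golden_records(records):
    groups = defaultdict(list)
    for rec in records:
        groups[rec["match_key"]].append(rec)

    golden = {}
    for key, group in groups.items():
        # Survivorship rule: walk records oldest to newest; for each attribute,
        # the latest non-empty value survives.
        merged = {}
        for rec in sorted(group, key=lambda r: r["updated"]):
            for attr, value in rec.items():
                if value:
                    merged[attr] = value
        golden[key] = merged
    return golden

print(build_golden_records(records))
# {'ACME-001': {'match_key': 'ACME-001', 'email': 'new@acme.com',
#               'phone': '555-0100', 'updated': '2021-02-03'}}
```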
So: what is the point of having an enterprise data architecture that is unreadable? Shouldn't we be building data models based on critical data elements? Yes. One of your duties as a modeler is to determine what is essential to the model; the model should show only essential components. If flavor isn't important to your model, don't include flavor as an attribute. I'm not suggesting anything for or against flavor as an attribute; making those decisions is your role. And the idea of having those decisions made by somebody who isn't familiar with the material should scare organizations, because this happens whether you want it to or not: for these systems to be fielded, somebody is making a data model. It may be a good one, it may not. I wouldn't want to leave something that will be around as long as data models are to chance in any way, shape, or form. Sorry, I got on a high horse again, Shannon. Did I get the question? You did indeed, and if the questioner has any follow-up, I'll definitely get it asked. So: how is data architecture different from a data model, and how do the two interact with each other? The idea is that a data model is a part of your data architecture, and the architecture is comprised of the models you have. It's unfortunate that we don't have better definitions; I don't even have a good representation of it here, just this multicolored blob, if you can see where my cursor is waving on the screen. This is one of many data architectures I have, but the question of readability is of course important. The architecture is made up of models, so, for the sake of argument, pretend that everything in orange on this chart is the part of our data architecture we know, and everything else is relatively unknown. The next step is to decide that green is the next area we're going to attack, and we build out the green pieces to pull it together. Each of these pieces is a data model, and at the end of the cycle we may have only the green and the orange parts, but those should still be readable, because they're built of data models; all the green and all the orange is now accessible to us, with less of the overall organization still to cover. And as projects come across your desk, you can prioritize by saying, look, if I do this one, I'll get this piece of the puzzle, I'll answer that question, and I'll save the business something. Now, realize that these insidious little data problems are not represented to people as data problems. Nobody's jumping up and down saying "this is a data problem"; instead they're saying, "oh well, it's just Salesforce, it sucks." And I'm not picking on Salesforce, but if you do what most organizations seem to do today, implement Salesforce, stick the data in, turn it on, and only then decide you should have cleaned the data before you put it in, then when you try to clean it in place, users can't distinguish between "Salesforce sucks" and "the data that's in Salesforce sucks."
So they just say "Salesforce sucks" while they try to cleanse the data in place. Instead, let's put some things in place up front: let's understand the Salesforce data model, which offers some wonderful opportunities to cleanse the data on the way in, so we don't end up stuck like that. And if that happens to be the red part of the data architecture over here, the red part then becomes another piece of your architecture that you know more about. What I'm seeing is that most organizations know some pieces of their data architecture, and part of the job is figuring out which pieces are important to know and which can be left unknown, at least for the moment. Because, let's be frank, most organizations are operating without good knowledge of their data; that's simply what the numbers out there tell us. Very true. So: what data quality dimension is most affected by a poor data model? That's a really insightful question. There are two types of data quality problems. The first is what we usually call data quality: you've got the wrong field, or the wrong value, or something isn't right. For example, if you pull my credit report, which is sort of weird, you'll find that I own an office building in Texas. I don't own an office building in Texas, and everybody who works with me knows that's not the case, but we joke about it every time because we can't seem to get it off my credit record. I don't owe any money on it; I just happen to "own" it, which is kind of cool. That's obviously a data quality problem. Those value-level errors are where most people's understanding of data quality stops, but errors can occur in the architecture, the model, the representation, and the value of the data item; it's a relatively closed universe. The second type, which may be what the questioner is getting at, is what we call structural data errors, and those structural problems are very difficult. There's a slide on that, so let me get to it; if I had planted that question, I couldn't have done a better job. All right, here we go: we're looking at club and club member. You can imagine that the club would probably like to know who its members are, and the members would probably like to know which club they're part of; basic questions we'd want to ask. In this example, which I pulled from somewhere else, I couldn't figure out how you would get from club to club member, so there was this thing called a cluster, or whatever word you like, carrying the primary keys of the two other entities so they could be cross-matched. That's a really bad data structure, and if the software were built that way (Salesforce does not work that way, to be clear), you would have a well-known problem connecting clubs and club members through a very difficult kind of join.
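To make that structural error concrete, here is a hedged sketch contrasting the "cluster" anti-pattern with a conventional explicit relationship, expressed as Python dataclasses; all of the names are illustrative.

```python
from dataclasses import dataclass

# Anti-pattern: a vague "cluster" holding two primary keys with no declared
# meaning or constraints. Club and member can be cross-matched through it,
# but only via an awkward, poorly defined join.
@dataclass
class Cluster:
    club_id: int
    club_member_id: int   # what does a row here actually assert? Unclear.

# Cleaner design, assuming each member belongs to exactly one club: put the
# relationship directly on the member as a foreign key.
@dataclass
class Club:
    club_id: int
    name: str

@dataclass
class ClubMember:
    member_id: int
    name: str
    club_id: int   # references Club.club_id, so the relationship is explicit

# If members could join many clubs, the right structure would be a named
# associative entity (say, Membership) with its own attributes, rather than
# an anonymous cluster.
```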
That's the real point: badly designed data models leave those kinds of errors in your systems, like the pea under the mattress in the fairy tale. It's an irritant, it lasts, and the data part of it persists in that form, so it's worth the extra time and effort to make sure it's done right. I also, by the way, request data models from any software vendor we're doing business with, because I would like to know how the cloud component or the software-as-a-service component is implemented, and whether it's going to complement or conflict with my existing architecture. Again, high horse, Shannon, but that was a really good question. You do seem to like it when you get on your soapboxes; it's good. So, Peter, with reverse engineering you rarely have descriptions of the columns and tables in the database, so going back to the analysis phase is only partial. How do you fill in the blanks? You fill in as much as you can; in fact, that's how I cut my teeth. This one is really interesting, and I'll go ahead and tell the story, Shannon, even though you heard a little of it last week in your interview series. Yes, you get a partial set of information when you're doing reverse engineering. In my first job at the Defense Department, my title was DoD reverse engineering program manager. My job was very simple: there were 37 systems in the Department of Defense at the time, and we needed exactly one, which meant there were going to be either 36 or 37 losers, depending on what Peter decided to do. One of the things I was very fortunate to do was help invent the technique of data reverse engineering as a solution to that problem, and then I was ordered to write the book on it, which is another whole story. I did about $50 million worth of reverse engineering activities, both in preparation for Y2K and in the process of going from 37 systems down to the one system that won. We had to pull all of that data together, and we hit exactly the situation the questioner asks about: some of the systems were old, some had documentation, some did not. So we inferred what we could. We tried to get hold of people; we were able to call some out of retirement, believe it or not, and get information from them. We put together a series of hypotheses, said "it's either this or that," and asked subject matter experts, who would say, "oh, definitely not these two, so it must be the other one." There are ways of obtaining more information from the factual information you have. It's not a trivial process, but it is very much a puzzle-solving process. Obviously this took me back to some fun days; those were interesting times, learning all of that and working through the challenges. If anything, my task was simple because I had a mainframe, whereas everybody out there now has thousands and thousands of distributed processes. I've had my time in that area; we'll let the youngsters figure this stuff out. Anyway, great question.
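As a small sketch of that "fill in the blanks" idea, here is one hypothesis-generating profiling pass in Python over values pulled from an undocumented column; the heuristics and sample data are invented for illustration, and real inference would combine profiles like these with subject-matter-expert review, as described above.

```python
def profile_column(values):
    """Generate simple hypotheses about an undocumented column from its data."""
    non_null = [v for v in values if v not in (None, "")]
    distinct = set(non_null)
    hypotheses = []
    if len(distinct) == len(non_null):
        hypotheses.append("values are unique: candidate key?")
    if len(distinct) <= 5:
        hypotheses.append(f"low cardinality {sorted(distinct)}: code or flag?")
    if non_null and all(isinstance(v, str) and "@" in v for v in non_null):
        hypotheses.append("all values contain '@': email address?")
    nulls = len(values) - len(non_null)
    if nulls:
        hypotheses.append(f"{nulls} missing values: optional attribute?")
    return hypotheses or ["no obvious pattern: ask a subject matter expert"]

# Invented samples from two mystery columns in a legacy extract:
print(profile_column(["a@x.mil", "b@x.mil", "c@x.mil"]))
print(profile_column(["Y", "N", "Y", "", "N"]))
```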
Thank you. So: how much, quote unquote, future-proofing is suggested? For example, an abstract party that can be an individual or an organization, or catering to international currency codes when the business is domestic today. That style of modeling may be future-proof but a pain to model, and generally the business isn't happy; the management response to too much future-proofing is "let's work based on what we have today." Very insightful question. It is always going to be a trade-off: it will be worth it for some things and not for others. Let me give you a tangible example where the value proposition was quite easy to determine. I worked for a number of years for Deutsche Bank, and their back-office trading system was called DB Trader, Deutsche Bank Trader. It was the best on Wall Street at the time because it possessed three specific advantages. First, it was a real-time system; they didn't have to start and stop it. Second, it was multi-currency, so they didn't have to do currency conversions and could see instantly what their various positions were at any point in the day. Third, the architecture of the system was table-driven. That decision meant they could build new products without recompiling the system, simply by making entries in a configurable series of tables, and the folks they had there were geniuses at it. It was a wonderful system, specifically architected for that specific purpose. That's the good news. The bad news was that all of that wonderful intelligence, business knowledge, and data modeling was encapsulated on something called a Wang computer, and Wang computers had been obsolete for almost ten years at that point and were slowly going away. So we had a limited amount of time to make the changes, and in carrying the system across we encountered exactly the situations everybody has raised here. And truly, I'll say this as well, because I think it was insightful. The CIO at the time was a very wonderful individual who never quite got what it was we were doing; I was the data modeler, so he sort of expected me to be data modeling, whatever that was. But he did say: "When I look at my shops around the world, when I go to Singapore and Bangkok and Tokyo, I see the data models on the wall. I recognize them, and I recognize that the people are using them. So I know you're making valuable products that my team needs in order to support this implementation. I wish I understood it better, but at least I understand that it's important." If I never got any other message through to him, that was the one I wanted to get across: he didn't understand the data models themselves, but he did understand their utility. Thank you again; let me put a pin in that for a minute.
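A hedged sketch of what "table-driven" can mean, in Python: product behavior comes from rows of configuration rather than from code, so adding a product is a data change, not a recompile. The product attributes here are invented for illustration and bear no relation to DB Trader's actual tables.

```python
# Configuration "tables": each row defines a tradable product's behavior.
PRODUCT_TABLE = {
    "FX_SPOT":    {"settlement_days": 2, "multi_leg": False},
    "FX_FORWARD": {"settlement_days": 0, "multi_leg": False},  # date set per trade
    "FX_SWAP":    {"settlement_days": 0, "multi_leg": True},
}

def book_trade(product_code, notional):
    """Interpret a trade using the product's table entry; no product-specific code."""
    spec = PRODUCT_TABLE[product_code]
    return {
        "product": product_code,
        "notional": notional,
        "legs": 2 if spec["multi_leg"] else 1,
        "settlement_days": spec["settlement_days"],
    }

# Adding a new product is a configuration entry, not a code change:
PRODUCT_TABLE["FX_NDF"] = {"settlement_days": 2, "multi_leg": False}
print(book_trade("FX_NDF", 1_000_000))
```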
Indeed, and I think we have time for a few more questions here. So: how much flexibility do data warehouse modelers have to redesign, or future-proof in a better way, in their own playground, for example with dimensional models? Or are they the unfortunate group stuck consuming poorly designed sources as-is, with the attendant risks? If you have the opportunity to develop from scratch, there is actually a well-established pattern now. I have not read Bill Inmon's latest book, but I'm very familiar with where his thinking stood as of this most recent moment, and it centers on a different architectural construct in the data warehousing space called the data vault, which gives you the opportunity to encapsulate business-rule-specific data in a way that is associated with other data, provided you have the meta-knowledge around it. I'm getting a little esoteric here, but the current recommended best practice is to start by designing a data vault; then, if you need to, you can easily derive a dimensional model or a relational model or both from it. The guidance at this point is very much data vault first. Google Dan Linstedt and data vault; he has a wonderful website that can get you started in that area. Great question, and of course all of the components you describe require modeling. Now, unfortunately, there is a rather large number of well-known commercial products of poor quality; a lot of the specific names around them have three letters, and yes, there's a lot of discontent about that. Perhaps what you can do is take solace in understanding the existing data model that's there, look at the advantages and disadvantages it has, and design compensations into downstream products; I've seen several caching schemes overcome some of the problematic ERP components. It's a fairly abstract problem, but clearly, with a bad data model you're not going to have much luck, and you can hear the disgruntlement in the questioner.
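For readers unfamiliar with the data vault pattern just mentioned, here is a minimal, hedged sketch of its three core constructs (hubs for business keys, links for relationships, satellites for descriptive history); the entity names are invented, and real data vault designs follow Dan Linstedt's far more detailed standards.

```python
from dataclasses import dataclass

# Hub: just the business key plus load metadata, one row per business entity.
@dataclass
class HubCustomer:
    customer_key: str   # enterprise-wide business key
    load_date: str
    record_source: str

@dataclass
class HubProduct:
    product_key: str
    load_date: str
    record_source: str

# Link: a relationship between hubs, again with load metadata only.
@dataclass
class LinkPurchase:
    customer_key: str   # references HubCustomer
    product_key: str    # references HubProduct
    load_date: str
    record_source: str

# Satellite: descriptive attributes, versioned over time, hung off a hub.
@dataclass
class SatCustomerDetails:
    customer_key: str   # references HubCustomer
    load_date: str      # each change gets a new satellite row
    name: str
    email: str

# Because business keys, relationships, and descriptions are separated,
# dimensional or relational views can later be derived on top of the vault.
```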
So: can you explain the differences we need to consider when modeling for a relational database versus cloud databases such as BigQuery, Hive, or the AWS offerings? The real key that everybody forgets is that the logical data model is going to be the same in all circumstances, because the logical level expresses only the business requirements. Your conceptual and logical layers, if you're going down that pathway, should not be technology-dependent; in fact, it is against the rules to use the word "database" during any of those first activities, the requirements activities, the things on the left-hand side of your screen. Once you've established the what, the how is a design decision, and that is when you are allowed to pick a database, or a cloud technology, or whatever else. All of them should involve some component of data modeling, with the exception of the lakes, where they index everything; those are appropriate for certain circumstances as well. So, depending on what you're trying to do: absolutely, you model the business requirements at the logical level, and that should not change. If those requirements are not well specified in a model at that point, I would hesitate to imagine how anybody could develop a solution in response to them, since they were never developed in the first place. And I think we've got one more question and just enough time for it. Could you put up the slide that has the pricing on it? It was the next-to-last slide. The pricing? The next-to-last slide; I mean the books, the event pricing. Oh, thank you. Where's that pricing... it's on the title slide. Go back one. I'm a dummy. Okay, there you go: event pricing. It's called event pricing because that way you can get the coupon off while you're at the event. I'm sorry, Shannon, I didn't catch that at first. It's good, I love it. And of course, just a reminder: I will send an email to everyone by end of day Thursday with links to the slides and the recording, along with all the information on how to get Peter's books and more from him. Peter, thank you so much; as always, it's been a fantastic presentation. To all of our attendees, thank you for the great engagement and the great questions coming in. And again, just a reminder, watch for that follow-up email by end of day Thursday. Everyone, thank you so much; I hope you all have a great and fabulous day. Thank you very much.