Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer for DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Conceptual versus Logical versus Physical Data Modeling. It is the latest installment in a monthly series called Data Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them by the Q&A section. And if you'd like to chat with us or with each other, we certainly encourage you to do so. And just to note, the Zoom chat defaults to sending to just the panelists, but you can absolutely switch that to network with everyone. To open the Q&A or the chat panel, you may find those icons in the bottom middle of your screen. And to answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise send a link to the recording of the session as well as any additional information requested throughout the webinar.
Now let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an acknowledged data management authority, an associate professor at Virginia Commonwealth University, president of DAMA International, and associate director of the MIT International Society of Chief Data Officers. For more than 35 years, Peter has learned from working with hundreds of data management practices in 30 countries, including some of the world's most important. Among his 12 books are many firsts. Starting before Google, before data was big, and before data science, Peter has founded several organizations that have helped more than 200 organizations leverage data, with specific savings that have been measured at more than 1.5 billion US dollars. His latest project is Anything Awesome.
And with that, let me turn everything over to Peter to get today's webinar started. Hello and welcome. Oh, you're muted, Peter. Sorry about that. Oh, you're still muted. Perfect. Now we can hear you.
My apologies. Great. Welcome. And I blew it entirely by having my new microphone cluster not ready, and there you go. But welcome to everybody today. And yes, the topic is conceptual versus logical versus physical data modeling. And just briefly, what we're going to cover is an introduction to modeling data in general, because it is contextually relevant to what we're doing. We'll talk specifically about conceptual, logical, and physical modeling activities. They should all be seen, though, more as a transition from one to another. And the main way of understanding that transition is this thing I call the washing machine diagram that's located to the right. Here you can see as-is and to-be on it. We'll go through that in more depth as we get started.
In fact, let's go ahead and just jump right in and start talking about this, which is really an interesting topic. I had three of my dear colleagues reach out to me: David Hay, who has a wonderful book on the topic, here on YouTube, that he'd love you to listen to as well; Dr. Bernhard Thalheim, who's done some of the really fine academic work on this, and these are just a couple of papers of his that you might find relevant; and our dear friend Gordon Everest, who has a take on this too. All three of these gentlemen reached out to me in advance of talking about this topic and wanted to make sure I got their perspectives. And it was absolutely delightful talking to everybody. What we're all collectively concerned about is that there's just an immense amount of data being created. And I could read these individually back to you, but just understand that it's increasing at an increasing rate.
And it is useful to get an idea that most of the data that we're working with was only created in the last two years. That represents an amazing context for us, because data is growing like this, and our analysis capabilities are simply not even close to keeping up. There are some measurements that people love to throw up, but it's still pretty bad news all the way around if we want to do better, and this is, of course, assuming that the data is in really good shape. Unfortunately, most data is sort of a dumpster fire. I won't say all, but certainly much more than we'd like it to be.
And not understanding what bad data looks like means we have a lack of respect for understanding, in this case, architectures, which all organizations have, and which could be a variety of different types. But only one in 10 organizations actually tries to maintain one or more of these architectures. Some of these are understood, which is good. Again, you have an architecture if you exist. But if you don't understand it, if it's not documented, how can anybody understand it? It's going to be very problematic for you. And more importantly, the understanding has to be shared by a group that comprises business users, technical users, and systems users, in order to make sure that we have full understanding all the way around of how all this works. Absent that, we create sand in the machinery at the absolute lowest level of the organization.
What we're talking about is very simple things, such as a common vocabulary. The vocabulary, in this case, is going to be the nouns that are attached to the models as we get to them in context here. But the challenge is that most data is encumbered by some form of data debt, and you really have to undo some existing things to get to a zero spot before you can try to go deeper into it.
But these data challenges are vastly underestimated in most instances. Let's give a very precise definition to data at this point. We may throw out the number 42. If you happen to be a fan of Douglas Adams, you would know that the number 42 represents the meaning of life. And the rest of you who have no idea what I'm talking about, hold on tight. Number 42 also was the number on Jackie Robinson's jersey, and in fact, excuse me, the title of the film about his life. So there's another meaning for the fact 42. I will go ahead and put a third one up because it's always good to do it in threes, and that's just the question: is Peter allowed to buy alcoholic beverages in the Commonwealth of Virginia? Well, 42 was my age 24 years ago. So you can do the math and see that in fact, I am eligible to drink alcoholic beverages. But of course, nobody should drink and drive.
Well, that little deviation was mainly to get your mind thinking about 42 as a fact that can be paired with different things to mean different activities and situations. Only when a fact is paired with a meaning does it become, in fact, meaningful. That is the definition of data. And we also have to find a subset of data that is more useful to us than everything, because everything, as we've already seen, is impossible to keep up with.
Data becomes information when it's requested. That act of requesting the data is the main thing that differentiates it, because it's still going to come back and say 42 is the number on Jackie Robinson's jersey, if that's the question that you asked, but now it is transformed into information because somebody has asked for it. This gives a wonderful result: you can have data without information, but you can't have information without the data. Again, it's why they should be managed together as opposed to separately, along with all sorts of other components that go on around that.
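The definitions above (data is a fact paired with a meaning, and data becomes information when it is requested) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Datum`, `DATA`, and `request` are hypothetical and do not come from the talk, and the keyword match is a stand-in for whatever a real request would look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One unit of data: a fact paired with a meaning (hypothetical type)."""
    fact: str
    meaning: str


# The same fact, 42, paired with the three meanings used in the talk.
DATA = [
    Datum("42", "the meaning of life, per Douglas Adams"),
    Datum("42", "the number on Jackie Robinson's jersey"),
    Datum("42", "Peter's age 24 years ago"),
]


def request(data, keyword):
    """Return the data matching a request.

    The returned items are unchanged data; they count as information
    only because somebody asked for them.
    """
    return [d for d in data if keyword.lower() in d.meaning.lower()]


information = request(DATA, "jersey")
for item in information:
    print(item.fact, "is", item.meaning)
```

Note that `request` never creates anything new: it can only return what is already in `DATA`, which mirrors the point that you can have data without information but not information without data.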
We've just got one more stop on this particular little journey here. That's the idea of what intelligence is. Intelligence has oftentimes been called knowledge or wisdom systems, but it's about how that information is used in the organization. Those are the three ways we can objectively define this architecture, and that's so that we're all on the same page. We've been using these definitions for a long time; I ran across them in the Department of Defense back in the 1980s.
That thing that just bounced there was a data structure. That's an arrangement of data showing that, again, useful data is a combination of a fact and a meaning. When it's provided in response to a request, it becomes information, and when it's used strategically, it then becomes intelligence. But keep in mind the data structure, because here is a generalized data structure: customer, sales order, sales order line, product, a way of pinning down customers and tying them to products that have been purchased. We can look at a good computer science definition that says a lot of really important things, but what we're really talking about here are the rules for the grammar. What can customers do? Are they unique? Is there a balance? Are they arranged in a certain way?
The more important question on all of these is: how many of these structures do we want? And the answer is, gosh, as few as possible. So if we ever have an option of creating something new or generally keeping what we have and perhaps modifying it slightly, as opposed to creating a new version, keeping what we have should be our answer in today's data world. It's just kind of crazy to operate otherwise. In fact, the numbers are so intense for some organizations that it becomes really problematic.
If you look at this hypothetical situation of just six applications connecting to each other, if everything was truly going to be connected to everything else, it would be 15 interfaces: the number of applications, six, times six minus one, which is five, divided by two, gives us 15 on the top there. Well, that growth gets away from us quickly. For example, this number was given to me to use by the Royal Canadian Bank a number of years ago, where they said they had 200 major applications and about 5,000 interfaces between all of these. Let's just see how that works on a comparison basis. If we look, you can see six going to 60, going to 600. We'll put their piece out there at 200 applications and 5,000 interfaces. They're clearly better than average, and they still have a very large and complex situation on their hands. Again, having fewer of these things in an organization has got to be a stated objective to work towards.
Each of these models can be put on a framework. The first one I'm going to show you is as-is, represented by the little four-icon toolkit that's there on the left-hand side. We have three versions of as-is. I'll come back and tell you about them in a minute. The other is to-be. That's what we'd like to have. We have existing in the brown, and what we'd like to have in the blue-sky version below. It's a three-dimensional model, so we also needed a dimension going back. That is whether the model has been validated or not validated. That's going to be true whether it's an as-is or a to-be model. You can now see we've got a space that's relatively well defined. We just have one more row to put on it. The labels for that are conceptual, logical, or physical models. Each of these models at the conceptual, logical, or physical level can be either as-is or to-be. If it hasn't been validated, it is unvalidated. We're going to keep that in mind. Again, it's just a way to think about it.
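The interface arithmetic from the start of this passage, n times n minus one, divided by two, can be checked with a short Python sketch. The function name `max_interfaces` is hypothetical; the inputs 6, 60, 600, and 200, and the bank's reported figure of about 5,000 interfaces, are the ones quoted in the talk.

```python
def max_interfaces(n_apps: int) -> int:
    """Number of point-to-point interfaces if every application
    is connected to every other one: n * (n - 1) / 2."""
    return n_apps * (n_apps - 1) // 2


for n in (6, 60, 600):
    print(f"{n} applications -> up to {max_interfaces(n)} interfaces")

# The bank example from the talk: 200 applications, about 5,000 interfaces.
bank_apps = 200
bank_reported = 5_000
bank_ceiling = max_interfaces(bank_apps)
share = bank_reported / bank_ceiling
print(f"{bank_apps} applications -> ceiling of {bank_ceiling}, "
      f"reported {bank_reported} ({share:.0%} of the ceiling)")
```

Six applications give 15, 60 give 1,770, and 600 give 179,700. At 200 applications the ceiling is 19,900, so roughly 5,000 interfaces is about a quarter of the fully connected worst case.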
When we're doing as-is, the way we've taught people in school to do this is that they build the requirements, which are in some form of a loose-leaf binder. We then turn them into a model to come up with our logical model, and then get the actual physical implementation of the system as it's put together. Every modeling change that we make can be mapped to some transformation on this framework. Really, the best way to think about them is: what are we trying to get to? What are the goals of the specific modeling that we're doing?
Again, as I've been saying, most people start off with forward engineering. These are assuming validated models, and they're simply building these. Now, there's a challenge associated with that. When you get to the real world, we spend 80% of our dollars making existing things better and fixing what's gone wrong with our existing things, and only 20% of our dollars go into building new stuff. That we teach people how to do only the latter in school is just sort of crazy. What we really need to pay attention to is reverse engineering: taking the existing systems and understanding them. As my friend Elliot Chikofsky wrote a while back, we evolve existing systems using a structured technique aimed at recovering the knowledge of the existing system to leverage enhancement efforts, and that's of course the key to it.
We want to go from physical as-is to logical as-is, and from logical as-is to conceptual as-is if we need to change the requirements. If we don't, we can stop at the logical. In fact, in the re-engineering process most organizations are attempting right now, what we're doing relatively poorly is reverse engineering the existing systems.
After all, if you don't understand the existing system's strengths and weaknesses, how are you going to avoid replicating the bad stuff, and how do you go about making sure that you recreate the good stuff? Again, as I mentioned before, if you go to the requirements level and actually change the requirements, you have to reverse engineer twice in order to come up with that. The next part is to go from your existing as-is requirements to to-be requirements. Now you have been informed by this, and you use this information. There's also another opportunity, as I mentioned before, which is just to go design to design. If you're not changing requirements, it's perfectly acceptable to go straight there, then make sure of course everything syncs together, and put it in place so that you can get things rolling.
This is the proper way to do it, and yet what happens in most instances is that organizations work with the data directly from the physical as-is to the physical to-be without a single thought about what goes on in between, and that is of course crucially harmful to most of our organizations. This model, which I just turned on its side here, maps relatively well to the ANSI-SPARC architecture, and I've got the reference to the Wikipedia article right there. You can see they both deal with a conceptual level and a physical level, and the part they differ on is a logical level in between, which we've found in the data modeling community to be quite useful, again depending on exactly what you're attempting to come up with.
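The re-engineering path described above can be written down as a sequence of stops on the framework. The following Python sketch is one way to encode it, under stated assumptions: the names `Level`, `State`, and `reengineering_path` are hypothetical, and the validated/unvalidated dimension of the framework is left out to keep the example short.

```python
from enum import Enum


class Level(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


class State(Enum):
    AS_IS = "as-is"
    TO_BE = "to-be"


def reengineering_path(requirements_change: bool):
    """Ordered list of (level, state) models to produce.

    Reverse engineer from physical as-is to logical as-is. Only if the
    requirements change, reverse engineer a second time up to the
    conceptual level and move from as-is to to-be requirements there.
    Otherwise go design to design at the logical level. Finish by
    forward engineering to the physical to-be.
    """
    path = [(Level.PHYSICAL, State.AS_IS), (Level.LOGICAL, State.AS_IS)]
    if requirements_change:
        path += [(Level.CONCEPTUAL, State.AS_IS),
                 (Level.CONCEPTUAL, State.TO_BE)]
    path += [(Level.LOGICAL, State.TO_BE), (Level.PHYSICAL, State.TO_BE)]
    return path


for change in (False, True):
    stops = " -> ".join(f"{lvl.value} {st.value}"
                        for lvl, st in reengineering_path(change))
    print(f"requirements change={change}: {stops}")
```

The shortcut the talk warns against, physical as-is straight to physical to-be, would be a two-element path that skips every logical and conceptual stop; neither branch of this function produces it.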
The conceptual model, again, is focused on abstract requirements. The logical is considered to be a refinement but not necessarily an abstraction in many instances, and that's one of the things one of our contributors pointed out in their presentation. And then there's the physical implementation: how does it work in Oracle, or in the AWS cloud, or whatever else it is you're doing? You should be able to change that back end in a way that doesn't impact the user. Of course, we've all been through cases where we have in fact been impacted in a really negative way. These are all mappable to John Zachman's framework, in this quadrant right here. If you look at the inventory and process representations, the what and how columns, close up, again they map very nicely onto conceptual, logical, and physical implementations of those models.
Another way to think about the utility of each of these is the way they build bridges, interestingly enough. I've been to this one, and it's a favorite of mine in terms of what they've done here. The old way of going was that yellow line that I'm drawing for you right there. It took a lot longer and was problematic enough that they were willing to skip this area entirely and make a wonderful marvel of it. It was focused on improving commerce around there.
The conceptual model in this case was at the entity level. It provided focus and scope to the modeling effort and was rarely maintained: once you get the original one out, you just want to make sure it looks close enough to the product. The logical, then, was really the plan to take the what in the conceptual and go to the how, the transition from conceptual to physical, and this is the plan that's needed in almost all data modeling instances. Generally these are developed to the attribute level and understood as third normal form, which doesn't mean they're maintained that way. They really need to be refined until they become close enough to a solution that you can purchase, maybe tailoring it as opposed to customizing it in today's environment, and you need a good understanding that the business characteristics you're modeling are going to complement those that your business is trying to achieve. Finally, the physical models, which are maintained by more organizations: again, how are we actually going to build this? They are the blueprints for actually building any of these out, and they are absolutely used on an ongoing basis.
Now I want to spend a second on this quick animation. I'll voice over it briefly, just to let you see how they were putting this piece together, taking the bridge and actually just moving it out. You can see it come together like this, and when they got the bridge together, it became structurally intact. Only then, after they put the two pieces together and actually connected them, was the bridge stronger as a result of that connection; it would have been weaker otherwise. And you may say to yourself, wow, that was pretty incredible, how did they do that? Well, they had one problem: they couldn't push sideways on any of the piers that were up there. This has become a very common way of building these, and they use this
wonderful device here. They gave... oops, sorry, I'm going to go back and do that again. In essence, what happened is these things came together and they were stronger after coming together, very much like your databases will be stronger because they have structural integrity, as opposed to not. Again, that's a topic way beyond what we're focusing on today. The second part of this is really brilliant. How do you move these bridges so that you put only downward pressure on the piers? The answer was that you drag them up a slight incline. That gave about two feet of lift to the entire bridge; they pulled it forward, then pulled the wedge out from underneath it and set it down directly on top without applying any side pressure to it whatsoever. It's just a really interesting way of doing this, and it has produced some really outstanding engineering feats. This is just one of them. They needed all three of these kinds of models to sell this bridge to the public in France so that they would pay for it.
Well, let's get back to data, within that same context of selling the bridge conceptually. You might ask the question: how are components expressed as architectures? The answer is that details are organized into larger chunks, and that's a pretty intricate process. Then larger components are organized into models, and the models in this case become our data models, but they also introduce physical dependencies in our data: you have to have a record of a customer before you can make any charges against the customer record. And then overall, those models are organized into architectures, which now really focus strategically on purposefulness. We're going to keep all three of those. Intricacy is where the details are put together, and here's a little example that we've put up here: club ID, club description, club status, sex to be assigned, reason, reservation, etc. Then they're organized into models, so
we now take these collections of entities and organize them, and again, into purposefulness. Now, one of the questions people ask all the time is: why are there no really good examples of data models? There actually are. If you go out and look at the DoD data model that we did more than 30 years ago, I believe at this point it's still in use around DoD in one form or another. But there's much complexity, and it takes a long while to become used to them. More important is to think about that stuff I was showing you as a collection of things in a data dictionary, a glossary, a vocabulary, whatever you're going to call it, and to understand that these data models, and the architecture as a result, are developed in response to specific needs. So the organization understands the specific set of needs that it has to address from a data model perspective and gets some specific information system requirements. Again, capture all that we can in the trusted catalog, whatever we're going to call it. And now you can see we go around and iterate on these, each pass producing a slightly refined, evolving version of the product. In this case the product is the data model at the center, as we're learning how to use these.
This is in order to avoid what we see too much in the industry, which is the situation of The Princess and the Pea, as it's properly called, from Hans Christian Andersen. I circled the pea at the bottom there, that green thing, and there's the princess at the top, who is sleepless as a result of having this imperfection. Now, it's not just as trivial in this case as the princess with ADHD, but perhaps more along the lines of a serious defect in a product that can haunt it throughout the entirety of its product life. There are several companies that have had these kinds of oopsies as they've gone out, because if you start out with a flawed data model, it locks in these imperfections for the life of the product, it restricts data investment benefits in the future, and it decreases
your organization's ability to leverage data, consuming in the process of migrating, converting, and improving 20 to 40% of all IT budgets. The lack of all these things takes longer, costs more, delivers less, and presents greater risk in this case.
Let's go through a couple of quick examples. Here's an example of a database for what used to be called iTunes and is now called Music in the Apple ecosystem. If you look, I've purchased three rows of data in this case; each row is associated with a song, and there's the price on each of those pieces of data. So yay, I made a database, great. Here's an interesting question that we like to ask: what information would be lost if we deleted record number one? In this case, if we deleted record number one, we lose the fact that Peter purchased "We Met Today" as well as the fact that "We Met Today" costs 99 cents. That's two facts in one row of data, which is usually unintended and undesirable. These are some simple tests that you can go through yourself as you look at the various data designs presented to you.
Here's an insertion anomaly. Suppose I want to insert a new single named "Scuba" and it costs $1.29. I can't enter it into this database until I have an actual purchaser for it. So we can't insert a full row until we have an additional fact about that row. Again, undesirable and unintended. Then update anomalies. Suppose I'm going to change the price of "We Met Today" from 99 cents to $1.29. Going through, examining everything in song, and changing all of them is a complex process, and of course if there are any spelling errors in there, I'm going to miss them, and it's not going to work either. The point of all this is that there are correct ways to have data organized, and it can be done on the basis of flexibility, adaptability, retrievability, all sorts of things. And the bottom line when you get to all of it, and this is not a course, is that you're going to try to have, as nearly as possible, one fact stored with one row of
data. So a better way of doing it than our original is to maintain a master set of tracks, records, songs, so that when I change the price for a song, the price of that song is repopulated out to the entire database. There's a one-to-many relationship between each of the songs in the pricing database and each of the purchases that a purchaser has on top of that. The problem is your IT workers don't know this, your knowledge workers don't know this, and it just becomes sand getting in the gears.
Let's go into conceptual modeling here. Again, this is the pink, or chartreuse if you will, level up there. It still has validated and unvalidated, and as-is and to-be as well. The goal of a conceptual data modeling effort is to harmonize a standard vocabulary. Again, I mentioned the three-way group that we had to put together before in order to get it to work, focusing on strategic issues and not on details: really, how is this new system going to help us achieve organizational strategy in a more satisfying way? We shouldn't have an unvalidated conceptual data model; that just doesn't make sense. Would you want the word draft on your strategy for the organization? I think most organizations would pass on that option. So it helps organize the data concepts and gives us some ideas as to what the relationships among data things ought to be, documenting these in such a way that we figure out our version of these system-wide definitions as we go through the entire process. We're also likely to get somewhat of an idea of what a high-level process interaction would look like, given that type of interaction with the organization.
All the discussions of architecture are going to involve at least faster, better, cheaper. My good friend Michael Adams used to have this on the back of his business card. He said, we offer three kinds of services, and you get to pick any two of them. Yes, what a wonderful thing. Again, good reason for
not trying to do too much around this, but it's a tried and true method. And if anybody says they can get you all three at once, they can probably sell you the Brooklyn Bridge as well.
We need to focus this, and the real key, of course, is strategy. We didn't use the word a whole lot until about 1951, when the business people discovered it and discovered you could sell people 100 PowerPoint slides or a 100-page report for lots of money. This really made strategy into a thing. Where is the strategy? Let me look it up. Well, that's not how it was originally used in the military. The definition is a pattern in a stream of decisions, and this is much more of a process than it is a thing.
For example, Walmart's former strategy was everyday low price. You knew this. Walmart knew you knew this. The entire flight crew going to Walmart's Bentonville airport understands this. It's cemented into everybody's head. Everyday low prices is exactly what Walmart was about for many, many years, and it was a very successful implementation of the strategy.
Example number two is Wayne Gretzky in hockey. His definition of strategy is that he skates to where he thinks the puck will be. After all, if you're playing a game where you're chasing around a solid little piece of rubber that travels much faster than human beings do, you want to be where the puck is going to be, as opposed to chasing the puck, or you will never succeed in this particular game. There's lots more of it at Wayne Gretzky's entry on the Wikipedia site.
One last example here. Napoleon, facing a larger army, is in blue. The British, in red, and the Prussians, in black, are facing him, and Napoleon doesn't know exactly how to get out of it. The answer, of course, is divide and conquer. You're looking for a pattern in a stream of decisions. Let's review that last one real quick. First of all, the key is to hit both armies in exactly the right
spots so that they'll break apart. Then we're going to have everybody turn to the right and defeat the Prussians, and then turn to the left and defeat the British. Oh, and by the way, can you do this, please, while somebody is shooting at you? It's not an easy task. I'd argue that's a complex strategy, and of course if you know anything about history, you know that it in fact did not work for Napoleon. So strategy has to be simple.
Here's even a simple strategy that didn't work, for an organization that I used to work for many, many moons ago that was trying to gain additional efficiencies. By making a manager and a salesperson different types of people, they weren't able to capture data about them, and they were unable to gain efficiencies during a time when many other organizations were able to gain those efficiencies. Other uses of strategic data models include Sabre's creation of the flight booking process, AT&T's credit card business literally overnight, Amazon selling satisfaction or overnight delivery, and, sorry, Capital One reinventing solicitation. Each of these is very much a database-centric company, and the actual data at the center of this was considered to be organizational property at the highest level of security, etc., etc., etc.
So let's again change topics slightly and just talk about the process of data modeling. What you probably want to start with is identifying entities: the big blocks of persons, places, or things about which you're going to create, read, update, or delete information. Then you want to identify a key for each of the entities: how am I going to identify a unique instance of that? How am I going to tell this Peter Aiken from the one that's a lawyer in Florida and the one that is a rock concert promoter in Ireland? After we've identified the key, we draw a rough map back and forth to connect the various relationships that are there. Then we identify the data attributes and parcel
them out among the various entities that we're looking at. It's perfectly okay, in fact it'd be surprising if your model didn't evolve slightly at first. If it's still evolving or radically changing, you're not focusing in on a solution, and that's of course one of the measures that you want to look at. You should discover new things in your data model incrementally, and they should fall off into a diminishing-returns plateau.
Here's a very specific model that we did. This is a logical model comprising five model views, the taxpayer, client, governance, program delivery, and vendor views, for a state department of social services. It was a number of years ago but is still reasonably cogent. You can see here that the taxpayer view had information on payments, taxpayers, benefits, and social service programs, and that the client view had information on payments, well, sorry, I'm going to go back: payments, clients, benefits, and welfare agencies. In other words, each of these was focused in on the point that was of most interest to them. Governance, not surprisingly, was the most complicated in that context. Then the program delivery view: this is what our partners are going to see. So each of these represents a perspective on the organization, and the logical model is a great way to show how they all fit together. Finally, the vendor view, which is: where do I get paid? All of that added up, of course, sums to a very nice view of it.
I've mentioned this glossary before. It's the start of your enterprise taxonomy, it defines the initial entities for the conceptual data model, and it engages the business community in providing terms. I want to tell a very specific story about a use of a business glossary that I'm quite proud to have observed and learned from over the years, and it has to do with Nokia, pre-Microsoft acquisition. They were a tremendously gifted company. They had gone from tires and
rubber to consumer electronics and phones, and I approached them at the phone stage. Finns are bilingual because 2% of their population speaks Swedish and they don't want to be impolite to that 2%. Nokia also wanted to play internationally, on the stock exchange and elsewhere, and mandated the use of English in all business settings. So, fortunately for me, who didn't speak Finnish or Swedish, when I went to the meetings they held them in English. Lots of these words were unknown, and outside of data modeling entirely. First of all, they made it culturally bad not to ask questions. So when a word came up that they didn't know, they would look at each other, and if there were two people that said, yes, we don't know what this means, they would build a common vocabulary. They would literally reach for their notebooks to see if there was a golden version of the term in the Nokia term bank. If not, a quick vote by the work group would decide whether to include it in the Nokia term bank. It would not be automatically included; it went to a filtering group that reviewed the submissions weekly and put out a new version. The new versions were, by the way, published as a single web page, and the only access mechanism was getting access to the web page and then using the search function of the web page to search. This is about as inexpensive as you can come up with. Again, the Nokia term bank: wonderful experience. Unfortunately, we were curious. We were taking pictures of this thing that was in many of the offices, and we asked questions about it, and they said, oh, that's our cruiser collector. And we said, what is that? They handed us a 50-page document that described it in much more detail than they had about any of the data models that, of course, we were there to discuss. And remember, this is the environment that we're working in: first we're reverse engineering something of an existing system, and then we're going to understand that information as part of the existing, excuse me, the
to-be design, and only when we use that design are we going to be using it next. Let's look at our next chunk then, the logical models. Again, this is the middle, the orange row. They can be validated or not validated; they can be as-is or to-be. Remember, every single one of your mappings goes into one of those, so if you don't understand where you are and where you're trying to get to, you probably don't have a focused enough session. From a logical perspective, you now start to talk about size and shape: where's the data going to come from, what are the functions, what are the downstream uses? This gives us the ability to do this free from technology considerations. Of course we know it's going to go in the cloud, until the cloud continues to get hacked, and then it'll all come back on-prem, but you know, that's just the way it goes. I'm going to start to document some original and preliminary data designs that show how we're satisfying specific business objectives. The as-is models are going to challenge the conceptual model. You may come up with a difference, which is great; it shows refined thinking in most cases. Explicitly incorporate relevant components, starting to incorporate your as-is stuff. The to-bes are going to be the principle around which new systems are built, using that common vocabulary. Now, most of the time people will tell you, when you're doing your modeling, to use a definition. So here's a definition of a bed: something that you would sleep in. I was taught by Clive to do it in a much more focused way, which is to come up with a purpose statement. Here we have one called BED. It's a data asset type, a principal data asset type, and there's a hospital bed up in the right-hand corner to give you the concept. It describes why the information is being maintained, in other words, why we are maintaining information about beds. Now, interestingly, it was that second clause there, "it contains information about beds within rooms", that proved
to be interesting for us. So we've got a source, we have some attributes that we're using to describe it, and we have some associations, and this is where it got to be very interesting. One room can contain zero or many beds. How are you going to tell? Well, this time they were going to put a tracking device on each of the beds, and the beds were going to tell us where the patients were. That was lost in draft form, because we were able to show them that they had neglected to understand the nature of the room: what about a hallway or an elevator, both of which broke their model? And how are we going to maintain this tracking information once we get outside of the hospital itself, for outpatient services and things like this? Again, just minor things, but we were able to go through and focus in there. So fix that piece, validate the model, move on. Here's another question that you want to ask at your logical level. You've got beds and rooms, great, there's a relationship between them. What should it be? A bed is related to a room: nice attempt, but not good enough. Let's go a little bit deeper. Here we go: many beds are related to many rooms. Well, that's a tough information solution problem; if you look at it carefully, you'll figure out why. And here's the final solution, in this case for a particular instance, which is that each room can contain zero, one, or more beds. That third requirement now gives you the most precise focus. You will get very different systems, and very different people, depending on which of those three requirements you implement. What if beds can be moved about the rooms? Now we have other things that pop into place. These are the five cardinality options that are postulated across these. You can have exactly one, one to many, eventually one. Again, it's a nuance of data modeling, probably for a more advanced topic, but you're adding the element of time to it, saying that you'd like to have in
there the same thing: zero, one, or many, the optionals coming out along with it. Each of these is going to give you a different way to look at it. Let's look at a specific example. Again, this is our room, patient, and bed situation, and recall that in this case we've got a one-to-many relationship between room and bed and a many-to-many relationship between patients and beds. So a bed is placed in one and only one room, and a room contains zero or more beds. In this case a bed is occupied by zero or more patients, and a patient occupies one or more beds. Don't ask too many questions on that; it gets sticky really quickly. This gives us the ability to come up with these conceptual data models, maybe like some of the information on this page, where you're simply looking at what the relationship is between an employee and a sales rep and all the rest of these. You can see there's some one-to-many and one-to-one and other types of things. There are lots of notations. Most people use what's called information engineering, and really it's a matter of picking one to use. Here's another example of a work product that comes out of one of these exercises. One of the groups may come up with a representation describing the relationship formally between account, subscriber, charge, and bill in this case. You can see it's formally mapped out, and the knowledge workers of the organization say, this is the way we will speak about these things in the future, giving us again the ability to focus in on this type of an activity, making it really, really easy to get on the same page, and eliminating in the process often hundreds and hundreds of tiny, tiny cuts that are stabbing at organizations because they do a bad job of this. So again, just remember: we're reverse engineering, we're using that information in the design of the new system, and we're creating it, and our data model is one way of transferring information between these various stages as you're trying to move things around the organization. Lastly, with
again, our physical models. So again, this is the bottom end, if you will, of the scale here. We're certainly going to have as-is and to-be, and we have validated and non-validated, and those terms become much more important in this type of modeling because you're looking at evidence-based information. These activities are more related to archaeology than they are to creativity. There are some very interesting books, we can talk about that at the end, that give us some ideas. But the idea is to look at, and be able to create, the actual data structures that are used in the various data flow diagrams and entity relationship diagrams throughout the organization, and to continue to populate the dictionary, glossary, catalog, whatever we're going to call it, so that we have these. There should be a one-to-one component verification between the components in the physical as-is system and the physical as-is model. Another question is, why would anybody be hand-making DDL? Today's tool capabilities are so powerful that it's a matter of regenerating rather than trying to correct DDL. They had to have that in order to put the system into production as well. Remember that the business is going from the what to the how. That's very critical, because these do become the blueprints for the solution, and they need to be maintained. There are some interesting aspects from a physical perspective. Again, they should be the foundational system. One of the tricks that people use when they're testing this is to walk into an organization with a stopwatch and see how long it takes to produce the actual results that are requested. Sometimes it's very close to seconds, and sometimes it's very close to months. How do we go about accessing the data that's actually in this system? We can oftentimes apply semi-automatic reverse engineering techniques. Again, my first book was on that very subject, for the US military. In order to do that, again,
taking a look at the specifications for accessing the data and the application: what are the current and future data elements that are maintained by this? Again, these things can be maintained semi-automatically. So the data is going to be persons, places, or things that need to be created, read, updated, or deleted. Some people add the word "archived" onto that to extend the CRUD model. These are attributes; they're characteristics of the various things that we have in each area. Here's an example. We're looking at attributes and relationships, and we can talk about clubs and regions. So here is a region entity with a club ID as a primary key, there's another one with a club reporting with a club ID, and you can see that's the common key between the two. Again, notice the crow's feet: one region has many clubs reporting to it, but a club does not report to multiple regions. Other attributes of interest include name and weather. What does this tell us right away? That clubs need to be identified separately from one another, not sure particularly why, but it's clearly a business requirement; that club-specific information is likely maintained as part of this description; and that some level of organization exists above the club level, which is again this region concept. You can see by filling out some additional parts of this that each club must be part of a region, as a business rule that is implemented by this particular data model. If you don't get this right, it's very difficult to change in production, and almost impossible to change in software. As we look around at the uses, an organization may decide to characterize parts of things in the way I've got set up there on the right, and whether it has these tables that are allowed. So all clubs can have a status, many reasons can be assigned to a reservation, ID permits every club to be distinct from every other club, and description is likely to be unique for every particular club. So this gives us the idea that we can
use this. The model variances are really focusing in on trying to make sure we have as many things in common as we can, because if we have too many of those data structures, we're going to spend all our time transforming between data structures. You think I'm kidding; I've seen it out there. Again: what are the data things for, what do they do, how do they interact? We need to understand this because data, remember, is going to become the most valuable of our assets going forward. The model maps critical business needs and contains data essential to the consumers, functioning as a kind of sheet music. A good set of musicians who are all able to read music are going to ask for the sheet music before they start to sing, just knowing that things go better if you're singing off the same sheet of music. And the metadata is essential to other business functions. The process is iterative; it may include logical, physical, and conceptual models as you're trying to figure things out. But once the modeling is done, make sure that you're trying to achieve a specific goal and not just modeling for modeling's sake. Now, these are the five basic structures that you'll find at the heart of most production data: a flat file, an indexed sequential file, a network database, a hierarchical database, and a relational database. And then there's everything else, and everything else really doesn't have much to tell you; it's usually a predefined or everything's-indexed kind of thing. Nothing wrong with any of them, but you're still going to come down to one of these. It's important to be somewhat familiar with them because these outperform the other types of databases in terms of production functions. What happens almost always is that you make your breakthrough in the things that are listed in purple there, and then you transform it into something that can be put into production, so that you don't end up running into a cliff. Here's an easy one to understand, just from an architecture perspective. Again, you
guys know that I'm a university professor at VCU. We had something originally here called a student database master, and it's all parent-child relationships between that and each of the other components. Even though you can't read them, and please don't try, I'm not trying to break your eyes, it's showing that one parent is related to many children in there, and that this was the way in which this data was organized and ran for the university for many, many years. A perfectly fine set of tasks. We were propositioned at one point with an alternative, and we simply said, send us a model, and they sent us this. We just had to laugh, because there's absolutely no correspondence between these two models in terms of showing what one will do. The more important part was that while we certainly knew this was the physical as-is model, a class of students actually did that, for this one they couldn't even tell us whether it was a data model, or a conceptual model, a logical model, or a physical model. Again, we couldn't read anything on there, as you can't either, and it was just a very unsatisfactory time. But if they can't explain this to you and answer a simple question like, is this a logical, physical, or conceptual model, is it represented as as-is or to-be, and has it been validated, that's going to be a problem. The differences between model levels are not in one position or the other, just the same way as on the framework; it doesn't work that way, but there are differences. Here's a conceptual model for HR. Again, we're not going to go through this; I'm leaving this as reference material for you to come back and look at, because of course you know we're recording this. But it gives you some information conceptually about what HR is going to take care of. And a logical model, which turns out to be actually simpler than that. And a physical model overview; we're going to look at the details. There's the physical piece, part one,
part two of four, part three of four, and part four of four. You may say to yourself, why am I looking at these? Well, the idea is that these have very different differences, which is a terrible thing to say, for very specific reasons. Different models are communicating different pieces to different audiences. Your physical as-is is going to be closer to your technical people, who need to have the same common vocabulary so that when you discuss logical or conceptual models with your business people, you're able to actually participate in a meaningful conversation. And again, remember that in all cases we're reverse engineering first and then understanding our existing system. The only time you don't go through this process is when you're starting a brand new system for a brand new company, and that does happen occasionally, but not on a regular basis. So we're going to do some quick overviews here, and I've got some wind-up material. Where we started, back at the top of the hour, was with an introduction to data modeling, looking at three types of data modeling plus a couple of characteristics that we looked at for them, to say that a data model is done so that you start to solve a specific business problem or answer a specific question. I'll give you an example of that coming right up. Conceptual models are motivated by understanding architectural tradeoffs, incorporating strategy in the data modeling, and starting to put together your common vocabulary. Logical, again, is trying to get to simplicity. As you take your original conceptual model, it might be messier than you thought, and somebody else who's trained at design work, which is a skill in and of itself that we don't teach anymore, will permit you to simplify the original design, move things towards standards, and make sure that the business meets the strategy. And then we'll get to the physical solution, which is to say that we need to understand what's
actually happening in our systems. We document it either before or after, very quickly. Ideally it's printed out so that you can use it to build and generate, in this case, the physical construction of the solutions, and it's used and maintained as physical solutions. So we're headed towards the Q&A part, but a couple more things to sort of sum up. There are correct ways to organize data, and all involve modeling data, so if data modeling is not being done, it is incorrect just by definition. Flexibility, adaptability, retrievability, risk reduction: these are all ways of optimizing your design, which is a good reason to do the transformations that we're speaking of. I gave an example with the music database. Techniques include data integrity. Remember when that bridge joined and made it stronger? Data integrity is very much like that. Remember things like: smart codes are bad and dumb codes are good. We could talk about that in the Q&A part a little bit if you want. But just imagine, those of you that are older, and I have to tell you I'm just back from China, where they have a saying that DAMA is either gray hair or no hair, and I fall into both of those categories in some respects, but they certainly understood the concepts around figuring out some of these old ways of doing things. Architectures, again, are just ways that work, but they don't really amount to much in terms of what we're trying to come up with. If we want world peace, we're never going to get anywhere. So start out by not telling them that you're modeling. Don't invite them to a modeling session. They're subject matter experts, and what you need to do is understand their expertise. They actually do understand the information that you need to have, and you need to communicate with them, but don't say we're going to bring you to a modeling session, because then they feel pressure. Just start writing some stuff down. I know that sounds absolutely
nuts, but you don't need an immense sketch. Several of the people I've mentioned before are very good at going into organizations and coming up with models by just talking to people and seeing what's happening. You write some stuff down and then you arrange it; that's how you do your modeling. Then you make appropriate connections between your objects, and as you understand those, you can create a small set of data structures that you'd like to have, but not a huge number, because if you have more data structures, you spend more time transforming between data structures than you do actually producing useful work for people. So just keep it light and let people understand this. Mainly, keep them focused on a data model's purpose. If you have questions, or you see that you're in a modeling session and people are confused for one reason or another, put it down at the top of the page, the whiteboard, whatever it is you're working on, and say: we are in here to understand the formal relationship between soda and customer. I'm making this example up, but we want to walk out that door with an as-is physical and logical model for this relationship. In this case it's not terribly sophisticated; it's going to be some variant of what I'm showing you in the upper right-hand corner there. The soda is given to the customer, and the customer selects and pays for the soda. We can check that against future models and see if we've added features, are missing functionality, etc., etc., but we formalize these things. Here's another one, going back to our hospital beds topic, understanding the characteristics that differ. We want to walk out the door when we can identify the top three characteristics that represent the brand with a logical data model. So we're trying to figure out what the things are that we need to incorporate. The primary means of tracking a patient could
become important in there. That was sort of buried in the details, and we've now surfaced it as a major obstacle, and they're going to try and figure out something else, a plan B in this case. Our third example: what if we had to put the following rule change in place tomorrow, for example, or go back into a pandemic? Is job sharing permitted? If the rule is that an employee has exactly one position, then that's going to be a problem, because the union is going to get on us, or whatever we're trying to figure out. Adjust to your own situation. But the key is, I use this data model to confirm the fact that we have a position that we can put multiple employees in. If we're going to make this presentation to the board, "exactly one" versus "can be filled by zero, one, or many" is very, very useful in that type of a situation. So, just to finish all this up as we're headed towards the top of the hour: goals have to be shared; there's just no other way of doing it. It's a three-way sharing, between the business, IT, and the systems themselves. If we don't have any disagreement or refinement going on, it means we have insufficient communication, and we need to go back and look for this. If somebody is just talking about things and everybody's going, yep, it's fine, it's not going to work. All of these exchanges are automated and dependent on highly successful engineering and architecture components. They've got to have a sound foundation in data modeling basics, because you need to understand the structures and the capabilities of those components as you build them forward. If you don't understand them, you're building a house of cheese, and you don't have a solid foundation. But each of these components can be architecturally specified at this level, and we don't have to reinvent the wheel; we've got these things already existing. We're incorporating purpose statements in our models, in this case not trying to specifically just
simply define them, but looking at it from the perspective of what we are doing. It's a problem-solving activity as well as a problem-definition activity. And we are going to incorporate modeling characteristics by doing different modeling challenges to answer different questions. The idea that we're going to have everything modeled is of course useless; it's not going to happen. But we can keep as much information as we can, and have that information in a form that we can easily build onto. So the modeling use is much more important than the selection of the specific methods, and we're not going to get into arguments about that, but instead maintain the models as living documents. There will be disagreements; let's dive in and find out what we're trying to do and figure them out. And the models need to be searchable; if we can't search them, there's just no point whatsoever. The key is to have utility, and if you have nothing else, just make sure you have a counter on your modeling components, because otherwise how will people know what you're actually producing? Silly things like color and clip art can really, really help your organization gain understanding of these things and drive additional value from them. But the value is a three-step process. It's not just driving value because you've got data. You've got better organizational data, that's wonderful, but you've also got to improve the way people use that data in order to actually help people use that data in support of strategy. So think back to our iterative design process, where things are going to get better and better and better, and we keep working our way around, improving each additional time we do this. This can only be accomplished by using an iterative approach, focusing on one aspect at a time, and applying formalized transformations, because if we don't have this process, we are really focusing on the model for the model's sake, and the model is there to communicate, to give us the results of
an analysis or some other question that we're asking, and that should be shared across the top of the puzzle. I want to take just two minutes at the end here and give you a little bit more information if you're interested in diving further into these topics. There are some wonderful resources that I and other people have made use of. The first one is the Data Management Body of Knowledge, the DMBOK. There are data modeling chapters in there that have done good justice to this, and it contains these concepts as practice areas. That's just part of the overall question of what it means to do data management. If it's your first time seeing this, please do check it out; just Google it and you'll find us online. I've got a couple of other works that I'm just including in here as a general definition for you, if you really want to get into this. Also my colleague David Hay, who wanted to make sure he got in there, and I also mentioned a book; this one is by Graeme Simsion, really more important for understanding data modeling at the next level down. And, as I mentioned before, some current modeling topics from Bernhard Thalheim as well as Gordon Everest. So we're getting back to the top of the hour here. Again, extra points off if you go to the next slide here, and we're going to do a little bit more of this same thing, with some books around that process. But let's talk about what's coming up. We've got a couple of things that we're looking forward to, so Shannon, back over to you, and what do we have coming up? It's metadata, and then getting quality right, and then Enterprise Data World coming up. So we're actually getting back together physically, in person, so hopefully we will see everybody in person. It's just been great having the DGIQs; I think everybody's really enjoying getting back together. We're certainly looking forward to it too. So I'll give you guys a second to get some Q&A together and say hey to Shannon. Peter, thank you so much
for another great presentation, as always. And just, if you have a question for Peter, feel free to submit it in the Q&A portion of your screen, as I'll be reading it from there. Yes, I'm so excited, Peter, that EDW, Enterprise Data World, is back in person, finally, not digital. This time it's in Anaheim, so you know we're talking a lot about going to Disneyland as well, in addition to going to EDW. And you'll be there, right? So we can, there's just so many ways that this is more powerful in person than it is over a Zoom session. Don't get us wrong, guys, we love you on Zoom, but seeing you in person is really what gets the communication going. Oh, I love the networking and meeting everyone in person. Yeah, that's great. Well, let's dive in here, Peter. There's some great questions coming in, of course, because we have an amazing community. So here it is: if an org has decided to build a business glossary for the enterprise, should that business glossary be 100% aligned with the conceptual model? For example, the HR business glossary must match the HR conceptual data model. Is this a DMBOK standard? I don't believe there's anything in the DMBOK about that as a standard. That said, it does make sense, or actually, let's state the converse: it doesn't make sense that the two of these things are not linked up. So whether you need to alias them or otherwise somehow make people understand, your business glossary should be the nouns that people are using in the organization. If somebody calls it a flibbit, then that's what everybody needs to call it. You need to have some place in your glossary that says flibbits are the same as row seven in this particular database, or whatever it is that they've got in there. That's the kind of consensus that needs to occur, because otherwise, we find many times, organizations have carefully planned what they want to measure and still ended up measuring the wrong things, and that could be a disaster in terms of reporting and other characteristics. Great question; thank you for that. And great,
and I forgot to mention, to answer the most commonly asked question, Peter, and people are asking it now: yes, I will send a follow-up email by end of day Thursday with links to the slides, the recording, and anything else requested. Somebody requested the transcript, and I've got to look at Zoom to see how good their transcript is, and how much it's going to take us to scrub it. I know that the Zoom AI transcript doesn't ever get Dataversity right; it's always "diversity". So, you know, we'll see. AI never hallucinates, right, Shannon? I will work to see what I can get for everybody, because I know you all love that. There's been some great chat going on. So, continuing on here: a conceptual data model has, we have relationships from a level one data concept entity to a level three data concept in the same model. Are horizontal relationships from different level entities to different level entities allowed, or must relationships exist at the same level? Very detailed question, thank you for that. I'm popping back up the Zachman framework because I apply the same rules, if you will. When you say things like "allowed", it really depends on the doctrinal pieces, and whatever you do, don't let business people or managers hear you having any of these discussions. But the best way to think of it is that each of these architectural, I'll just go ahead and drop down to the next one here, those three, and close up on them. Each of those three can have specific relationships to the bottom or lower levels, and there's an architecture: the conceptual can be related to the logical; the conceptual can also be related to the physical. But I say "can", and what you want to do is record the instances where it does occur and where it makes sense, but don't try and get them all in there for the sake of completing them. Moreover, each data model is designed to answer a specific business question. Find out what that question is, use your model to answer it, and then go back to doing what you were doing before you
got the question. Make sure that you keep the answer somewhere where it's accessible so that others won't have to answer this question twice. Again, great question; I hope that makes sense. The conceptual components can show up, just as they can show up in the Zachman framework, but they do not have to show up. There is no rule or body that's going to come out and smack you about if you don't do it the way they want you to do it. It's really what makes sense in your organization. But I would suggest putting in there that they can show up, because that means they don't have to show up, and you certainly don't have a task that says each of these has to have many of these connected to each of those. I hope that was helpful, but a great question, thanks. Really good question. And you know, Peter, with ChatGPT and everything else going on, the big question coming down the pike: will AI eventually replace data modelers? The answer's no, and the reason for that is because we've got a long way to go before everybody, including the AIs, understands the questions that are being asked. So I'm going to jump back to the sort of conclusion slide that I had here; let me just pop this back up. The idea is that we've taught people that modeling is something they should do. That's good, but we need to go a little better than just "modeling is something that we should do". Do modeling for the purpose of answering a question, coming up with some sort of documentation, a design document, something that we can use in one form or another in the organization, something of value. So the idea is, let's find a way to add value. The reason we're locked in this room is because we've never had one of these. Again, forgive my slowness here; I don't have these pre-done because I didn't know what questions y'all were going to ask me. There we go. Yeah, okay, one of these. We need one of these for our organization, right? We need a really nice definition. It would help if we had even that much, one that
everybody in about eight departments that all use this data could share; it would be really, really useful. In order to do so, we want to come up with one of those: "We're going to lock ourselves in this room until we come up with it." And hopefully we've got a good facilitator who is able to actually find the meaning for these things and come away with it. So these are done for a purpose. You find out what that purpose is, you answer the question, and then you store the model in a format so that you can reuse the information in the model and not start from zero the next time that you get this question. Thanks for asking that.
Indeed. So Peter, is reverse engineering a one-person task or a team task? If reverse engineering is a team effort, who would be the players involved?
I'm sort of surprised. I literally wrote the book on this, so if you have any trouble finding it out there, send me a note and I'll send you a PDF version. It has the players; it has the things. All right, so let's talk about reverse engineering, and I think that's really what the questioner is probably asking. I popped this slide up three different times during the presentation. We teach our students the wrong context for building out these systems. For the most part, when they get out there in the real world, they are not going to be building out new things; they're going to be working on existing things. You want to know why Threads sucks so bad? I'm not saying Threads sucks, but it's just not a very good system right now, and the reason is because they are reusing components that were never designed to be used in the way that they're using them. They may get it right; they may be able to engineer over top of it. But it is very definitely the pea-under-the-princess's-bed situation there. You can do this, but almost always you are going to start out by doing some form of reverse engineering or re-engineering. And again, I just love to tell the
story that my title when I was at the Defense Department was U.S. DoD reverse engineering program manager. I was in charge of all reverse engineering for the entire Department of Defense, and I had teams doing reverse engineering projects around the globe, working on these various systems. It was a fascinating exercise. We had measures and metrics and all sorts of things, done in order to get ready for Y2K, believe it or not. There was other good rationale as well, but it was easily justifiable under the Y2K piece.
Now, the question was, can reverse engineering be a single activity? It may be as simple as a single activity. If you know where to go read the Oracle catalog or other types of things, and you point a reverse engineering tool at it, it will suck it in and spit out the physical as-is model. That is your best solution if you have access to that. Some of the systems are smart and will report themselves out in order to do this. But that is exactly what you are looking to do: find out the various components of your system that are there. Then you can start to see how they are organized to support faster, better, or cheaper; remember, not all three of them at once.
Now, the other part of the question was, are we going to be replaced by AI? The answer is absolutely not. We still have semantics to deal with, and AI has not done well with semantics up to this point. Semantics is the idea that I give a phrase such as "take the building" to different parts of the Defense Department, and they will do different things with it. If I say "take the building" to the Army, they are going to form a perimeter around it and make sure there is nobody bad inside it. If I give it to the Navy, they turn out the lights, lock up, and leave. And if I give it to the Air Force, they sign a three-year lease with an option to buy. So all of those are interpretable in different ways, and we are going to need very sophisticated AIs before they are able to handle
that kind of understanding. As for data modelers, data analysts, data scientists: your jobs are pretty safe from AI for a while. In fact, you might actually look and see what some of the active hallucinations are that ChatGPT is inserting into some of the information that it gives. Some of the information it gives is very good, and some of it is completely made up, and that does not work out so well. There is a judge in the western country that has got a case in front of him right now where they submitted some briefings with ChatGPT submissions, and they are not a pretty sight, Shannon, not a pretty sight. Oh yeah, they lost. I think it didn't work. Yeah, the judge said you cannot use ChatGPT to write legalese. Hallucinations are a problem, folks.
We will see how that progresses. So Peter, this question came in super early, and you did cover some of this, absolutely, but I'm going to ask it just in case there's anything you want to add in context. Again, this came in right at the beginning of the webinar: will we also discuss enterprise conceptual information modeling, as in an enterprise reference model for understanding the business? You did talk about some of that, but is there anything you want to expand on in relation to that question and how it's phrased?
Again, I hope that the whole point of this exercise was to give you an idea that it's more trouble to manage data and information separately than it is to manage them together as one asset. From that perspective, if an organization said, "We're going to model our data this way and our information another way," I would suggest that it's probably not going to be a fruitful exercise, because there is very little difference between them: just the addition of an attribute called "request" to an item already called data that contains facts and meanings. Trying to manage those two things separately is absolutely crazy.
Now, I think the questioner was asking more along the lines of, let me see if I can find that slide where we talked a little bit more about conceptual modeling. We've done some work over the years, which has been very much fun, on how one actually makes use of these models, and I found two areas that are really useful. The first one is as a sort of target, where somebody says, "Hey, this is what the industry does." I'm showing you this even though there is actually nothing to see; it's a very abstract concept. But people do make these, and many times you can purchase them. I had one group that had purchased one of these, decided they didn't like it, and wanted to get changes to it. It's like, well, no, that's not the way these things work. These are done by and maintained by a separate group, and they claim to have at least best practices encapsulated from throughout the industry.
It reminds me, Shannon, of one time I was out at DGIQ. It was a San Diego one, so it must have been a June one. Somebody in one of these sessions asked me specifically, "Hey, do you know where I could find a data model for a pharmacy?", in this case a pharmacy cash model, and Len Silverston literally was walking by the room at that point in time. I grabbed him, and he told me what page of his books had that particular data model already set up. It was just a real fun, spontaneous session; I don't know if you remember that or not.
But the idea is, again, each of these things can be at the enterprise level. You can say, "There are some things that we can say about the enterprise, and we probably should say them." They may involve, for example, specification of lists on the top row here of the Zachman framework, or other types of specifications. But in all likelihood you're never going to specify the entire thing perfectly for the entire enterprise, unless you are limited by size. And so the best thing that you can hope for at a conceptual level is to try to agree on some things: we'll use these
vocabulary items in this picture to represent these concepts for the entire organization. That makes sense. But where these things go off the rails is that organizations then say, "Great, we just have to specify everything." And it's like, nope. It probably follows an 80-20 distribution curve: 80% of the value can be gained by modeling 20% of those pieces. Try to keep it focused on that and don't get distracted, and you'll more than likely end up in a place where people recognize these things are useful.
Let me make one more point, Shannon, before I toss it back over to you, and that is that many executives don't understand the purpose of these models. I'm keeping it up on the Zachman framework for a very specific reason. I had a CIO that I worked with in New York City for a number of years, who was a very fine individual, and he said, "I do not understand what it is you guys in the data world do, but I understand that it is important, and I understand the why of why we do it. When I go around and see the models that are produced here in this office by reverse engineering our legacy systems, and see them being used to plan enhancements going forward, I see that they are useful, that it is a valuable thing that you do, and that, out of my organization of 400 people, the number devoted to this seems like an appropriate amount. Let me know if you need more or less in terms of what's working here." So executives can understand. Even if they don't get the nuances of what we've talked about over the past hour, they can understand the value, the guidance that people get from these things, because they are representing the golden version of what's actually happening out there. And the sooner you're able to get good at this process of going from conceptual to logical, and perhaps even traversing between the re-engineering components of all of this (I went to the wrong slide there), the more value you'll get in your organizations, all the way up and down. So let me
just pop that slide back up and make sure I reiterate that point as I get ready to toss it back over to you, Shannon. Again, that's the one I wanted. And the key there is, of course, first reverse engineer what works well and what doesn't. If it's working well, we want to keep it, and if it's not working well, we probably don't want to keep it. So we can take it from there. Anyway, Shannon, back to you.
And the questioner expanded on that same question. It is more about conceptual models using other conceptual models, because the business doesn't change conceptually. As an enterprise, we don't build applications but have, you know, 1,500 running.
Great example in terms of that. Yeah, the key is, again, where does it make sense for this thing to be true? That's words of wisdom from Amos Tversky. If you look at it from that perspective, you'll find it's very, very useful. But if you try to put it out as more of a governance kind of thing, that's prescriptive, so it's not going to be useful. But thank you for the clarification.
So Peter, how does data modeling fit into reporting and analysis from a microservices, domain-driven design architecture?
Yeah, so here's the thing. No matter what flavor of architecture you're practicing, you're still going to be using some form of an IPO model, an input-process-output model. In that context, you're going to be reading something; something's going to be coming into your system. Those are the inputs. Your system is going to do something with those inputs. Well, the things that you're reading and the things that you're going to do with those inputs are very much at the attribute level, and if you have confusion over what that attribute is, you will be producing the correct solution to the wrong problem. That happens enough that it is a term of art, if you will. So the key is to make sure that in all cases you have, relatively speaking, a good, probably even an excellent, understanding of what your data requirements are, because if you
do not have that understanding of them, that means they might change. And again, the last thing you want in your build is somebody saying, "Oh, by the way, one of those fundamental pieces that we were talking about underneath all of this stuff is changing on us."
So again, I'm just going to rebuild this real quick and make sure everybody follows. Our as-is versus our to-be, what we'd like to have versus what we're going to have. Our as-is should have a one-to-one with our existing data set. As such, the models will be either valid or invalid. Again, "not validated" is the term that is used there; it is the default setting for all of them, and it should be written on all of your models that have not in fact been validated. And finally, we can break them up into logical, physical, and conceptual. I don't know why I always start with the middle. Again, our conceptual model is generally, at a high level, what is going to be happening. But as several of you have pointed out, it also can include guidance: everywhere that it's possible, you should implement using this data structure. And by that, again, remember: the fewer data structures we can possibly have, the better off we are. That should then be transformed into a plan. So the notebook on the left represents the what, the model in the center is the how, and the physical is the result. Of course, every modeling transformation can be mapped into this, and that's to me much more important than arguing about whether what you're doing is conceptual or logical, or what the rules are. Instead, find out what business problem you're attempting to solve and use the type of modeling that is appropriate. Conceptual modeling, if it's implemented as I've described here, used wherever it's appropriate, can by all means be useful, but it certainly won't be authoritative. So again, keep those kinds of distinctions in your mind. I think that will help guide you in terms of
that.
Indeed, and we've got lots of great questions coming in, and about nine minutes left, so we'll get in as many as we can here. So Peter: seeing many openings for data modelers, but requiring expert-level SQL to apply. How much SQL skill is really required for a data modeler?
I guess you'd rather have a simple tool that is known by everybody in the world. I say that with full understanding that not everybody in the world does understand it, but it is certainly understood by a large number of people as a data manipulation language. So more SQL is generally better; I don't urge you to skimp on that. Understanding how you divide up large sets of things easily from data structure origins is an entirely good complement to what you're learning here. And if you don't, eventually what will happen is, as you're talking to somebody and trying to explain something to them, they'll go, "Oh, you mean this in SQL," and you'll be like, "Yeah, why didn't you just say that in the first place?" So yeah, keep a focus on SQL.
SQL good, thumbs up, I like it. So, when to start with conceptual models? Isn't a new company startup actually the best place to start, if our technical debt is accumulated from poor architecture?
Excuse me, I sneezed on that. But yes, those are exactly the conditions under which conceptual modeling makes sense. But make it make sense: make it very specific to your startup, such that the founders continue to care about it and understand it. So there's the challenge back to you. Yes, absolutely: if you're in a situation where you have zero in front of you, there's no reference or standard model that you can go out and rent or purchase on the market to give you some guidance, and you've looked all through Len Silverston's wonderful books and David Hay's data model pattern books and all the rest and can't find anything, then absolutely, starting off with a conceptual model makes sense, and hopefully you'll get some good use out of it. But at the same time, if what
you're trying to do is write the world's next payroll system, I'm not sure you're really going to need a conceptual model. Fair answer, I hope. Thanks, Shannon.
Fair answer indeed. So, would the conceptual model be the reference for the business capabilities model, providing the terms to use?
Well, that's actually one way of getting it to work. It was in fact that same CIO that I was telling you all about earlier who once said to me, "What am I going to need to have prepared? What's going to make the modelers happy?" And of course, we'd all love to have perfect information, but it doesn't work out that way in the world. So yeah, I think the kind of thing that you just described is saying, "All right, we've got these pieces. What can we do with them that makes sense? Where are we going to be able to apply each of these concepts?" That is an analyst looking through their toolbox, trying to figure out the right tool to apply to the business problem, and that, of all things, is much more valuable than somebody who says, "I'm going to try as hard as I can to make this fit into what I understand is logical data modeling." Again, I hope that's a good answer for you, but it's very definitely a value judgment that you're going to get into.
Okay, so we have five more minutes here, Peter. Can a conceptual model have too many relationships between people, places, and things, or is there a limit to the number of relationships?
Well, actually, what I just referenced in the previous question is a good limiting concept. If you can make it interesting enough that the founders will still pay attention to it, then it's worth it. And if they can't see why they'd want to pay attention to it at all, then probably not. The key, if you're looking at conceptual modeling, is to make sure that you incorporate strategy, security, and privacy, these concepts, early in your design
because they are almost impossible to accurately retrofit. So the key to conceptual modeling is to say, what are the big things, in context, that the model is going to have to do? It was brought to my attention by a close friend of mine today that one of the local newspapers has a sign up on their website that says, "We're just not going to mess with GDPR, so we just blocked our website from all the GDPR sites. Sorry if that offends you or bugs you, but quite frankly it's not worth it for us to invest in that." Interesting concept around all of that, and I can guarantee you that their systems were not designed in order to do that. In fact, they're designed to promote the exchange of angry information back and forth, but that's a different topic entirely. Thanks, Shannon. Great question.
Indeed. So, one more question here: have you added to a data model a time-traveling function, that is, a permanent log of information about the change of a data value (who did it, when, where, etc.)? Should each instance have its own control fields, or should it be a separate instance linked to the instance or entity?
Well, of course, that's a question of granularity, is it not? And there are good examples out there, time and time again. You do not need to reinvent this; in fact, it's such a well-studied area that there's research you can look up to find out what has been helpful in other contexts. You know, how did it work out if we were querying the data lakehouse instead of a special cube that was built? It was fresher data in the lakehouse, but it was faster access on the cube, so what was the value of that freshness? That can end up being a fairly nice, neat little project that you can jump onto. But yeah, if you've got the luxury and the ability to do that, I'd love to find out more about what you're doing, to help you fine-tune that, because most organizations don't get to that good of an understanding to even be
able to apply it to the problem. But anyway, great, great questions today. Thank you guys so much; it's always a pleasure talking to y'all.
Peter, thank you so much, and thanks to everybody for all the great questions, but I'm afraid we are coming to the end of the presentation today. Again, we hope to see you at Enterprise Data World in Anaheim in September. You can meet Peter and myself in person; we'd love to meet everybody in person. Shout out to those of you who are registered already. Super early bird ends next Friday, a week from this Friday. And again, just a reminder: I will send a follow-up email by end of day Thursday for this webinar with links to the slides and links to the recording for everybody. So thanks, Peter. I hope y'all have a great day.
Thanks, y'all. And I thank everybody as well, and thank you, Shannon. Have a great day.