Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Conceptual vs. Logical vs. Physical Data Modeling, sponsored today by erwin by Quest. It is the latest installment in a monthly series called DataEd Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. Questions will be collected through the Q&A panel, or if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #dataed. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open either the Q&A or the chat panel, you'll find those icons in the bottom middle of your screen. Just note that the Zoom chat defaults to sending only to the panelists, but you can absolutely change that to network with everyone. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we will likewise send a link to the recording of the session, as well as any additional information requested throughout the webinar. Now let me turn it over to Andy for a brief word from our sponsor, erwin by Quest. Andy, hello and welcome.

Thank you so much, Shannon, and thank you everyone for joining us. I'm really excited about today's session. Peter's got some really good material, and we're talking about some of the basics of data modeling. As the sponsor, I'd just like to give everybody on the call today an idea of what erwin Data Modeler can bring to your data modeling practices. As we've found in the time I've been with erwin, now six years, after spending ten years with CA: data modeling is officially cool again.
So what we're going to be learning and discussing today is going to be very important to building a data modeling practice in your organization. One of the things I just want to mention: we were an independent company for a while after we were divested from CA, and two years ago we became part of the Quest family. It's just great to be part of an organization that really understands how data modeling fits into your data landscape. Quest provides a very large portfolio of solutions, and erwin is part of that portfolio, with data modeling, data intelligence, enterprise architecture, business process modeling, and so on. The thing we're really trying to drive home here is that fast data and big data are here, and everybody needs to be able to see what's going on. Everybody needs to understand it not only from a technology perspective but also from a business perspective, and that's where solutions like erwin and others out there are helping organizations get a handle on what's going on. One of the quickest, most basic ways to do that is to model what you have out there, so that you can capture a current state and then move to your desired state. What we're able to do with some of the solutions we have is really discover what's out there. Quest has done enterprise management for quite some time, endpoint management, and now data management. Really, the biggest issue any organization faces is finding what data it has and where it is, or, at an even higher level, what assets you have, who's using them, and how they're being used. And ultimately, today, governments around the world have started to put some teeth into their compliance regulations.
And it's not just a matter of "it'd be nice to have this"; there are a lot of regulations that come into play, so if you don't know what you have, you just can't manage it. erwin can support all of your enterprise data requirements, and then we have Foglight and other solutions within the Quest portfolio to help you manage and maintain those environments, performance issues, and so on. What I'm focusing on today, in my little spiel here, is erwin Data Modeler, just to give you an idea of what we provide in our own solution stack within the Quest portfolio. This is a little bit of marketing to set the stage, but erwin Data Modeler has been around for more than 30 years. We were independent for a few years, and I like to say we were a 30-year-old startup. Pretty much every transaction you perform today, either personally or professionally, runs through a system where erwin Data Modeler has a presence. We have a customizable modeling environment and a new user interface, and we've made tremendous leaps in supporting data structures as they come online and are used in the industry: not just your traditional RDBMS, but we also support NoSQL databases and graph databases. All of that provides the ability to reverse engineer what's out there, create your data catalogs, document it, and then use that as a platform to move forward. With the native support we provide, you can compare your physical models to the physical database, round-trip engineer what's out there, fix it, make it better, move it back out, and then integrate all of that into a business perspective. Now, what Peter is going to be talking about today is conceptual, logical, and physical modeling, and erwin Data Modeler can do all of that, along with dimensional modeling, as you would expect.
One of the functions erwin Data Modeler provides is design layer architecture. Where that comes into play is that we're able to create a high-level conceptual model, derive that to a logical model, and then downstream derive it to multiple physical models; that's just one particular scenario. By doing this you can have a very high-level overview that the business and all the stakeholders can understand at the conceptual level, then you get down to the logical level, where you add more detail such as attributes, and finally you have your physical model. We can do all of that with erwin Data Modeler separately, but the more powerful component is this design layer architecture. We can start with a conceptual model at a very high level, where we collect all the business objects and everything is set at an entity level. This is used as a source for your logical models, and it helps provide focus and guidance to the modeling efforts; focus really just means that we eliminate the shiny things. When we're putting a conceptual model together, we want to drive the discussion around the data sources themselves and the business keys, and not be overwhelmed by any of the foreign keys or anything else that may be related. Then we can take this and derive it to the logical model. This is what you need to achieve the transition from conceptual to physical; it's sort of that middle leg. And then we can develop down to the attribute level, in third normal form, so we have one type of data point within this structure.
A logical model is developed to be refined, so you're not going to mess around with that conceptual model too much, but the logical model is where you can add new entities or attributes related to the organization. If it's an entity, you'd probably do it at the conceptual level and then bring it down, but at any point you're able to sync and bring new information from upstream into your models, in a managed process. Now, when you're working with a logical model, as Peter is going to explain more, it is not tied to any specific database management system; it is literally a logical component. And here's a little fun fact: this is where erwin gets its name. Logical modeling is technically called entity relationship diagramming, and that's where erwin got its name; it started as a solution from Logic Works, entity relationship diagramming for Windows, hence erwin. Now, when you do these derivations, we keep the link back to the conceptual model, so when we add a business object up there, we can bring it into our logical model. And then finally we can create a physical model. In this case, this one is for Azure Synapse; we can take a single logical model and thread it out to Snowflake, Azure Synapse, SQL Server, PostgreSQL, Oracle, anywhere along the line. So you can actually test these data structures, or do your black-box lab work with them, on different platforms, all coming from that single logical model. This is where the DDL is created; these are the blueprints you're going to use for the physical solution. In the Data Modeler ecosystem we can have different versions, so as versions of a model are created, we can generate the DDL to alter existing structures or to create everything from scratch. As I've mentioned numerous times, we have a very broad range of data structures we can work with, including relational and non-relational data structures.
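To make the logical-to-physical derivation concrete, here is a minimal sketch of the idea in Python. This is not erwin's API; the names `LOGICAL_MODEL`, `TYPE_MAP`, and `generate_ddl` are invented for illustration. The point is simply that one platform-neutral logical model can be mapped to platform-specific DDL by swapping out a type mapping per target.

```python
# Illustrative sketch only -- NOT erwin's API. A toy showing how one logical
# model can be derived into platform-specific DDL (the physical layer).

LOGICAL_MODEL = {
    "Customer": {
        "customer_id": "integer",
        "name": "string",
        "signup_date": "date",
    }
}

# Map logical (platform-neutral) types to physical types per target platform.
TYPE_MAP = {
    "postgresql": {"integer": "INTEGER", "string": "TEXT", "date": "DATE"},
    "sqlserver":  {"integer": "INT", "string": "NVARCHAR(255)", "date": "DATE"},
}

def generate_ddl(model, platform):
    """Derive CREATE TABLE DDL for one target platform from the logical model."""
    types = TYPE_MAP[platform]
    statements = []
    for entity, attrs in model.items():
        cols = [f"  {name} {types[ltype]}" for name, ltype in attrs.items()]
        statements.append(f"CREATE TABLE {entity} (\n" + ",\n".join(cols) + "\n);")
    return "\n".join(statements)

print(generate_ddl(LOGICAL_MODEL, "postgresql"))
print(generate_ddl(LOGICAL_MODEL, "sqlserver"))
```

The same `LOGICAL_MODEL` drives both outputs; only the type map changes, which is the essence of keeping the logical model DBMS-independent.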
And then ultimately these models can also be linked back to our logical model, so as we add an entity or an attribute, we can model that in the physical model as tables and columns, and so on. So that's the word from your sponsor. At this point I would like to hand it over to Peter to take us through some of the details of conceptual, logical, and physical modeling.

Andy, thank you so much, and thanks to erwin by Quest for sponsoring today's webinar and helping make these webinars happen. If you have any questions, they will likewise be joining us for the Q&A portion of the webinar at the end. Now let me introduce our speaker for the webinar series, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader; many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. He has written dozens of articles and 12 books. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his expertise, and Peter has spent multi-year immersions with groups as diverse as the US Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get his presentation started.

Hello, and welcome. Shannon and Andy, thank you for a great intro. I'll do a quick little plug here; my publisher would get mad at me if I didn't mention the books are on special sale at the moment. I have a website called anythingawesome.com, but that's not what you're here for today.
And there are a lot of you here, which is wonderful; we have people from New Zealand and all the way around the world, and I'm looking forward to a really good discussion on this. So let's jump right in. As we go through the program, I'm going to give you an introduction to modeling data for starters, then we're going to talk about the three types: conceptual, logical, and physical modeling. And of course we'll get to some takeaways and references as we go through the overall process of learning how all this works. So let's just dive in. I want to start out by saying there's lots and lots at the end here for more, if you're interested in going further with this topic, but one place to start for sure is my colleague and friend David Hay, who has a terrific book where he dives into these things in a lot more detail than we have the opportunity to do right here. As I said, let's start out with the data tsunami bit that everybody has to deal with. There's a wonderful company out there called domo.com that does a great job of talking about how much data is being created, and these numbers are fabulous if you need to relay them to somebody in your management. So if you're working at YouTube, you'd better manage your data well enough that users are able to stream almost 700 hours of YouTube video every minute of every day, for the entirety of 2021. And this leaves us with a very large pile of data. If we compare the pre-digital age with the post-digital age, you'll see that most of the data we're working with has been created in the last two years. That leaves us with a situation where we have incredible demand for analyzing all this data and figuring out what it means. And yet we're not even close.
All of us working in the area have been working as hard as we can, but you can see there is still an enormous gap between the amount of data we have out there and the amount of data analysis capability we'd like to have. Of course, if you don't understand what your data is, or you don't do things with it in a structured fashion (and I'm going to give you a slightly different word for that): if you let everybody do data on their own, you will end up with this, which is a mess. It looks like a hoarder's house, and it's probably relevant to say a bit about hoarding here: 80% of the data in your organization is redundant, obsolete, or trivial. That's an incredible statistic. We have the idea that we have a shared understanding of our data, and data models are how you achieve that shared understanding. All of your organizations have conceptual arrangements of the business, the processes, the systems, security, the technical side, the data and information, and several other types of structures; you have them. The question is, do you and the rest of the organization understand them? If you do, that's great. However, if they are not documented, they cannot be useful to others. The idea of a data model is that we understand it as a set of requirements we are specifying for the data component of one or more systems we're working with. And importantly, that shared understanding occurs between humans; it's literally the same thing as musicians singing off the same sheet of music. But we also have to make sure that the systems understand this in the exact same fashion.
Now, modeling addresses something I call data debt. You'll see this term more and more; it's the idea that if you haven't been paying attention, if you haven't been modeling your data, if you haven't been following best practices, you need to get back to zero before you can move forward. Now let's do a quick model of what data, information, and intelligence are, and I'm going to start with a little story that begins with the number 42. If you're a fan of Douglas Adams, you'll know that 42 is the answer to life, the universe, and everything in The Hitchhiker's Guide to the Galaxy. It is also Jackie Robinson's jersey number. And if you take 42 and add 21 to it, you'll have the number of years I have been on planet Earth, clearly making me old enough to consume adult beverages, at least in my state of Virginia. What I've done there, of course, very briefly, is put together a series of meanings around the fact 42. The meanings are: what is the meaning of life, what was Jackie Robinson's jersey number, and is Peter old enough to drink. A fact combines with one or more meanings, and each fact-meaning combination is a datum. I've already mentioned the ROT factor: 80% of your data is redundant, obsolete, or trivial. So you need to understand and distinguish between data and useful data for starters. In addition to that, we can also objectively characterize the difference between data and information: information is data that has been requested in one form or another. By adding that simple rule, an objective criterion, you can follow up with Daniel Keys Moran's wonderful statement: you can have data without information, but you can't have information without data. We're going to take this one step further now and figure out what we mean by intelligence. In most cases we have information, which is data supplied in response to a request.
We also have the idea of how this information is used. The strategic use of information is where we get into intelligence, although you can see from the bottom of this chart that this definition was created in 1983, many, many moons ago. We've used the words knowledge and wisdom in support of, and sometimes in place of, intelligence in that context, so this chart has been around for a while, and it gives a very good, objective way of describing these three concepts. These concepts, of course, all combine to form a model for us. Let's talk specifically about a data model, and I'm going to call it, at first, a data structure. We have a standard computer science definition here, which is an organization of information in memory for algorithmic efficiency, such as a queue, a stack, a linked list, or a heap. I won't read you the whole thing; you get the picture. It's important to have this because you may need to order your data in certain ways, you may need to rearrange these lists, or you may need to identify your data, discover that there should be just one copy, and cache it. I worked with a phone company once that did not have a single master record for each of their phone numbers, so a customer's long-distance bill might not be applied to the right number; there were actually three instances of it in the model we examined for management, and that was, of course, a problem for them.
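The data structures named in that definition can be sketched in a few lines. This is just a quick illustration of queue, stack, and heap behavior using Python's standard library, nothing specific to the talk's slides:

```python
from collections import deque
import heapq

# A deque can act as a queue (FIFO):
q = deque()
q.append("a"); q.append("b"); q.append("c")
assert q.popleft() == "a"   # first in, first out

# ...or as a stack (LIFO):
s = deque()
s.append("a"); s.append("b"); s.append("c")
assert s.pop() == "c"       # last in, first out

# A heap keeps the smallest element cheaply retrievable:
h = [5, 1, 4]
heapq.heapify(h)
assert heapq.heappop(h) == 1
```

Each structure organizes the same information differently depending on which operation needs to be efficient, which is exactly the "organization of information in memory for algorithmic efficiency" point above.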
The key question, though, when looking at something fairly complex like a data structure, is how many of these do you want, and the answer is quite simple: as few as possible. You see, as you look at how many interfaces are required to resolve any type of integration problem, in this case we're just talking about six different pieces of application code, which means we have 15 interfaces; the formula is n times (n minus 1) divided by 2. This problem grows at a crazy rate. I'll give you an example: the Royal Bank of Canada told me I could use their numbers. These are old numbers, and I'm certain they've increased by now, but they had 200 major applications and about 5,000 connections between them. Now let's take a quick look at that on this complexity scale. You can see that with 200 applications, the maximum theoretical complexity I could have is 19,900, almost 20,000 interactions. Royal Bank of Canada is down here; they have only 5,000. This gives us the idea that we have to use models in order to understand, to make the computers understand, and to make the business and technical people understand what it is we're talking about as we try to leverage our data in our various practices. Now I'm going to give you a slightly complex model that's been called the washing machine model, for obvious reasons. First of all, when you are modeling, there are a couple of ways you need to characterize your model before you get started. One: is it an as-is model, or is it a to-be model? I put a cloud on the to-be model; that doesn't mean it's in the cloud, it means it's aspirational in nature. So that's the first differentiation: is your model reflecting current reality, or is it reflecting your aspirations, where you'd like to be?
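The interface-count arithmetic above is worth writing down explicitly. This is just the n(n-1)/2 pairwise formula from the talk, checked against the numbers quoted:

```python
# n components that could all talk to each other need at most
# n * (n - 1) / 2 point-to-point interfaces.

def max_interfaces(n: int) -> int:
    """Maximum number of pairwise interfaces among n applications."""
    return n * (n - 1) // 2

print(max_interfaces(6))    # 6 applications -> 15 interfaces
print(max_interfaces(200))  # 200 applications -> 19,900 (RBC actually had ~5,000)
```

The quadratic growth is the whole point: doubling the number of applications roughly quadruples the worst-case integration burden, which is why "as few as possible" is the right answer.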
There's another way of looking at that same set of models, dividing them into two categories, in this case called validated and unvalidated. That's an important distinction; we'll come back to it in just a little bit. And there's one more layer of complexity under the whole thing, of course: conceptual, logical, and physical. When we look at all of this, what you see is that people originally understand these models from a paper perspective. Conceptual is going to be some sort of narrative or enumerated specification for the system; logical is going to be a data model, as Andy said before, an entity relationship model; and physical, of course, means the database itself. Every modeling exercise you do can be mapped onto a transformation in this framework, and I'll show you a couple of examples as we go through this. Start with forward engineering, which is the only kind most students are taught. All of you on this call should understand that this type of material is not covered in any graduate program, to my knowledge, anywhere in the world; there are certainly individual classes taught on it, but it's not part of a curriculum, and it's certainly not knowledge you can expect a standard IT person to have. Forward engineering should be done only with validated models; if I'm building something off an unvalidated model, it's probably not a good idea, and we can get into that a little more in the Q&A section. Forward engineering is largely about building new stuff, which consumes only 20% of our total IT spend, whereas enhancing existing stuff consumes 80% of our IT spend. Let's take a look at that, because it's something different, and Andy mentioned it before: 80% of organizational time and effort, particularly with data models, is spent in a context of reverse engineering.
So there's a proper definition for reverse engineering; I was just on the phone with Elliott Chikofsky a little while ago, where he was giving a webinar for another group I'm associated with. The idea is: what are the requirements, how do we understand them, and how were they built? But what we typically have is a database. We have the physical as-is, and we have to derive the logical as-is from it, sometimes even deriving all the way back to the requirements. Again, this is not taught in school, and the modeling technologies can often be a great assistant in doing this. So we've looked at forward engineering and reverse engineering; let's look at reengineering, which is a term you'll hear from your management, if you haven't already. Reengineering means that you first reverse engineer the existing system to understand its existing strengths and weaknesses. You may only need to go from your physical as-is to your logical as-is in that reverse engineering. However, if your requirements have changed, you may also need to go all the way back and use the information from the reverse engineering of the requirements to inform the design of the new system. So there are two options here: you can go blue arrows, yellow arrows, and then back to the green arrow, or you can stay with the blue arrows if you are certain the requirements are not changing. Now I'm going to flip this model on its side for just a minute, to show you that this represents and maps directly back to the standards body. I'm going to go to the Wikipedia article on the ANSI three-schema standard model: the conceptual view is the community view that you see in there, and the physical representation is there as well.
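The "derive the logical as-is from the physical as-is" step can be sketched in miniature. Here is a hedged toy example using SQLite's introspection as a stand-in for what modeling tools do against production DBMSs; the helper name `reverse_engineer` and the sample table are invented for illustration:

```python
# Minimal sketch of reverse engineering the physical layer: introspect an
# existing database to recover its table/column structure. SQLite's PRAGMA
# table_info is used purely as a stand-in for real modeling-tool discovery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (purchaser_id INTEGER, song TEXT, price_cents INTEGER)")

def reverse_engineer(conn):
    """Recover a {table: [(column, declared_type), ...]} description."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]  # (name, declared type)
    return schema

print(reverse_engineer(conn))
```

Real tools go further, recovering keys, relationships, and eventually a logical model, but the direction of travel is the same: from the running physical database backward to a description humans can validate.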
They don't tend to show the logical view; as Andy mentioned before, we discovered after the standard was created that you really do need a logical perspective in order to look at all of these things. The conceptual view is how everybody in the enterprise believes the data is used, and the logical one is the translation function that takes you from the on-paper version, the thing I showed you a few minutes ago, to the actual physical plan; it gets you from one to the other. And when you do this properly, we should be able to change the physical implementation without affecting the user's perspective. Now take a look, very briefly, at a modeling example that has nothing to do with data models: this is the Millau Viaduct, the highest bridge in the world as of 2007, and one of my favorite pieces of engineering. If you look at the little yellow line there, that shows you the route you used to have to take to go from place A to place B. They used conceptual models, and that's the model you're seeing on the right: if you drive the orange path that goes all the way around, it's going to take you longer, and it's a slower route, whereas if instead we build this highway that goes straight across, it will achieve business results and help out in particular the towns located along the highway, as development tends to follow highways around the world. This is a business-focused model; it's typically focused at the entity level, and we're just trying to show how things are related to other things.
Sometimes these models are maintained, but rarely. So that's conceptual models. When they were building this bridge, one of the things they said was: we're going to put in these towers that you can see across the bottom there, but there are also what they call temporary piers that they're going to use during construction. Notice the bridge is held from the center; the deck is suspended from the center, with traffic on each side of it. Logical models can be developed all the way down to a third-normal-form level, but let's wait on that for just a little bit. Again, we'll see some tailoring options, and a logical model allows the organization to look at a plan and say: yes, you do have a good plan that will get you there. Now, for all of you listening: if somebody comes to you and tells you they have a plan for implementing the data, and they can't show you a logical data model of how that is supposed to work, or that logical data model is not understandable by you, the technical people, and the computers, you have not succeeded in what you're attempting to come up with. Finally, the physical models have two basic purposes. First, they become the blueprint for the physical construction of the actual solution; and again, remember I told you about those temporary piers. Notice they're in red as we look at them on the screen here; the real, permanent bridge pylons are in gray, but the red ones are temporary. That was their answer to the question: how are we going to move the bridge across? The other purpose is that these blueprints are used for maintaining the bridge. Even though the bridge was constructed in 2007, it's obviously more than 10 years old at this point, heading toward 15, almost 20 years. So when we're down the road, where do we go to check on things? You will see this matter in any sort of disaster recovery or similar situation.
So the physical models are critically important for this as well. I want to show you one more little piece of this: as they were selling this bridge, they had to explain how they were going to build it; remember the other pieces, the red ones you see in the animation here. The deck sections are brought out across the span and placed on these little mechanical movers, and the movers move the bridge forward very carefully, piece by piece. They do it by lifting the deck up, so they don't put any angular or sideways pressure on those pieces, then letting it down, going back, repositioning the mover, and doing it again, moving that bridge two feet at a time until this wonderful bridge is complete. It is also, by the way, held in place with a very deliberate amount of welding to ensure structural integrity. And again, remember where we started just a few minutes ago, with our as-is versus our to-be. So let's switch back to data. When you're doing data modeling, you're working with details, and these details are organized into larger components; the details are very intricate in nature. The larger components are then organized into the various models, which introduces dependencies; you'll see this in something as simple as: you can't use a website until you register for it. These models are then organized into architectures, which bring in purposefulness, mission orientation, and so on. The data world, of course, works the same way: we have attributes that handle the intricacies, data models that handle the dependencies, and architectures that handle the purposefulness. So why don't we ever see any data architecture models like this? The answer is: because they're too big.
You can't do this by hand; you need tools such as Andy was describing in order to do all the work around this. Modeling is iterative in nature; the data models and the architectures are developed in response to specific needs. You don't always need to create a new model, and you already know now that four times out of five it's going to be a change to an existing model. The data model then authorizes and articulates very specific information system requirements. Now notice one other piece in here, a bit that we often forget to mention: we also have to have some sort of trusted catalog. That's built into the various CASE tools; however, if you don't have a CASE tool, you still need to start a catalog. By the way, that catalog and the data governance glossary should be the same thing, if we're working on them together. And of course, if we do a poor job with all of this, we end up with a situation that makes Hans Christian Andersen the world's first data modeler. I'm totally kidding, but he did come up with a wonderful story called The Princess and the Pea. If you remember the story, the pea down at the bottom of the stack of mattresses is affecting the princess's sleep up at the top. What does that mean for us? Well, they didn't have a catalog. Typically these things are undocumented, which means that if you do a poor job with the data modeling, you fail to understand the role of data governance within the proposed and existing services (it isn't strictly data governance; I just happened to have that particular word up there). When you have that pea, you've locked in those imperfections for life, and that restricts additional data investment benefits and decreases the leverage you can have in your organization. We spend 20 to 40% of our entire IT budget migrating, converting, or improving our data.
And if we try to do it faster, it will take longer, it will cost more, it will deliver less, and it will provide less value. Thank you very much, Tom DeMarco. Or, another way to think about it: we're simply pouring sand into the highly functioning gears of our organization. So let's take a couple of quick examples of how data can be stuck in bad ways. Here's a very simple example. This is the Apple Music (formerly iTunes) database on Apple computers, and it's showing what somebody might have put together to maintain the music collection that you have. I'm going to ask a very simple question, and I'll go through these quickly; remember, this is all recorded, so you can come back and review it at your leisure when you get the slides. If I deleted record number one, what would be lost? Deleting record number one does two things: we lose the fact that purchaser one bought the song "We Met Today," but we also lose the fact that "We Met Today" cost 99 cents. That is usually undesirable and unintended; it's called a deletion anomaly, by the way. Let's take another example, an insertion anomaly (there are three types of anomaly). Suppose I want to add a song called "Scuba" that costs $1.29 to this existing data model. Well, I can't do it until a purchaser buys the song "Scuba." So that is, again, unintended: we can't insert a full row until we have an additional fact about that row. Unintended, undesirable insertion anomalies. We also have update anomalies as well. Suppose I just wanted to increase the price of "We Met Today" from 99 cents to $1.29. To change that data item, I would have to read every row in this entire database in order to go all the way through it.
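The deletion and insertion anomalies just described can be reproduced in a few lines. This is a hedged sketch using a toy single-table design (purchaser, song, price) modeled on the music example; the table and data are invented for illustration:

```python
# Toy reproduction of the anomalies in a denormalized purchases table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (purchaser INTEGER, song TEXT, price_cents INTEGER)")
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", [
    (1, "We Met Today", 99),   # the only purchase of this song
    (1, "Sushi", 99),
    (2, "Sushi", 99),
])

# Deletion anomaly: removing purchaser 1's "We Met Today" row also destroys
# the only record of what that song costs.
conn.execute("DELETE FROM purchase WHERE purchaser = 1 AND song = 'We Met Today'")
remaining = conn.execute(
    "SELECT price_cents FROM purchase WHERE song = 'We Met Today'").fetchall()
print(remaining)  # [] -- the price fact vanished along with the purchase

# Insertion anomaly: we cannot record that "Scuba" costs $1.29 without
# inventing a purchaser, because prices only live on purchase rows.
```

Both problems come from one table carrying two distinct kinds of fact, which is exactly what the "one fact per row" guidance that follows is meant to prevent.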
And even if I read all of them, I still might not get it right, because I would not catch the spelling error of 'We Met Toddy' — they haven't spelled it wrong on there, but you get the picture. So there are good ways of organizing data that can be optimized for flexibility, adaptability, retrievability, and risk reduction, and there are some techniques that we can use: smart codes bad, dumb codes good, etc., etc. The key to all of this is that it should be done, as much as possible, to store one fact per row. So again, row two as an example: purchaser number one has bought 'Scuba' and it costs 99 cents, but these are two distinct facts and we really should keep them in completely separate tables. So let's dive into the various modeling types now, starting with the conceptual one, where we should be talking about architectural trade-offs and strategy — and of course we should be introducing the glossary at that level. So here's our washing machine diagram again; on the left-hand side, the pink band is the conceptual model. It can be validated or not validated, to-be or as-is. In this case the motivation is to standardize and harmonize the vocabulary — as I've mentioned before, between the business and technology, but also between the humans and the systems — to focus on the strategic trade-offs, and to provide specifications supporting the organization's strategic data objectives. Data is used to support the organization's achievement of its strategy, and these requirements should demonstrate how you are going to satisfy the business objectives. Unvalidated models require the word 'draft' on them, and I urge you to put the word 'draft' on any model that has not been validated. You'll be amazed at the kinds of things that happen, because what you're telling people is: I think this is the right answer, but I'm not 100% positive; fingers crossed, we'll move ahead and everything will be okay. Well, I wouldn't fly in an airplane run that way.
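The anomalies and the "one fact per row" fix described above can be sketched in a few lines of SQLite. This is a minimal illustration, not the webinar's actual slide material; the table and column names are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Flat design: one row mixes two facts (who bought a song, and its price).
db.execute("CREATE TABLE flat_purchase (purchaser_id INT, song TEXT, price_cents INT)")
db.execute("INSERT INTO flat_purchase VALUES (1, 'We Met Today', 99)")

# Deletion anomaly: removing the purchase also erases the song's price.
db.execute("DELETE FROM flat_purchase WHERE song = 'We Met Today'")
price = db.execute(
    "SELECT price_cents FROM flat_purchase WHERE song = 'We Met Today'").fetchone()
assert price is None  # the 99-cent fact is gone along with the purchase

# Normalized design: one fact per row, in separate tables.
db.execute("CREATE TABLE song (title TEXT PRIMARY KEY, price_cents INT)")
db.execute("CREATE TABLE purchase (purchaser_id INT, title TEXT REFERENCES song(title))")
db.execute("INSERT INTO song VALUES ('We Met Today', 99)")
db.execute("INSERT INTO song VALUES ('Scuba', 129)")  # insertion works with no buyer yet
db.execute("INSERT INTO purchase VALUES (1, 'We Met Today')")

# Update anomaly avoided: one UPDATE changes the price everywhere,
# and deleting the purchase no longer destroys the price fact.
db.execute("UPDATE song SET price_cents = 129 WHERE title = 'We Met Today'")
db.execute("DELETE FROM purchase WHERE purchaser_id = 1")
print(db.execute(
    "SELECT price_cents FROM song WHERE title = 'We Met Today'").fetchone()[0])  # → 129
```

The point of the split is exactly the speaker's: each table now records one kind of fact, so no single DELETE, INSERT, or UPDATE can silently damage an unrelated fact.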
And you shouldn't let people do data modeling that way, either. Again, these conceptual models help us to understand the various organizational concepts and hypothesize about how these data things relate to various other data things. We may need unvalidated ones just to do some brainstorming, but we may validate them as well to understand the various data pieces and establish system-wide definitions. Architecture is always going to involve trade-offs, and so should your conceptual model. 'Good, cheap, fast — take any two' is the back-of-the-business-card line that a friend of mine has used for many, many years around all of this. Let's dive into strategy for just a quick second here. The word strategy comes from a military context; around 1950 you can see it started getting used more, and that is because business environments took it and turned it into a 'master plan' — strategy at that point became a thing. The original concept of strategy, and the way I prefer to use it, is that it is a pattern in a stream of decisions, related more to a process than to a thing. The idea is that strategy is something you can get good at, as opposed to something that you write perfectly once. Let me give you three very brief examples. First of all, Walmart's former business strategy is very simple — everyday low prices — known by everybody in the business, and they've done an excellent job of making sure that you understand that. If, on the other hand, you're a sports fan and you asked what Wayne Gretzky's strategy was for becoming the hockey great that he became — he's got a great article on Wikipedia as well — it's that he skates to where he thinks the puck will be. After all, if you're chasing the puck on the ice,
it's going to be much faster than you are, and you will be unable to keep up with it, so you have to position yourself where you can correctly take advantage of it. Here's one that's a little more complex — I need to set it up for you: it's Napoleon at Waterloo. How do I defeat the competition when their forces are bigger than mine? The answer is divide and conquer. So what does that look like from a strategy perspective? Well, remember, strategy is a pattern in a stream of decisions, and as strategies go this one is actually kind of awkward. First of all, we're asking low-paid soldiers to hit the opposing armies at exactly the right spot so they separate. Then we're asking all our soldiers to turn to the right and defeat the Prussians, and then turn to the left and defeat the British — and, oh by the way, somebody's shooting at you the whole time. So a complex strategy is generally not a winning piece to rely on. Data models are used to support strategy; here's an example that I experienced directly when I was much younger. I was a manager of a store, and we were going through a recession — if you remember the first oil crisis in the mid-70s — and they told all managers that they had to sell. Well, here's a situation where you can see from this conceptual model that I could be either a salesperson or a manager, but I couldn't be both. And consequently it was very difficult for this organization to track sales from the managers — not that I was a great salesperson — and doing that on an entire store-wide basis made it very complex for the organization. If we look at strategic uses of models, you can see that one company created the entire flight booking business based on a data model that they are still using today.
Another company invented a brand-new credit card business overnight, again using the same data models they are using today; Amazon is using the same retail models that they invented when they were starting out 20 years ago; and Capital One, of course, reinvented the solicitation algorithm around all of these. So how do you put together a model? Again, the data modeling process is: you identify the entities; you identify a key for each entity — and a key is a way of identifying one specific record in that list of records, to the exclusion of all the other records associated with that particular entity; you map what the relationships look like — this is where you get into the trade-offs that I was describing before; you identify the attribute values — you should have a list of those from your system already, and if you don't, you're too far along in your systems development process and need to go back and do some more requirements work; and you allocate those attributes to the various entities, mapping them back and forth. And of course, don't forget: we have to keep up with our glossary or dictionary — whatever we're going to call the way we're all going to speak the same language — as we go through this exercise. Now, model evolution is good. At first you should come along and refine your original model; if you're not, you're probably not paying enough attention to it or devoting enough time to it. But if your model is changing a lot and a lot and a lot and it doesn't stop changing, you may not have a good understanding of what it is you're attempting to do. You may in fact add some additional relationships to it as you take a look at it. Here is a conceptual model — I'm going to put that up there — for a Department of Social Services that we have here in the state of Virginia. And you can see here the model is simply describing the entities; that's not to say that you shouldn't have additional attributes in them.
But each part of the model — from a client's perspective, from a governance perspective, from a program delivery view, from the vendor perspective — is different, and when we put them all together we come up with that conceptual model. We then have a good representation of the major things that our Department of Social Services creates, reads, updates, and deletes data about as it tries to help the people of the Commonwealth of Virginia. I'll mention one more time the business glossary: it's the start of your enterprise taxonomy, it defines the initial entities that are in your conceptual data model, and it's how you engage the various business entities to pull all of this together. And I want to tell you a brief story, because these are crazy hard to implement — a story about Nokia, a company I had a long association with. Really interesting organization: they started out, before I got there, doing tires, rubber products, and consumer electronics; when I got there they were doing mobile phones. It turns out that the Finns are bilingual — everyone learns both Finnish and Swedish, because 2% of the population speaks Swedish — and Nokia wanted to play internationally, so they wanted everybody to learn English on top of that, which meant a lot of unknown words. The first thing that Nokia did was make it absolutely clear — and encourage it by giving people rewards for asking questions — that it was not a bad idea to put your hand up in a meeting and say 'I don't know what that term means.' Interestingly, after I got there they decided they wanted to build a good common vocabulary. So when an unfamiliar term was used anywhere in Nokia, the group would instantly turn to their laptops and access the NTB to see if there was a golden or standard definition. Now, if the term was not in the NTB, they would take a quick vote to see whether they thought it was important enough to include in the NTB or not.
Weekly, the NTB group would then look at the submissions and create new versions of the NTB — by the way, NTB stands for the Nokia Term Bank. So they had a wonderful way of doing it, and everybody understood this. I'm just going to give you an example of how this was. When I arrived, this was one of the first things that we saw. I took a couple of pictures of it with my non-Nokia phone — which was a faux pas just for starters — and said, you know, I'm not sure quite how to describe this thing, and somebody looked at the picture and said, oh, that is our rubbish collector. Well, unfortunately, that meant I was able to go in front of their C-level in just a couple of hours and say that they had more documentation on how to maintain their rubbish than they did on how to maintain their data. So let's move now from the conceptual to the logical. The key in the logical piece is to motivate simplicity across both operational and design considerations, moving towards standards as opposed to standardizing everything — and here is where the business actually meets the strategy. So again, we're now out of the pink and into the orange here. Take a look, motivation-wise: it provides information about the effort, such as the size, shape, and provenance of the data, the various functions, and what downstream uses the data might have; and it frees the discussion from technical considerations that are separate from the business objectives — this is the reason for logical modeling. We discovered that it was great to theorize about something, but that theory needed a lot of work to get to implementation in practice, so we model between conceptual and physical in order to understand and document the various data design considerations that come up there.
The design considerations should satisfy specific business objectives, and mostly you should be trying to generate this as much as you can. The logical as-is models challenge the conceptual model — I've seen many times where a logical data model will cause somebody to go back and revise their conceptual data model as a result of increased understanding of both the existing system and the business requirements. You need to explicitly incorporate information from the existing components; if you're looking at a modification to an existing system, it is impossible to do this well unless you understand the strengths and weaknesses of the existing system. The to-be models serve as an organizing principle around which the data capabilities are built, and they get us to that common vocabulary. Something else that's very critical to incorporate in all of your modeling types, but particularly at the logical level, is that a definition alone does not provide context. If you ask somebody to define a bed, they'll say 'something you sleep in,' and that is simply unhelpful. Clive Finkelstein taught me that a purpose statement incorporates the motivations — so don't let people define things; instead, make sure that they describe them with a purpose statement. So for the bed, here's a purpose statement — why is the organization maintaining information about this business concept? Well, it's a substructure within a room, which is itself a substructure of the facility location; it contains information about beds within rooms. A little better. We actually found out a little bit more information on this, and I'll explain how that goes in just a second, but let's look at a couple of attributes in here — it's a partial list.
But what you can see here is that we now have the bed gender to be assigned: we know we're going to maintain certain types of beds that will be assigned to females versus males, and everything else we need to pull into that category. We've also got the association in here as well, and again it's a very crude way of saying a room contains zero or many beds. Now, what was fascinating about this particular example — this is actually a military health system example — is that the idea was that they were going to use the beds to track the patients. You can see the bed has wheels on it in the upper right-hand corner there, but the idea really was not well thought out, because as soon as we pointed out to them that the bed could be pushed into the hallway: what room was the hallway going to be? Or better still, if they figured out a way to create a room number for a hallway, then the question was, what room number was the elevator — because the main place that patients get lost in hospitals is the elevator. Also, of course, we want to incorporate the status of the entity — whether it's a draft, or whether it's been validated. So let's take a couple of examples, because we're now not just saying these two things are related; we're asking how they're related. A bed to a room — well, what is that? Well, a bed is related to a room. Interesting information. Here's another one: a bed can be put in many rooms; many beds can be related to many rooms. So there's our wonderful many-to-many relationship that clearly needs to be resolved at some point. There's another way of refining that particular process: okay, many beds can be associated with one and only one room. And of course that works only until you add a temporal dimension to it — now you have a really complicated model.
'Many beds may be contained in each room; each room may contain many beds' is a very precise data requirement specification — and what if these beds can be moved? We now have another level of understanding to work through. The various types of relationships are the cardinality options: exactly one; one or many; eventually one, which is a time-based concept; zero, one, or many (optionally); and finally, eventually one or many. Again, at the logical level it's not critical that you be perfect with these, but it's a good place to start adding these associations. Here's a little bit more of the model — again we've got the associations, and remember we're doing all this within the trusted catalog. Now we're looking at rooms, patients, and beds, and you can see the bed is placed in one room only. These relationships occur in pairs: reading it the other way, a room contains zero or more beds. The other paired relationship in here, in brown: a bed is occupied by zero or more patients; a patient occupies one or more beds. These precise business specifications are the peas in the pod that I was describing earlier with Hans Christian Andersen. Here is an actual data model — it's from our DAMA body of knowledge — and looking at these things gives a lot of ideas about how all of them come into play; this becomes reference material that we use going forward for everything else. Now let's move from logical into physical. Again, at the very bottom of our washing machine chart, these can be as-is or to-be, validated or not validated. From a motivational perspective, what we're really talking about is the specification of production systems with entity-relationship diagrams and the glossary — all data models are incomplete without the definitions, which means you have to have them done.
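The paired bed/room and bed/patient cardinalities above translate almost mechanically into DDL. The sketch below (SQLite, with invented names — the webinar's model is not reproduced exactly) shows the two patterns: a mandatory one-to-many as a NOT NULL foreign key, and the time-qualified bed/patient relationship as an associative table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE room (room_id INTEGER PRIMARY KEY);

-- "A bed is placed in one room only; a room contains zero or more beds":
-- a NOT NULL foreign key on the many side states exactly that pair.
CREATE TABLE bed (
    bed_id  INTEGER PRIMARY KEY,
    room_id INTEGER NOT NULL REFERENCES room(room_id)
);

CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);

-- Adding the temporal dimension turns "occupies" into an associative
-- entity: many beds per patient over time, many patients per bed over time.
CREATE TABLE bed_occupancy (
    bed_id     INTEGER REFERENCES bed(bed_id),
    patient_id INTEGER REFERENCES patient(patient_id),
    start_at   TEXT NOT NULL,
    end_at     TEXT,
    PRIMARY KEY (bed_id, patient_id, start_at)
);
""")
db.execute("INSERT INTO room VALUES (101)")
db.execute("INSERT INTO bed VALUES (1, 101)")
print(db.execute("SELECT COUNT(*) FROM bed WHERE room_id = 101").fetchone()[0])  # → 1
```

Notice that the associative table is where the "eventually one or many" time-based cardinality ends up living once you get to a physical design.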
Now, a quick note: most of the tools of today will generate this data definition language for you instead of hand-coding it, so please don't take the time to do that by hand if you have access to these tools. It becomes the maintenance basis for future solutions and can be foundational for the system documentation; to access the data in the system, you need that data model to figure out how to go get it. All of these specifications can be generated semi-automatically. Let's take a look at how that works. Again, these are lists of organizational persons, places, or things that need to be created, read, updated, and deleted — and some people add the word 'archived' to that as well. Just to look at one example, here we've got clubs and regions, so we might look at how these two are related. We have a club ID that we're putting in place — that's how we identify a specific instance. From a regional perspective we're looking at some kind of related grouping — I'm not sure exactly what type of club it is. The clubs need to be uniquely identified based on this particular piece, and we're going to maintain some club-specific information: tables assigned, reservations. Again, not any particular example here, but you can see the basis of this physical model is the to-be model; in this case, each club must be part of a region — a club cannot exist without one. These uses then give us the idea that we should now talk about which specifications we're going to put in here: all clubs can have a status; there can be many reasons for a reservation, but it's a free-text field, and that may give us some data quality problems as we go forward. So all of these — level variances, addition of keys, definitions — are relevant, and they can apply at the conceptual, logical, and physical levels.
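The "generated semi-automatically" idea above can be shown with a toy generator: a tiny model description (entities, keys, attributes, a mandatory parent) is turned into DDL, which is essentially what a modeling tool does from its diagram. All entity and attribute names here are invented for illustration.

```python
# Toy model description: "each club must be part of a region" is recorded
# as a mandatory parent, which becomes a NOT NULL foreign key in the DDL.
model = {
    "region": {"key": "region_id", "attrs": {"region_name": "TEXT"}},
    "club": {
        "key": "club_id",
        "attrs": {"club_status": "TEXT", "reservation_note": "TEXT"},
        "parent": ("region", "region_id"),  # mandatory relationship
    },
}

def to_ddl(name, spec):
    """Render one entity of the model description as a CREATE TABLE statement."""
    cols = [f"{spec['key']} INTEGER PRIMARY KEY"]
    cols += [f"{attr} {sqltype}" for attr, sqltype in spec["attrs"].items()]
    if "parent" in spec:
        p_table, p_key = spec["parent"]
        cols.append(f"{p_key} INTEGER NOT NULL REFERENCES {p_table}({p_key})")
    return f"CREATE TABLE {name} (\n  " + ",\n  ".join(cols) + "\n);"

for name, spec in model.items():
    print(to_ddl(name, spec))
```

Real tools do far more (indexes, types per target DBMS, naming standards), but the shape is the same: the model is the source of truth and the DDL is derived from it, not hand-maintained.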
They typically become more detailed as you get closer to the implementation process you are working within. A couple of quick things here: these are the basic database structures that you have — a flat file, an indexed file, a network database, a hierarchical database, and a relational database — plus a whole bunch of new ones. And this is where it becomes very challenging, from Andy's perspective, to be responsive to all of these types of data models, and I think we've got a couple of questions already on some of these, which will be great when we get to them. These basic structures, however, allow us to understand the basics of the system. So here's a system that we used at Virginia Commonwealth University called the student database master. You may not understand a lot about this particular database, but if I tell you that the parent entity has been circled in yellow in the upper corner and it's called, funnily enough, SDBM, and that every other entity on here is a child of that SDBM file, you now have a really good idea of how this system works. This is a real-life example, by the way — we replaced this system with our Banner system, but one of the other vendors tried to sell us this to-be data model. Now, if you can't look at this and say, oh my goodness, this is silly, we've failed miserably in our little attempt at education here. This is of course a poppycock model that makes no sense whatsoever, and the people who were involved in it narrowly escaped going to jail — but that's another story entirely. Just to finish out the concept on physical: this is PeopleSoft, actually — the HR model in PeopleSoft looks like this at the conceptual level.
It looks like this at the logical level — note our goal of simplicity — but when you implement it in Oracle, it gets very detailed. Again, I'm showing you these slides just to show you how detailed this model gets through all the various bits and pieces. Well, we've taken some time here today to look at modeling and give you some motivation, which is that we have three different modeling types depending on what you're attempting to do — conceptual, logical, and physical — plus the two characteristics: is it an as-is or to-be model, and is it a validated or an unvalidated model. There are reasons for doing each type of modeling, and the reason should be driven by your business needs, so make sure that you write at the top of all your models: 'we're doing this modeling because…'. We've got conceptual models — again, the architectural trade-offs that we've talked about, with the glossary introduced at that point; the logical models, which should be simplicity-oriented, give us an opportunity to harmonize, to standardize, to make things as simple as possible to avoid the enormous complexity of all this; and our physical models, which actually become the blueprints of what we're doing. As we head towards the Q&A, I've got a couple of things to summarize where we are and wind up, and then we'll invite Andy to come back and join us for the discussion. There are correct ways to organize data, and all of them involve data modeling. Again, it can be done for flexibility, adaptability, etc., etc., and the techniques include avoiding smart codes, using good architectural joins, etc., etc. — and avoid, if you possibly can, all of those keys that get into the design from people who have not been trained in data modeling, including the people who produce our commercial software in many instances.
Similarly, as you're working through this with organizations and groups, don't tell them that you're modeling — just write some stuff down. Jot some things on a piece of paper, then go back and arrange them and make some appropriate connections between your objects. There are lots and lots of ways of doing this where you don't have to make it a formal exercise and scare everybody into thinking you're doing something technical that they may not understand. I like to tell people the reason we're locked in this room — the reason you guys are spending 90 minutes with us today is, hopefully, that we're delivering some value to you about these different types of modeling techniques. So the reason we're locked in this room might be to understand the relationship between customer and soda, and the outcome we're trying to achieve is to walk out the door with as-is physical and logical models of this relationship. It might be: soda is given to customer, and customer selects soda. Oh wait, we forgot — my original spec said they have to pay for it as well. Yes, I did see that in one particular instance — people said yes, we're just going to hand them sodas if they ask. Again: what are the relationships, with the different characteristics, between our hospital beds? We want to walk out the door having identified the top three characteristics that represent our brand of hospital and how our hospitals are going to take care of patients. Or we might decide that the purpose of this exercise is the primary means of tracking a patient, and we're going to do it with a bed. As you can see, hospitals have moved on — they now barcode you, so you don't need to be tracked by the bed you're in, which if you think about it makes a lot more sense; but in those days we didn't have the ability to tag individuals. One final example on this, again just as our focus: can our system handle the following business rule tomorrow?
Let's say we're in the middle of the Great Resignation and we're having problems. Can an employee work more than one position, or can a position be filled by more than one employee? Well, if we look at our existing as-is data model, we can figure out for sure that a position can be filled by zero, one, or many employees — which means we do have the ability to do job sharing with our existing system without making any specific changes. These are all examples of where we use data modeling to answer specific business questions that came up during the course of maintaining our systems or creating new ones — but remember, the ratio is one fifth new things to four fifths understanding existing things. So the key for data modeling is to obtain some business value; we've got to have a good shared understanding here. And if somebody shows a data model in front of a group and there are no disagreements, no refinements, that to me means we've had insufficient communication. There is simply no way we can show a data model and everybody will agree it's perfect the first time. That said, those disagreements and refinements should be easily understood and resolved, so that we can start to modify the model to where it's going to work. Data exchange is automated and dependent on successful architectures; everything that's happening out there in our world is dependent on high-speed automation. Can you imagine if you had to craft each individual email by hand? By the way, each employee in your organization has learned about data on their own, and that's kind of a bad thing. It's a different topic — the data literacy topic — but it is an important consideration, and it's why we have so much variance in the modeling. Modeling is a problem-defining as well as a problem-solving activity, and it's important to keep that in mind; it's one of the main uses that you should have for data.
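Reading that job-sharing rule straight off the as-is schema can be sketched concretely. In the hypothetical structure below (names invented, SQLite), the foreign key from employee to position is what encodes "a position can be filled by zero, one, or many employees" — so two employees sharing one position needs no schema change at all.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE position (position_id INTEGER PRIMARY KEY);

-- A nullable FK on employee means: each employee fills at most one position,
-- while any position may be filled by zero, one, or many employees.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    position_id INTEGER REFERENCES position(position_id)
);
""")
db.execute("INSERT INTO position VALUES (10)")
# Two employees assigned to the same position: allowed by the model as-is,
# so job sharing already works without any change.
db.execute("INSERT INTO employee VALUES (1, 10)")
db.execute("INSERT INTO employee VALUES (2, 10)")
print(db.execute(
    "SELECT COUNT(*) FROM employee WHERE position_id = 10").fetchone()[0])  # → 2
```

Had the rule been "a position is filled by exactly one employee," the schema would instead carry a unique constraint on `position_id` — which is exactly the kind of business question the model answers before anyone writes code.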
If you're doing things with data and you're not doing models with them, you've got a problem — and if you're doing models without the glossary, it's even worse. There are different modeling challenges for the different problems you may have; the use of the model is going to be much more important than any specific modeling method. The models should always be seen as living documents and should be available in an easily searchable manner. The value is derived from improving the data, changing it, and helping us work through it piece by piece each time, so that we can get better at the process. Also, include the use of color and animation — it helps out an awful lot. If you want to learn more about these topics — we breezed through them very quickly — there's of course our DMBOK and the works within it (yes, that one is sideways; I had to get it in there so it would all fit); there's David Hay's book; and Graeme Simsion wrote a wonderful book, Data Modeling Theory and Practice — he's also the author of The Rosie Project, if you haven't seen that one — and our colleagues at some other universities have put some other bits together here. Shannon, I went 10 seconds over — I didn't do it very well. Anyway, back to Shannon and Andy, and thank you guys for listening to all of it. I'm going to note the moment that Peter went over: 10 seconds. So, another fantastic webinar — this is just amazing, and I love it. There's been a lot of chat going on and questions coming in already. Just to answer the most commonly asked questions, a reminder: I will send a follow-up email by end of day Thursday to all registrants with links to the slides and links to the recording, along with anything else requested throughout. There are a lot of great resources in those slides, so make sure everyone gets those. Andy, feel free to jump in as we get through these questions. So this one came in, Peter, before you even started, but we've had three on it — NoSQL gets to go first.
So NoSQL gets to go first — that wasn't the first one that I was looking at, but all right, go ahead. Okay, so the question is: does the conceptual model define entities and relationships and capture business data flow; does logical modeling define keys for attributes; and does physical define data types and not-nulls? They're trying to understand where exactly we define keys. Well, in my experience of this — and Andy, please feel free to jump in here as well — I want to say it almost doesn't matter: before you get to a physical model you will need a key. If you have a good idea at the conceptual stage what your key should be, I don't have a problem with noting it at that level and saying, hey, we think this is going to be a key — I would label it as a candidate key if I were going to do it at that point. But if we get into arguments with business people about whether a key can exist in a conceptual model, the business audience is going to get turned off, because they're not going to understand the conversation. So just think of the models as becoming increasingly more robust as you move from conceptual to logical and from logical to physical; the introduction of keys can occur at the conceptual level, but typically doesn't. If you showed me a conceptual model that didn't have keys in it, I would not say this is not a good conceptual model; I would say we now need to start working on keys, and perhaps that can occur at the logical level. So the real key is — again — what business problem are you attempting to solve? And if you have a conceptual model that says, 'I want to show the major classes of data that we are going to create, read, update, and delete information about in this system,' I'm perfectly okay with keeping the conceptual model to simply being entities related to other entities. Andy, what do you think? Absolutely — there's no steadfast rule with any of this.
And I agree that if you're arguing about what's a key at the conceptual and logical levels, you kind of need to take a step back. Traditionally you would do it at the logical level, but if that's what the organization wants — that's how you get the buy-in, right? Maybe you're going to have some discussions at a high level, but like you were saying before, people are nervous about even the concept of modeling, so just keep it simple: hey, this is how all our data is put together — leave it at that, and that's where we're going to discuss it, and that keeps everybody focused. And Andy, let me take you to the reverse scenario on this, because we've both been talking about going forward. I've done a lot of work going backwards as well. When you start out with a physical as-is, of course you're going to start out with a key, right — you can implement a database with that, and some of the tools that we've talked about do that automatically. So I've actually been hired to come in — people say, 'I've got a database, but I don't know what the conceptual model or the logical model looks like; can you get it for me?' And I say, do you have a copy of tool X — in this case usually erwin, right — we'll point erwin at it, connect it, and five minutes later it's done and I'm out of a job, which is great. Right, because now you've got the information that you actually wanted to have. Right, exactly — and that's one of the things that, with our data modeling solution, we can point it directly at that physical structure and create the ER diagram inline, so it'll be an accurate reflection of what the actual data structure looks like. And when we show a conceptual model, we're literally just showing the entities and the relationships; the attributes and everything else are in the actual full-blown logical model, but we just expose the entities — and there may be keys there.
But when we talk about it with the business teams, we're just going to see the entities and how they're related, and that would be reflected in the underlying physical structure. Let me flip back a couple of slides to illustrate that — Shannon, go ahead and drop to the next one. So again, here is our physical as-is structure, which is very dense and detailed; you can see a close-up of bits and pieces of it. A business person is not going to be able to relate to that diagram at all, whereas they more likely will be able to relate to something at the logical level. Exactly. The key to keys. You mentioned a question that came into the chat early on: how do you model a NoSQL database? The first thing to understand is that NoSQL really stands for "not only SQL." There are some pieces of SQL in these systems that can be used to model very, very easily — the DDL that's in there can be fed into these various CASE tools, and they will pop out a model based on that SQL. If the SQL is very specific, it will have lots of information, which of course does not represent the totality of your "not only SQL" database. The question becomes: what else are you blending in? Are you doing a fully indexed, data-lake kind of situation, or are you using columnar databases — how is it organized? So it goes back to the question: what are you attempting to do? Don't look at this as "the logical definition must contain the following things"; look at it as "what answer am I trying to come up with — how is this data arranged internally?" You can get some very good information by pulling it out, because "not only SQL" is going to include some SQL, and that's the best place to start.
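The "feed the DDL into a tool and pop out a model" idea is, at its core, just extraction: table definitions become entities, and foreign-key references become relationships. Here is a deliberately tiny sketch of that reverse-engineering step — the DDL and the regex-based parsing are illustrative only; real CASE tools read the database catalog, not raw text:

```python
import re

# Hypothetical DDL, as you might pull from an existing database.
ddl = """
CREATE TABLE CUSTOMER (CUSTOMER_ID INT PRIMARY KEY, NAME VARCHAR(100));
CREATE TABLE ORDERS (
  ORDER_ID INT PRIMARY KEY,
  CUSTOMER_ID INT REFERENCES CUSTOMER(CUSTOMER_ID)
);
"""

# Entities are the tables; relationships come from the foreign-key clauses.
entities = re.findall(r"CREATE TABLE (\w+)", ddl)
relations = re.findall(r"(\w+)\s+\w+\s+REFERENCES\s+(\w+)", ddl)

print(entities)   # the conceptual-level entity list
print(relations)  # (referencing column, referenced entity) pairs
```

Showing a business audience just `entities` and `relations` is exactly the "expose only the entities and how they relate" view Andy describes.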
And Andy, how would you add to that? There are probably several other things besides the SQL we've talked about — I know you've seen some more exotic data structures these days. That's a great question, because we added full-blown support for NoSQL starting last year, and it was probably the number-one requested feature in our solution. Shortly after it came out, there were a lot of questions: "Why am I even modeling this? It's a document — it's already documented for me." Well, those JSON structures are going to be part of the overall data flow. But even more simply, if you're talking about any sort of compliance, the information in those documents needs to be governed. So the way we work is, like we said before: you point at that data structure and create your model, and then identify what you need to be watching — identify your personally identifiable information and anything else that may be contained in those documents. In the DevOps world, these structures change very quickly, and they're designed to do that. So being able to model how everything is set, and then govern it and identify the objects that are out there, matters. In addition, with our data modeler we can denormalize — which breaks all the rules. I was at MongoDB World two weeks ago — about three weeks ago now — and one of the features I kept showing was denormalization of your relational models. If you're moving into a NoSQL environment and you want to maintain your relational structures there, you can take your models, denormalize them, and convert them into a JSON structure, or into MongoDB, Couchbase, Cassandra — any of those structures.
So you have that model of how everything actually exists today relationally, and then you denormalize it into an unstructured form — you're starting from a single point. If you try to denormalize everything you have by hand, well, first of all, you're going to cringe, because you've been working toward getting everything into third normal form for the past 30 years, and you're going to make a lot of mistakes, intentional or unintentional. Having a model is your starting point, and then you can denormalize out into all these new structures with familiarity with what exists today. Let me add one point to what you said, Andy: many people say, "I don't care what the existing structure is, because it sucks." Okay — "sucks" is not a very precise word, and I will stipulate that all systems do some things well and some things poorly; but if we don't know which are which, how are we going to avoid making the same mistakes? This is the reason for reverse engineering in this fashion: we really do need to understand these things, because we need to take the information we have from the existing system and understand it. My favorite example is the old hard drives: I used to put A through D on one hard drive and E through J on the next, dividing up the alphabet that way. Why did I do that? Because each hard drive was 10 megabytes in size, and I couldn't put any more data on it. Well, that's a really bad reason to carry that design forward into a new environment where we've got tons and tons of practically free storage.
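The denormalization move Andy describes — start from the normalized relational model, then embed related rows into documents — can be sketched in a few lines. The data and field names here are hypothetical; this shows the shape of the transformation, not any vendor's implementation:

```python
import json

# Normalized relational rows (hypothetical sample data).
customers = [{"customer_id": 1, "name": "Acme Corp"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250.0},
    {"order_id": 11, "customer_id": 1, "total": 99.5},
]

def denormalize(customers, orders):
    """Embed each customer's orders inside the customer document,
    the way a document store like MongoDB typically expects.
    The foreign key is dropped: embedding replaces the join."""
    docs = []
    for c in customers:
        doc = dict(c)
        doc["orders"] = [
            {k: v for k, v in o.items() if k != "customer_id"}
            for o in orders
            if o["customer_id"] == c["customer_id"]
        ]
        docs.append(doc)
    return docs

print(json.dumps(denormalize(customers, orders), indent=2))
```

Doing this from the model means the embedding decisions are recorded and repeatable — the "single point" to start from — rather than reinvented by hand for every collection.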
It's got to be done in a programmatic fashion — and that's the point, I think, Andy: not that everything can be done programmatically, but more and more we are learning how to put programmatic pieces in place that will help translate those XML structures and other kinds of activities. Absolutely. Back to the question of keys — from the same questioner: "We're also facing another issue in logical data modeling. We have a similar business concept with different primary keys. How do we represent the same concept in the logical data model? Initially I thought this should be a supertype and subtype, but the rule for supertype and subtype is to have the same PK. So how do we define these in logical data modeling — parent/child?" So — some sort of synonym, or another kind of redefinition. I see this in insurance companies quite a bit, where people think the same concept is actually two different things. And you realize, if you look at all the attributes — by putting them in a data model, exactly what we're talking about here, and comparing them — that the two things have more similarities than differences, and there's probably a distinction that will let you fit them into some relatively normalized structure. I've certainly seen plenty of instances where organizations come to me saying, "These are different concepts — how are we going to model them, how are we going to work together?" — and by the time you do a thorough analysis, you find out they're more similar than they are different. Absolutely. Yep. I love it. Next question: "I'm increasingly horrified by comments made by my colleagues in data architecture who tell me that with cloud-based systems we no longer need to waste time on data models for the data lake, Databricks, or Snowflake — just move the data in and let the business users do their reports and analytics."
"Are you seeing or hearing this too? What are your thoughts?" I hear it all the time — how about you, Andy? Yeah, and I tend to disagree with that opinion. I think we both cringe at that, but I would say — again, back to the 80/20 rule — the rub is that what we're describing here takes some time, and you have to do it carefully. Most people say, "We don't have the time; we just need to get the data out there." Well, that just enriches your cloud vendor — it's like a divorce: who makes money? The lawyers. If you throw everything up into the cloud, you're going to end up with a mess, and the cloud vendors are going to be extremely pleased. In fact, I have rules of the cloud. I think data in the cloud should have three characteristics that data outside the cloud does not have. First, data inside the cloud should be of higher quality than data outside the cloud — I think everybody would generally agree, but you have to actually take steps to make sure it happens. Second, data inside the cloud should be, by definition, more shareable — and that's an architectural concept that can't be engineered without a thorough understanding of the data structures and the taxonomy, if you will, of the organization. And third, by virtue of the first two, the volume of data in the cloud should be considerably less than your volume of data outside the cloud. Now, that is not the way most people approach this: they simply take a forklift, throw all the data into the cloud, and make Amazon or Microsoft or whoever very, very rich — and now you have an additional barrier: you can't just manipulate the data directly; you've got to go through some sort of cloud-based interface. Andy, what kind of arguments do you use to push back when people say you don't need to model this because it's going into the cloud?
Well, I generally say "au contraire, mon frère," and then I'll go into a little diatribe. The big thing that's missed if you don't model in the cloud is lineage. We're talking about modeling today, and when we're talking about the information that goes into the cloud: where is it coming from, and how did it get there? To your point, we want to make sure we have good data up there. Generally, if we're going to be in the cloud, there are going to be analytics involved, and there's probably an EDW — or it could be a data lake. Frankly, Bill Inmon would be very upset to hear that nobody wants to model in the cloud, because that modeling is ultimately the lineage of where the data is coming from. With a data model we can manage both your sources and targets and anything in between: you have your established structures at either end, and everything comes together via your mappings and your ETL, ELT — whatever you're using to move the data. That's the other reason you want to model both ends of that data flow: so you have a better understanding of what's going up there and how it's being manipulated before it goes in. If you don't have a data model, or a comprehensive way to look at your mappings and lineage, you're just flying blind — it's the old garbage in, garbage out: we take bad data, move it someplace else, and pay more for it. Absolutely. And if you think about it, if you've got data in the cloud but you don't have a map of what data is in there, every individual analyst is going to go through the exercise Andy just described, which is a real waste of your human capital.
What we see in data science organizations is that most data scientists spend 80% of their time trying to figure this stuff out. You show them a data model — and I fully contend that half of all data scientists out there have never seen a data model, period — and they go, "Oh my goodness, this is great. Why didn't they show us this in school?" And the answer is: because you were in a rehashed statistics program that somebody decided to call data science this year. You get a bit of a job bump from it, but it's not going to help you with business problems unless you've got that connective tissue — the lineage, the provenance you were referring to, Andy. Absolutely. Yep. Data modeling is not dead. No. Next question: "With external data dictionaries and business glossaries regaining traction, where column names can be searched: is it better to fully qualify column names? For example, on a table like credit_card_account, naming a column account_number used to suffice — users would always search for a column in the context of its table. These days, though, since users can search for a column in Collibra, Alation, etc., would it be better to fully qualify columns and call the above credit_card_account_number, since there could be other tables — loan, savings_account, etc. — each with its own account number?" So that goes to the question of how far you can apply your naming conventions across the various types of models. And again, I'll go back to my PeopleSoft model here at the very end. You can see that PeopleSoft did not use — sorry, you'll see it once I actually show you the slide; let me try that again. They did not use the same names all the way through. If you're able, and in a position, to do that, that's wonderful. But Andy mentioned something as we were getting started on this as well.
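The questioner's own example suggests a mechanical rule: prefix the table name, but collapse tokens that overlap, so credit_card_account plus account_number yields credit_card_account_number rather than the clumsy credit_card_account_account_number. A small sketch of that rule (the function and names are hypothetical, not part of any naming standard):

```python
def qualify(table, column):
    """Prefix the table name onto the column name, collapsing any
    overlapping tokens: credit_card_account + account_number
    -> credit_card_account_number."""
    t, c = table.split("_"), column.split("_")
    # Try the longest overlap first, shrinking until tokens line up.
    for k in range(min(len(t), len(c)), 0, -1):
        if t[-k:] == c[:k]:
            return "_".join(t + c[k:])
    return "_".join(t + c)

print(qualify("credit_card_account", "account_number"))  # credit_card_account_number
print(qualify("savings_account", "account_number"))      # savings_account_number
print(qualify("loan", "account_number"))                 # loan_account_number
```

Whether to apply it everywhere, or only to columns whose bare names collide across tables, is exactly the governance decision the question is really about.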
It's really got as much to do with configuration management, I think, as with the rest of this. When you look at the HR conceptual model that PeopleSoft has here, you can see they're calling things "skill" and "address." They're still using those names at the logical level, but as soon as we get down to the physical level, there's an address — but address breaks up into city index, person, address pointers, and all sorts of other things that let you get much more detailed. I was part of an effort in the US military at one point where we were going to name everything, all the way from conceptual down to physical, with exactly the same name — and of course this was in the days of eight-character names, plus or minus three. Right — thank you, Andy, I know you remember. There's just no way of doing it. You could do it with a code, but that means everybody looking at it would have to decipher an eight- or eleven-character code. Tools keep track of that detail for you, though. You can define something as an alias in virtually all of the CASE tools — and certainly in erwin — and say: this is how it appears in the physical model, but this is how it appears in the conceptual model. So when you're presenting to management, you use one set of terms; when you're presenting to the DBAs, you use a different set. Exactly — we can actually do that through a naming standards mapping. Traditionally it was used — and it's funny to shift perspectives here, because we've been talking about building data models to move forward, but the more common scenario I find is documenting what's already out there. Our naming standards mapping was traditionally used to take a logical object and abbreviate it into, as I call it, SQL-ese.
What we can do now is the reverse: we find a table called Cust, and we create an entity called Customer, using that same naming standards mapping — the mapping can jump from Cust to Customer. As we look at the abbreviated code, as it were, we can quickly translate it to the business terms. In addition, we can give the attributes the same definitions as the columns, so we can leverage what's already there to build out the business glossary. And in some cases, where the attribute or the logical business term should have a different definition, we can accomplish that too. It's interesting — these models I'm showing here come from the original PeopleSoft systems. I fully reverse engineered PeopleSoft and came up with a complete set of these models, and we tried to get PeopleSoft to include them. I can remember, even today, many years later, Rick Bergquist, who was the CTO of PeopleSoft at the time, saying, "You know, I just don't think our customers are interested in this" — and I agreed. But that's not his fault, and it's not our fault; it's the university system's fault. We have not taught people the importance of the various kinds of modeling and what the various types are, which is why there's so much confusion about them out there, and we've got to do a better job of educating both our data professionals and our business users on these aspects. Absolutely, absolutely. And I'd like to add to that, just to emphasize the importance of data modeling in the industry today: Microsoft's Common Data Model, built out for their data lake so that the entire Microsoft stack has a commonality throughout.
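A naming standards mapping of the kind Andy describes is, at heart, a bidirectional dictionary applied token by token: forward for logical-to-physical abbreviation, reversed for reverse engineering. A minimal sketch (the abbreviations here are hypothetical, not erwin's shipped glossary):

```python
# Forward mapping: logical term -> physical abbreviation.
abbreviations = {"customer": "cust", "number": "nbr", "address": "addr"}

# Reverse mapping: physical abbreviation -> logical term.
expansions = {v: k for k, v in abbreviations.items()}

def translate(name, mapping):
    """Translate a name one underscore-separated token at a time,
    leaving unmapped tokens unchanged."""
    return "_".join(mapping.get(part, part) for part in name.lower().split("_"))

print(translate("customer_address", abbreviations))  # cust_addr
print(translate("cust_nbr", expansions))             # customer_number
```

Because the same table drives both directions, the Cust table found in the database and the Customer entity shown to the business stay provably in sync.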
If it weren't as important as it is, I seriously doubt Microsoft would have made the investment of time to build out that Common Data Model — to help everyone start pushing information into the data lake in a form that can be used by other systems. I will say it has been wonderful to watch Microsoft become smarter about this process. Many, many times in the past we tried to get them to pay attention to this — I've been in and talked to their enterprise architecture groups, and they were very frustrated, because Microsoft is very much a tools-oriented organization. I'm sorry, I'm flipping through the slides trying to find the one I was looking for — I'll give up on that — but you're right: they've done a great job of getting religion on this. Now there is something you can put in place and map back to for the entire Azure stack, and that's a very welcome development. So many great questions coming in — we'll get to as many as we can in the last few minutes; we've got about eight minutes left. "What do you think of no-code environments for IT application development that generate their own versions of the data they use?" Andy, do you want to take that one first? No, go ahead — I have some thoughts on it, but you go. So, the catch with no-code, of course, is that it makes it sound like you don't need anything. This is the whole reason for developing some of these alternative approaches: people think the process I'm showing you on the screen now takes too long. And that's of course correct if you only do it once — but the key is to remember that this should be the basis for your planning when you start thinking about changing things in your IT environment. Data is the most consistent, the longest-running, and the most stable component of your IT architectures.
One of the nice things about being on the planet as long as I have is that I've literally seen organizations that have been around for hundreds of years, and they're still doing the same basic things. And I have seen organizations create data models that are still being used years — decades, in fact — later. The one I'm most proud of having been associated with is the DoD data model we created in the early '90s, which is still being used today, 30 years later. That was a tremendous accomplishment, and these models have the ability to capture a particular aspect of the business. Data modeling is not going to answer every question; it's a way of answering certain data questions. There are other types of questions it won't address, and many other types of documentation have been developed and are used in those areas — but for some reason there's been a dearth of information around data, and we need to get better at that whole process. Again, just take the idea that half of all data scientists don't even know what a data model is, and you start to get a sense of how serious this disconnect is. Yeah. So basically, no-code is generally the idea that people are asking: can we get some predefined components to put out here? And I love to tell this particular story about QuickBooks Online. QuickBooks Online was in fact developed from the QuickBooks server models. But interestingly, in QuickBooks Online there is a missing key feature that a lot of accountants are really mad about.
There's nothing called a transaction ID that is accessible to users. So, for example, somebody looking at the books might want to say, "I want to follow this transaction back to the journal so I can see exactly which columns it hit and how it was processed internally" — and in QuickBooks Online you're not allowed to access that; the feature does not exist, because it didn't physically exist in the server versions, although clearly it has to exist internally, or QuickBooks Online and all the QuickBooks pieces could not function. It's a wonderful example of how these things are very, very confusing to people and cause enormous challenges out there in the business community. All right, we've got four minutes left, so let's see if we can get this in. "Sometimes I'm confused about the business glossary. Is it the place for definitions of all attributes? For example, if there are 10 tables and each has 10 different attributes, should all 100 attribute comments exist in the business glossary? And does this mean the modeling tool should not store the column comments, and the column comments should not be exported to the database — that instead the only place every attribute comment should live is the business glossary?" Andy, do you want to take this one? Yeah.
Now, you definitely want to store the column comments and export them to the database, because this way we build that breadcrumb trail backwards too: we carry forward what came from the attribute definitions and move them down onto the new model. That will help in the future as you're building new models, or even if you're just maintaining your existing structures — you go out there and it will consistently come back into the model. Definitions may change over time too, although very rarely, but there's an opportunity there. So you want to document it in the database, you want to document it in your model, and you also want the documentation to exist in the glossary. I would absolutely agree, and add just one little piece that I think is perhaps what the user was looking for, and that is: at what level do we start to notice — does a glossary include attributes, does a dictionary include definitions, and where do those purpose statements we were talking about fit? The key to this, folks, is that you want to have one authoritative place. If you've got two of them, you've got two watches, and two watches are generally not a very good thing. Regardless of what technology you're using, every one of the well-done CASE tools out there has the ability to store this type of information. What you want to do, when you find a definition through a reverse engineering exercise, is make sure it's stored in that one place; when you create a new component, it's stored in that one place and published from there. So I really object to the idea that glossaries are only for business definitions and you can't put any technical definitions in them — all a business glossary is, is one view into the larger repository concept that Andy was describing. And you add definitions as you get the information.
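The "one authoritative place, many views" idea can be shown in a few lines: one store of definitions, from which both the database comments and the business glossary are generated, so neither can drift from the other. The table and column names here are hypothetical:

```python
# The single authoritative store: (table, column) -> definition.
model = {
    ("CUSTOMER", "CUSTOMER_ID"): "Surrogate identifier assigned at creation.",
    ("CUSTOMER", "EMAIL"): "Primary contact email, validated on entry.",
}

def comment_ddl(model):
    """View 1: COMMENT statements to export the definitions to the database."""
    return [
        f"COMMENT ON COLUMN {t}.{c} IS '{d}'" for (t, c), d in model.items()
    ]

def glossary_view(model):
    """View 2: the business glossary, generated from the same store."""
    return {f"{t}.{c}": d for (t, c), d in model.items()}

for stmt in comment_ddl(model):
    print(stmt)
```

If a definition changes, it changes once in `model`, and the next export updates both watches — which is the whole point of Peter's objection to keeping technical and business definitions in separate silos.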
If you get the information and it's not validated, mark it as unvalidated — you can still do something with it: treat it as a candidate and move forward with it. Saying "oh no, this thing holds only attribute-level definitions," or "only relationship-level definitions" — that's silly. Right. Yeah, yep. If you've got some facts, keep them. Exactly, exactly. And I've got one more question here, and then I guess we can wrap up, Shannon: "Can the process of transitioning from logical to physical modeling be automated in erwin Data Modeler, or will we have to maintain a combined logical/physical model?" So — you want to look at your entities, and you want to look at your tables, in that same model; you basically just flip from one side to the other. They don't necessarily need to be derived from each other: when we point the data modeler at a data structure, we can create that combined logical/physical model and very quickly have the business terms set up with the objects, etc. But there's no particular automation there — it's basically reverse engineering and creating a logical/physical model. Well, Andy, thank you so much — and I'm sorry, y'all, we are out of time. Peter, did you just want to add one more thing? I'm just going to say that the other part of it is, when you're looking at all of these bits and pieces, the tools will pull back as much information as you need. You can filter and say, "okay, I want to look at this model view, or only these three entities," or whatever it is — it's a way of helping to manage the complexity on the other side, and the tools do a fantastic job of managing all that complexity. Thank you both for this great Q&A — that is all the time, I'm afraid, that we have for this webinar.
Again, just a reminder to everybody: I will send a follow-up email by end of day Thursday with links to the slides and to the recording of this session — there was a request for that, and there are lots of resources there, in Peter's slides as well. Thanks to erwin by Quest for sponsoring today's webinar and helping to make these webinars happen. Andy, it's been a pleasure having you join us this month. It's been great to be here — and thanks, Peter. Thanks to everybody, and to all the attendees, for being so engaged. Hope you all have a great day. Thanks, everyone. Cheers.