Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Officer for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Modeling Fundamentals, sponsored today by Couchbase. It is the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights via LinkedIn or other social media using the hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open and access either the Q&A or the chat panels, you'll find icons for those features at the bottom of your screen. Just note that the Zoom chat defaults to sending only to the panelists, but you can change that to network with everyone. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we will likewise include a link to the recording of the session as well as any additional information requested throughout the webinar. Now let me turn it over to Matt for a brief word from our sponsor, Couchbase. Matt?

Hi there, Shannon. Thank you very much. I'm going to go ahead and share my screen here, if I can find that button. It was there earlier; we tested it. Oh, there it is, right in front of me — if it was a snake, it would have bit me. So I just want to say a few brief words about JSON data modeling and Couchbase before I hand it back to Shannon and Peter. My name is Matthew Groves and I work for Couchbase. Let's think about modeling. Most developers are probably familiar with the relational way of modeling data — I'm talking physical modeling here, of course. But if you want the benefits of a non-relational database, you've got to think differently about modeling. JSON is flexible. What does that actually mean when we say that? What's the benefit? One benefit is the ability of developers and DBAs to adapt to requirements. Adding a single column to a relational database — even a simple column like country ID — can still be a complex, risky operation, often requiring downtime or at least putting the table in read-only mode. Making that change in a document database does not require an ALTER. As a developer, you can just start adding it when creating or updating data, and it does not require every document in a given collection to have a country field, in this case. This is just a minor example, but if you extrapolate to a schema with hundreds of tables and often-changing business requirements, the flexibility benefits start to stack up.

There are many things to think about in JSON modeling, but the most difficult decisions often boil down to — in my experience, anyway — embed versus refer. In the relational world, it's always refer: foreign keys, separate tables, and the overhead that goes with them. In the JSON world, you can choose one or the other or both; there are benefits to each approach, and you can use a combination. And further, Couchbase supports both choices, or combinations of them, through SQL queries, joins, and ACID transactions. My recommendation: all of these bullet points are good to follow, but if you're in doubt, try to embed and consolidate first.
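To make embed versus refer concrete, here is a minimal sketch — with hypothetical collection names, keys, and fields, written as Couchbase SQL++ statements over JSON documents — of the two choices side by side:

```sql
-- Embed: the customer's addresses live inside the customer document itself.
INSERT INTO customers (KEY, VALUE)
VALUES ("customer::100", {
  "name": "Ada Lovelace",
  "addresses": [
    { "type": "home",    "country": "UK" },
    { "type": "billing", "country": "UK" }
  ]
});

-- Refer: orders are separate documents that point back by key,
-- much like a foreign key in the relational world.
INSERT INTO orders (KEY, VALUE)
VALUES ("order::5001", {
  "customerId": "customer::100",
  "total": 42.00
});

-- SQL++ can still join the referred documents when a query needs both.
SELECT c.name, o.total
FROM orders AS o
JOIN customers AS c ON KEYS o.customerId;
```

The embedded addresses travel with the customer document and need no join at all; the referred orders stay separate and can still be joined when needed.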
Tooling is important for collaboration, documentation, and of course achieving a common understanding of a model with your team. There are NoSQL modeling tools out there — Hackolade is one you may have heard of or even used already — but you can also get started with something as simple as JSON Editor Online (jsoneditoronline.org); it's a nice no-frills offering. And Couchbase has IDE plugins available, for IntelliJ among others, so if you're using those you can get the Couchbase plugin, which has a relationship-mapping tool built right in, among other cool features, right there in your IDE.

One last word before I hand it over. Another issue that's important for determining how you build your model is access patterns. There are lots of different point solutions — lots of database choices out there that offer different ways to access data — and if you keep adding these point solutions, you're going to end up with something called data sprawl: data duplicated across the enterprise in different tools, with pipelines between them. For each of these you need to learn a different SDK, and you have to patch, upgrade, scale, manage, and license each of these tools, the SDKs, and the pipelines — all those lines between the different tools and your applications. This is one of the ways Couchbase and Couchbase Capella help our customers reduce cost and complexity. At its core, Couchbase is a JSON document database with a built-in cache — we like to think it's the original multi-purpose database — but it also supports all these access methods and use cases. Just a few of my favorites: SQL queries (yes, SQL for a NoSQL database), full-text search, geospatial search, eventing, analytics, mobile sync, time series, and more. To break down the benefits, we like to say FAVE: fast, affordable, versatile, and easy SQL. We want to be your new favorite database. So check it out at couchbase.com. Thank you very much for letting Couchbase participate in this event. I'll hand it back over to Shannon.

Thank you so much, and thanks to Couchbase for sponsoring today's webinar and helping make these webinars happen. If you have any questions for Matt, he will be joining us for the Q&A portion at the end of the webinar today. Now let me introduce our speaker for the series, Dr. Peter Aiken. Peter is an acknowledged data management authority, an associate professor at Virginia Commonwealth University, president of DAMA International, and associate director of the MIT International Society of Chief Data Officers. For more than 35 years, Peter has learned from working with hundreds of data management practices in 30 countries, including some of the world's most important organizations. Among his 12 books are many firsts — starting before Google, before data was big, and before data science. Peter has founded several organizations that have helped more than 200 organizations leverage data, with specific savings measured at more than 1.5 billion US dollars. His latest venture is Anything Awesome. And with that, let me turn everything over to Peter to get his presentation started.

Hello and welcome to everybody. Shannon and Matthew, I look forward to chatting with you more in about an hour. What I would urge you all to do is keep in mind what Matthew talked about, particularly in the area of complexity.
That is one area where we absolutely agree, and it's one of the main reasons data modeling is such a fundamental but underutilized and underappreciated set of skills in this domain. You might learn some of this during your college and university degrees, but at the same time we find that it's taught unevenly, and the emphasis is really much more on syntax than on business value.

Let me start off by quoting a very dear individual who is revered in the community, a guy named Fred Brooks. He wrote something called The Mythical Man-Month, which was a good plain-English explanation of why you can't actually do certain activities in parallel — they have precedence or dependencies involved in them. He also made an observation we've found extremely useful: representation is the essence of programming. Said he: "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart." And that is really the paradox we're faced with today: even though data is such an important aspect, putting together a shared understanding of these arrangements is very, very difficult.

So let's dive in a little bit. Again, the goal is shared understanding — shared between business personnel, technical personnel, and the systems. We have to have all three of those line up in order for communication to actually work, and the place where you line it all up is in some sort of trusted catalog. Catalog is one word for it; business glossary is another, and there's a whole list of other names that go other places and do other things. The point is you literally have to be singing off the same set of definitions — or, if you're a musician, off the same piece of music.

And there's another challenge that goes with all of this. Now, I'm just going to declare: I do not know Seth Meyers — I'm a fan — but pretend for a moment that I gave Seth Meyers my business card and he turned it into a joke. These are the jokes Seth Meyers might have made. "I hope your business is waste management, because this is going right in the garbage." "Giving someone a business card in 2021 is basically steampunk. Great — I'll give you a call when I need my cotton gin repaired." "Thanks for the business card. It's a great way to be sure I'll remember you in six months when I'm cleaning out my wallet. Dinner receipt, dinner receipt... ah, this." "'Call me if you ever need data solutions.' And what are those? 'You'll have to call me to find out.'" "What I think business cards are good for is to put in that fishbowl at the diner to see if you can win a free Reuben." "Hey, business cards: ya burnt!" So while Seth of course has a great sense of humor about all of this, it is difficult for many people to understand why we need to do data modeling, because they don't understand it. It doesn't tend to be taught well, and that becomes a problem.

So here's what we're going to cover in the remaining time we have together for this session: first, what is data modeling required for? Then section two will be why is modeling required for data understanding — in other words, you need modeling to do a lot of things, and we also need to use it to diagnose the way data comes out. And then, how to use models effectively.
Then of course we get to the part where we invite Matthew back on and go through some Q&A around this whole subject. So let's dive right in and start by asking: who out there knows what the number 42 means? You may be saying to yourself, wait a minute, why do I care? Well, there's a good story here. The number 42 is a prominent character in The Hitchhiker's Guide to the Galaxy, where the white mice and the dolphins — who turn out to be running the world, in Douglas Adams' mind — decide they'd like to find out the meaning of life. They create a gigantic supercomputer; it runs for centuries and then says the answer to the meaning of life, the universe, and everything is 42. Now, I tell you all this not because you need to go read Douglas Adams — although I'd certainly encourage you to do that — but because everywhere I've gone around the world for the last several decades, at least one person in the room knows what the number 42 is. Try it sometime; you'll be amazed how it works.

Now, we're headed towards a model distinguishing three important concepts, so bear with me a little bit. I'm going to give you two other definitions of the number 42. The next one — obviously you can see the jersey — is a very famous baseball player, and also a movie of the same name. A great, different meaning for the number 42 there as well. Take it one step further: is Peter — me — allowed to consume adult beverages in the Commonwealth of Virginia, where I happen to reside? And the answer is: you take my age, subtract 42 from it, and you get the number 23, comfortably over the drinking age.

All of these things may seem rather esoteric, and you may wonder why we would be interested in them at all. I've given you a series of facts: that the white mice and the dolphins think the number 42 is the meaning of life, that Jackie Robinson's jersey is 42, and that the age for drinking alcohol in Virginia is 21. What turns a fact into data is that we have to pair that fact with a meaning. I just gave you three different meanings for the number 42, and no single one of them is the correct one. That is where you all are the experts: you understand your businesses better, you understand what the facts are that are relevant, and what meanings to pair against them. Each fact paired with a meaning is what we used to call a datum.

What we really need to do is distinguish between data and useful data, because of course — outside of this call, you're probably the wisest data person in your enterprise — you're looking for useful data. By useful I mean it in the sense of understanding data; there are lots of other attributes as well, of course, but it starts with fact and meaning: 42, the meaning of life; 42, Jackie Robinson's jersey number; 42, Peter's allowed to drink in the Commonwealth of Virginia if his birthday was before a certain date in time. Again, these are all different facts — useful data. We also have to distinguish, in almost all cases, between data and information, and the answer is quite simple: when data is requested, it becomes information.
That's pretty easy, pretty objective, and actually quite useful. It's grounded in the library science community and observed by many; this is just one particular quote: you can have data without information, but you can't have information without data. So now you've got objective definitions of both data and information, and we're going to take it one level up as well. This last thing — what everybody's been calling knowledge systems or wisdom systems or intelligence systems, BI, all these things — is that next level up on top. How do we use the information that we have requested, turned over to us as data items, in support of the mission of the organization? That is the definition of intelligence. So now you have three different components up there, and all of them are arranged into — in this instance — a model. We'll call it an architecture for these purposes, but it's just as easily a model, showing how these three terms, so often misunderstood, can in fact be defined — and have been defined rather well since 1983. But once again, the lack of focus in these areas has led to more confusion than not.

Why do we need to look at this modeling in particular? The answer is because systems evolve over time. In the old days — and even today — we built a lot of systems where the data is integrated with the actual application, so you've got the payroll data, the marketing data, etc., etc. Everything's fine as long as everything stays in its own silo. (Silo, I'm told, is an acronym for "single location" — not that I believe that one at all.) But you can see how it gets troublesome when you start to exchange data; whether it's cloud-based or otherwise, you're starting to move things around. What organizations eventually realize is that the complexity Matthew was referring to earlier becomes very problematic from a data debt perspective — we'll jump on that in a little bit. They say: what we've got to do is put the data in one place. Now, one place could be one place conceptually; what they really mean is more of a branding exercise than physically moving the data. But it gets to the point where we at least know the characteristics, the quality levels, and the permissions surrounding data in this particular place. Once we start to gather the data in one place, it becomes what's called organizational data. And eventually we can go back and re-architect our organizations, because now the marketing people understand what their data needs are and also have access to higher-quality, easier-to-find, easier-to-serve-up, better-understood data.

This is the key to understanding how to serve the information needs of the organization — and what your organization unfortunately faces right at the moment is kind of akin to a puzzle like this. Matthew had a very similar diagram up. There's a mathematical formula I'm putting up in front of you; it shows how many interfaces are required if you want to connect every application to every other application. This represents the upper bound of theoretical complexity, and in this instance six applications would require a total of 15 interfaces if we wanted to connect everything to everything else. Now, like I said, the theoretical maximum is one thing, but your actual answer is going to be somewhere between that and nothing.
Now, one of the groups I worked with, the Royal Bank of Canada, gave me access to its data and told me I could reuse it — this is from several moons back. They had about 200 major applications and about 5,000 batch interfaces between them — the connectivity I'm showing you here. So when you look at the numbers on the left-hand side, where I vary n up from six to 600, you can see it gets to be quite a large number; I only graphed it up to 200. The fact that Royal Bank of Canada had 5,000 interfaces at the time means they were below the theoretical maximum, so they were in relatively good — or at least relatively typical — shape for organizations at that point in time. Hopefully that gives you an idea: this complexity is something we want to eliminate, and we want to eliminate point-to-point integration as much as we can. Databases play an excellent role in that, and it's certainly one of the ways you can evaluate this. Another is the architecture component: this is just showing a hub-based architecture, which only requires new development for a new application — application seven — if it represents a new type of data being integrated into the hub. So it's easier, though it's never really easy.

Let's take a look at data models. The key is to understand that these are the most objective, most testable way of capturing and maintaining formal system data requirements. These models are critical. They give you a structure that is regarded as complete: it contains the things, the definitions of what those things keep track of — in this case it's keeping track of data at rest — and how those things are interrelated with the other things. That is architecture at its most basic concept, and it also represents an important aspect of systems in general, because all systems are composed of people, processes, hardware, software, and of course data.

Here's the challenge around the data — and this was a wonderful thing to discover after being in this business for decades and decades: the data is the most stable component in your systems over time. I've gone back and visited organizations I first met 30 or 35 years ago, and guess what? They're still managing the same data. But data has almost always taken a backseat to software — and lately to app development, which is clearly sexier — and that represents the biggest challenge to delivering good data quality. Incorporating organizational business rules app by app is not good; we clearly want to maintain those in a central place from a control perspective. Just a simple question — and I've seen organizations get twisted around the maypole on this one: can a project be owned by more than one department? It becomes a crazy, crazy discussion, and if the data model doesn't reflect that policy decision, then all interpretations have to be suspect.

So the key is that we want to minimize the interconnections and have as few hubs as possible. Hubs do give us a single point of failure — if you have one hub, that's probably too few — but you can still manage them. The key is that a data model is at the heart of every hub. It's the only reliable way of conveying the enormous amount of information required to run this type of integration, whether you're doing it with a database or other types of technologies.
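For reference, the arithmetic behind those slides is the standard pairwise-connection count (the notation here is mine, not from the slides):

```latex
I(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
I(6) = \frac{6 \cdot 5}{2} = 15,
\qquad
I(200) = \frac{200 \cdot 199}{2} = 19{,}900
```

So Royal Bank of Canada's roughly 5,000 actual interfaces sat well below the theoretical maximum of 19,900 for 200 applications — typical, but still far too many to manage point-to-point.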
It also allows us to objectively determine and infer what can be determined, back and forth. Data modeling has not been considered a necessary IT skill, and quite frankly it is possible that your organization can get by by renting it, but it's safer to have it in-house at this point in time, because business and IT people are really not knowledgeable about data decisions. IT has typically assigned responsibility for data to the business; this confusion has reigned for decades, and organizations have accumulated data debt as a result.

Today's IT environment has a number of different software apps. You're at least starting in the cloud now if you aren't already out there — maybe you're in the wonderful world of multi-cloud — but there's probably some stuff left on premises too. When you look at this kind of situation, the organizational data model is the map. It is the only thing that tells you what goes into and out of these packages and how to integrate across them; it's literally the key to unlocking the IT environment. Legacy, by the way, is just anything that's in production — I don't mince words on that.

Let's back up a little further. A model is a representation of something that exists, or a pattern for something to be made — Matthew had a good definition as well. It can contain diagrams and use standard symbols. In data modeling in particular, we're going to call this both the product and the process that we use to define and analyze organizational data requirements, resulting in that integrated collection I spoke of earlier, employing the standardized symbols we use to do this. The process of data modeling is understanding, discovering, analyzing, and scoping data requirements, and then representing them in a precise form called a data model, which is used to design the data structures that support these specifications. I know I had to take a breath in there because it is a bit long, but this is what's underneath all of it, and we like it because it literally provides a foundation that is as stable for organizations as possible.

Stable, shareable data means it needs to be thought of as evolving. Not paying attention to it, and then all of a sudden having somebody come in and re-engineer it, makes for some horrible messes — I've seen them out there. Data models are required to share information about data. They use mutually understood definitions, which become candidates for what we call enterprise standard data. The data supports the stable business models that are the skeleton of the business — the bones, if you will. They are the most stable structures, as I mentioned earlier, and they are required as a condition of development: you cannot go out and build your data products without having some sort of model. Sometimes it's just a file definition, and that's oftentimes completely insufficient. Many models are not as useful as they could be organizationally, but they are considered a basic part of any system documentation — which means if you're employing services from the cloud or other types of vendors, you should get a copy of that data model from them. Oftentimes it is missing.
In fact, I now recommend to organizations that if a vendor proposing to do business with them can't supply a data model, they should probably take their business elsewhere.

Data models are also used to support strategy, in the same way architecture supports strategy. The more flexible and adaptable your data structures are, the cleaner and less complex the code that can be created, which ensures that strategic intent and measurement instrumentation can be rather easily incorporated, and that future capabilities can be built in as well. You may start off restricted to one country, but you want to build multi-language capabilities into your app over the long run. Data models also give you the ability to understand merger and acquisition targets — something left out so many times that I can't believe the private equity world doesn't pay more attention to it. "I've got a million customers and you've got a million customers and we want one set of systems — let's put them together." Well, with data models we can actually evaluate that.

I'll give you a much smaller but easier example than a merger and acquisition. Here's a data model for a system that was relevant back when I first met Clive — which is how we ended up with this — and we were going through a recession, as we periodically do in the world. They wanted all managers to be salespeople, but this data model did not permit a manager to be a salesperson, so a manager had to have a second ID as a salesperson in order to meet their share of quotas for the corporate office. Again, the data model did not support the organizational strategy, and consequently it was very, very difficult to make this work.

Again, the idea is increasing the understanding of systems for humans, and this is where the data model is the currency — it's precision. If it's on the data model, it makes it into the actual production system; if it's not on the data model, it does not. It's about achieving the simplification goals that you can use as the basis of all this, and we'll hit some of those points later on as well. There are some alternatives, but it's good to know the basics here. This gives us agreement and the ability to deploy these models in a way that supports the organizational strategy.

So why is modeling required for data understanding? Well, let's talk about what a model is. This is a wonderful diagram from Ellen Gottesdiener, who made it many, many years ago, and you can see there are lots of different ways of using a model — it's a wonderful set of tools. In fact, another way of describing the same thing comes right here. I'm not sure why they played Stairway to Heaven behind it, but watch what the little balls are doing — it's a physics demonstration. Models, then, represent expertise. You can store and formalize information, filter out completely extraneous details, define an essential set of information, help understand complex system behavior, monitor and predict conditions, and communicate more effectively — not just from a novice to an expert perspective, but also from business to technical. And as I mentioned, the last loop you want to close is making sure that humans also communicate correctly with the software.
Models also streamline documentation, monitoring, and predicting system responses as they change; they let you gain information from the process of developing and interacting with the model, evaluate alternative scenarios and outcomes, understand behaviors, and illustrate patterns and meta-patterns. Are you tired of Stairway to Heaven yet? I hope so. But I do think it's a wonderful illustration, because imagine if I had tried to describe to you in words how those balls were hanging from the strings. The model has absolutely perfect precision, and it gets you past the really boring parts of the demo — though it's a little tongue-in-cheek as far as all that goes. Certainly not the stairway to heaven, anyway. That may be one of our follow-ups.

So why do a model? Well, first of all, let's switch our focus from data to a house real quick and use the same analogy. Why would you want to build a house without an architectural sketch? The sketch is the thing that tells you whether the house is worth a million dollars, or whatever it is you're planning to spend on such a significant investment. How would you have any idea how much it would cost if you hadn't at least seen a picture of it? It also gives you a very good idea whether you're going to build the house out of Coke bottles — which, by the way, plastic Coke bottles are great insulation, but it's the appearance part and the strength part that are the issues there. The model gives you an idea of how demanding the work is going to be. If you hired construction workers from all over the world to build your house, would you like them to have a common language? The model is the common language. And I can tell you, like most Americans, I know exactly one language — I have about a three-year history of trying to learn Spanish in high school, and I can say hello, thank you, and "where's the loo?" in about 12 other languages beyond that. But I've seen people who do not speak a common language work together on models, and it is in fact quite effective to be able to focus on those pictures. Would you like to verify the proposals the team gets beforehand? Yes — models can be reviewed. If it's a fantastic house, would you like to build it again? Yep — the model gives us the ability to redo it in case of disaster, or to duplicate it if we want another. Would you drill into the wall of your house without a map of the plumbing and electrical lines? No, no, no — you need those blueprints. All complex systems have data models. The question is whether you know them or not, because if you don't know them, if you don't understand them and have them documented, they cannot be useful to you.

I'd like to take us now to the Princess and the Pea reference. Here's the pea down here that Hans Christian Andersen said kept this individual up at the top sleepless, because it was ruining her sleep. Unfortunately, if you get a data error in your system somewhere, it can either be a data error as such — the data needs to be cleansed, and it may take literally half a decade or so to flush certain problems through — or it can be a structural problem, and you can be stuck with that structural problem literally forever, because of not understanding the way the data works. It locks in imperfection for life if you don't have a good data model at the center of it — and you'll talk to many people who'll say, yep, we've seen that.
It restricts the investments you might get and decreases organizational data leveraging, and it accounts overall for the 20 to 40% of IT budgets that are devoted to migrating, converting, and improving data. And back to where we started: lack of data capabilities — wrong data models — causes projects to take longer, cost more, deliver less, and present greater risk. Thanks to Tom DeMarco for that wonderful list. I'm not sure who invented the term data debt, but it is equally cogent here. Looking at this example of a mess, you can see it's a problem, but you can't really see how data debt slows your progress, decreases your quality, and increases your risk.

Let me give you just one easy example. This is a query somebody was running at some point in time, and they had simply never known what a query optimization process was. So we did a little optimizing with them and came up with a better-running query. Fine, great, no problem — and everybody says, so what? Well, we took a quarter of a second off that query, but it turned out that query actually ran billions — with a B — of times each day. Those quarter seconds add up. Many people try to describe this by saying it's kind of like death by a thousand cuts — but first of all, nobody dies, so that's not the right way to describe it. It's better to say you've got unnecessary discomfort from lots of small cuts. But more to the point of where Shannon started off a little while ago: my lifetime total of helping people out with these challenges is over a billion and a half dollars saved.

Let's keep going with data modeling. This is a process of discovering, analyzing, and scoping requirements — we'll talk about the specifics. It represents and communicates requirements in a precise form, and the process is iterative and may include conceptual, logical, or physical models. I'm going to riff off this page for just a second. Here we go with the process of discovering, analyzing, and scoping data requirements for organizations. There are people, places, and things — those are the main categories of nouns we'd like to understand in our analysis process — and the information we're gathering around these needs to be created, read, updated, and deleted; some people include archived. Each of these pieces is called an attribute: a characteristic of those things, and they move us toward the model.

Here's an example. The attributes here are ID, description, status, sex to be assigned, and reservation reason. The thing ID means we need to be able to identify one unique instance out of all the other instances, and decisions about this mean we've made some requirements decisions. For example, the organization is determining that female things have to be available to be reserved; if they're not, we're doing something that isn't correct. All things have a status we can look up. Many things can be assigned to females as well as other genders; ID permits that distinction. Description gives us the ability to go in and describe each of the things individually. These represent and communicate requirements in this precise form: the data model. So an attribute is a characteristic of an instance of a business thing. Here I also have an attribute, club ID, that tells us something about this data collection.
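As an aside, before we pick up club ID in context: here is a minimal sketch — hypothetical names and types — of how those "thing" attributes might land in a physical table, with the ID as the key that picks out one unique instance:

```sql
CREATE TABLE thing (
  thing_id           INTEGER      PRIMARY KEY,  -- identifies one unique instance
  description        VARCHAR(100) NOT NULL,     -- describes each thing individually
  status             CHAR(1)      NOT NULL,     -- every thing has a look-up status
  sex_to_be_assigned CHAR(1),                   -- supports the reservation rule
  reservation_reason VARCHAR(200)               -- why the thing was reserved
);
```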
In particular, clubs need to be identified separately from one another — we use a little pound sign there to tell us that club ID is a primary key. Club-specific information is therefore likely maintained, and some concept, some organizational level, exists above the club level in the abstraction we're looking at. So here's club ID in the context of several other attributes that describe an entity — in this case a current promotion, a maximum period of obligation, and so on. I won't read them all to you; they're in very large type for everybody. But they give us the ability to understand specific information about club number one versus club number two.

Then we need to organize these things, and what we're saying in this diagram is that there's a relationship between club and club member. But we can't associate them directly together, because that's a many-to-many relationship, and that becomes a challenge for us. This is just one of the types of problems — and if you see people pooh-poohing these, saying they're not important or that you can fix them in production, it's just not likely to be the case. So you'll see what I've done is add a connector that goes in right there, just to make sure everything links the two of them together.

There are four different notations that people use. This is the one our colleague Peter Chen developed — I believe he was the first to publish. Charles Bachman put together one that looks like this. They all tell you essentially the same information. James Martin had this style, and then this one is from Clive — it's called information engineering. That's really the main one; just use the one most people use, which is information engineering. But if you go into an organization and they say "we're a Chen organization here," it's okay — none of this is rocket science, and you can figure out the translation very, very quickly.

Again, in this context we're looking at it from an information perspective: these are the possible ways in which not just entities but groups of attributes are related to each other, in precise ways. So here's an example. A bed is placed in one and only one room — it's "many" because it's got the circle and the one right at the top — and a room contains zero or more beds; that's the rule here. A bed is occupied by zero or one patient, and a patient occupies one or more beds. Now, these may sound nonsensical, but either they were developed for a particular reason or they were developed badly, and it is important to find out which one we're talking about. How does a patient occupy more than one bed? I'm not really sure. What if the bed is moved? Then we have a challenge around that as well. And what is a room, actually? We'll have to come back to that in just a second.

Again, representing and communicating requirements in the precise form you're seeing here is the data model. I have thing one and I'm connecting it to thing two in this case: each thing two must be accompanied by a thing one. This is what gives our models what we call structural integrity — the ability to make sure things work the way the business needs them to work. Here, for example, is a relationship at the first level: it just says a bed is related to a room. Not terribly precise.
If I put some precision on it, this says many beds are related to many rooms — and recall that a few minutes ago I said you can't really implement a many-to-many directly; you need a special construct in order to do that. But let's see if it works this way: many beds may be contained in each room, and each room may contain many beds. That's the most precise of all of these we've put together. And what if beds can be moved? Once again, it becomes an interesting question.

We do our data modeling in response to organizational needs, and those needs become instantiated and integrated into the data models. The models then articulate and authorize system requirements, which in turn help us reassess the organizational needs. Data modeling is an iterative process developed in response to specific needs, but we also need that trusted catalog in order to bring it forward. Iterating means that the next time the requirements change — and business process requirements change on a regular basis — you get good at the process of doing these data modeling cycles, as opposed to trying to keep it up as a constant need, because hopefully your organization isn't changing that much. It is particularly important, of course, for new systems development and for anything going on in the area of data science, which all represents new development.

Each data arrangement, then, is a data structure. Here's one that spells out the relationships among customer, sales order, sales order line, and product — and if you look at product, sales order line, and sales order, it's exactly the same construct I put together between club and club member. Once you've been doing this for a while, you see it over and over again. As for the characteristics of data structures: they provide a grammar, and they give us constraints on the data objects — forcing them to be unique, putting them into an order if that's appropriate. Again, there are lots and lots of options in terms of looking at it; we want to get the right balance and optimality.

Data models are put together in small components, and as those components are assembled they become larger. The detailed small components give us intricacy; building up into larger components gets us into dependencies; and the larger components — the larger models — are organized into architectures, which gives us purposefulness. All three of those words are incredibly important to data modeling. So: the attributes are organized into the business things — the persons, places, or things we deal with in our organization, basically the nouns of what we're doing. This represents the intricacy, and those intricacies are expressed as business rules. There are several examples; here's the one I showed you a few minutes ago — thing ID, description, status, sex to be assigned, and reservation reason. There we go, got it out. Entities and objects are then organized into models, and as I mentioned, the key word there is dependencies. We've already seen examples where if you don't have one of these, you can't have one of those — or you must have one of these first.
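To make that "connector" concrete, here is a minimal sketch — hypothetical names — of how the club-to-club-member many-to-many is resolved through an associative table rather than a direct link:

```sql
CREATE TABLE club (
  club_id           INTEGER PRIMARY KEY,
  current_promotion VARCHAR(100)
);

CREATE TABLE club_member (
  member_id INTEGER PRIMARY KEY,
  name      VARCHAR(100) NOT NULL
);

-- The connector: each row records one member belonging to one club, so many
-- members can join many clubs without a direct many-to-many relationship.
CREATE TABLE club_membership (
  club_id   INTEGER NOT NULL REFERENCES club(club_id),
  member_id INTEGER NOT NULL REFERENCES club_member(member_id),
  PRIMARY KEY (club_id, member_id)  -- one row per club/member pair
);
```

The dependency is visible in the keys: a membership row cannot exist until both the club and the member it references already exist.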
"I'm sorry, sir, we can't find your record, so I can't add your points." That's the kind of response you get when a dependency is broken. And then purposefulness: architectures should be organized purposefully, but that's very difficult. Here, for example, is a picture of an architecture — sort of. Let's actually turn it right side up and then flip it left to right; now it's facing you in the right direction. And it's still unhelpful, although you can at least see that whoever did the work put something into it — I've actually had a bunch of my old gray-haired friends call me up and say, "Where'd you steal my data model from, Peter?" You can see there's an orange part and a green part and a purple part, and again, these are ways in which people maintain this type of information. Ideally you'd take this and put it into a CASE tool, so you didn't have to look at it on a piece of paper and try to figure out what's actually going on.

If you haven't heard of it, this is the DAMA DMBOK — and if you Google it, we come right up in the first row, which is wonderful to see. Its goal is to show the practice areas within data management, and data modeling is one of those practice areas; you can see it also includes analysis, database design, implementation, and the additional data development work that gets put together around all of that.

So modeling is required because, first of all, you want to use models the way they're intended: the idea that you're going to preview or prototype what you're trying to do is no less valid in data modeling. It requires precise agreement — a way for organizations to understand precisely — which means adopting one of those four notations I described earlier. Suboptimal data management practices force organizations to deal with data debt; data debt is very costly, and accumulated data debt accounts for a large amount of the savings I described earlier. These modeling resources really must be correct — perfect, even — and they should be considered a primary internal development competency. What do we really need? Modeling is the way you'll learn what your legacy environment looks like, and it's very good to have at least somebody on staff who understands how to document and understand these data structures.

So let's go to the last chunk of this, which is how to use models effectively. Again, you can see there are a couple of topics; we'll come back and review them at the end. Where are your data blueprints? In most instances, you should actually have those blueprints approved by some group within your organization. It's very clear that for engineering blueprints there are boards and supervision required to get those approval stamps; we don't have that right now for data blueprints, though we'd like to. One of the ways we're trying to get there is by all of you getting smarter about this and going out and preaching the gospel: data blueprints do small things, but they do them very, very well. There are correct ways to organize data. They can be optimized toward flexibility, adaptability, retrievability, and risk reduction, and the technique means you can include data integrity — that you have rules. Just to give you a couple of them: smart codes bad, dumb codes good — sorry, I skipped to the next page too quickly on that. There we go.
And table joins will be our last one. So let's start with smart codes bad, dumb codes good. To give you an example — and first of all I apologize for this, but it was so bad I thought I had to show it — this was an advertisement the Bell System used. Let me play it: "It suggests naturally a watchful feminine presence at a switchboard and the supplementary agency that in a few seconds can..." — an amazingly sexist type of thing. One of the more interesting bits of history is that push-button phones, as opposed to rotary-dial phones, came about because the Bell System determined that every female in the entire United States of America would have to be employed as a phone operator by about the year 1920 or so, and that is what prompted them to develop touch-tone dialing.

For those of you who remember back that far: in my own hometown of Richmond, Virginia, 804 was the traditional area code — we've of course grown out of that — and it followed an N-0-N type of pattern. If a piece of telephone hardware saw that pattern come through, it knew it was doing long distance. But we ran out of those numbers: most people have multiple telephones at this point — your iPhone, perhaps a home phone number, and perhaps an additional device connected to the internet. And when they changed the numbering scheme, they had to go back and reprogram all that equipment. A second example: a very smart business school dean here was looking at the courses we had at the business school, and you can see they're labeled 360 through 369. We wanted to add another business school course, and the dean said, "But you can't — you've used up all your numbers." Now, wouldn't that be a really bad way to run a railroad? One of the things that happens is people don't understand these codes. Another example: a very large delivery organization I was recently talking to explained that they had a primary key they were going to have to expand, and it was going to require upwards of 100,000 system changes to manage it. So clearly there are ways this could have been done that would have prevented all three of these disasters.

If we're talking about table handling, it's equally interesting. A table is a collection of data items that relate to each other, and the representation in the database itself can be quite confusing. I'm going to give you an example — something you can take away with you. This is the Music app — formerly iTunes — on the macOS platform; I'm certain there's an equivalent in the Android world. I'm looking at a particular example, the Oscar Peterson collaboration he did from Helsinki — wonderful to be able to look at it. If I inserted this as a CD into my old computer — the one that had a CD player — it actually knew only this much information about the music: the start time and the stop time. So it knew the length of each song, but that didn't really tell it much else. It's an interesting way of looking at it: it doesn't actually store the length; it stores the start time and the stop time, which is infinitely more flexible than storing the physical length.
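Here is a minimal sketch — hypothetical schema — of the smart-code trap in that course-numbering story, and the dumb-key alternative that avoids it:

```sql
-- Smart code: meaning packed into the identifier itself.
-- Course "361" means 3 = junior level, 6 = business school, 1 = sequence digit.
-- Once sequence digits 0-9 are used up, no course can be added without
-- renumbering everything that depends on the pattern.

-- Dumb-key alternative: the identifier carries no meaning, and the facts the
-- smart code tried to encode become ordinary, independently changeable columns.
CREATE TABLE course (
  course_id   INTEGER PRIMARY KEY,  -- dumb surrogate key, never exhausted
  course_code VARCHAR(10) UNIQUE,   -- display label only, free to change
  level       VARCHAR(10) NOT NULL, -- e.g. 'junior'
  department  VARCHAR(30) NOT NULL  -- e.g. 'business school'
);
```

When the meaning lives in ordinary columns instead of in the identifier, adding an eleventh course — or a new area code — never forces a mass renumbering.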
Here's the way the data might be put together if somebody uneducated in data design tried to do it. First, information might be lost — for example, if I lose record number one. That's what's called a deletion anomaly: I've taken that record out, but I've also lost the fact that purchaser number one purchased a song called Cool Walk Live. If that's my table, that is the result, and it may be unintended and undesirable — it's usually undesirable. There's another thing called an insertion anomaly. Suppose I want to add a new song called Cake Walk, and it costs $1.29. That's the first fact, but the second fact is a purchase, and I have to have both in order to add it. It requires somebody to buy Cake Walk Live — I can't add the song until a future purchaser buys a copy of it. Again: undesirable, unintended. Update anomalies are fairly straightforward as well. Say I want to change the price of Cool Walk from $1.99 to $1.29. If I have everything in one long file like this, I have to go through and review each and every row and change it. And that will not catch spelling errors: notice Cool Walk is spelled correctly here and incorrectly down here, which means whatever I did would miss that particular instance.

So how should it be done? We want to break the tables out and store, as much as possible, one fact per row — and this is really key for large-volume transaction designs. The original is up top, and the broken-out version is below. The two pieces are brought back together by joining the tables: data from the two tables is joined, and there should be one instance of a pricing record for every instance of a purchase. That type of structuring leads to far better, more flexible data structures than before. There are many other aspects of good data design, and there are books we'll reference toward the end that will be helpful to you.

Let's turn our topic to definitions now — we've talked about them a little already. Take a bed. As metadata — well, a bed is a piece of furniture I might want to use; in a CASE tool, the tool maintains all the definitions, like a glossary: "something you sleep in." Okay, very, very interesting. And here's one I encountered when I was at the Defense Department: a bed was a substructure within a room, a substructure of a facility location. Do you see why I was using rooms earlier in my examples? And again, it's got a purpose statement, it's got a source where it came from, and it's got an association — very poorly labeled, but still clearly one-or-many-to-one going into the room. In addition, however, we'd also maintain an attribute about the data element: whether it's been validated or not. If it has not, you can label all of your models draft, and that's very important, because management has to give you the resources to turn a draft version into a final version. But here was what was interesting in this particular instance: going through the pages, we discovered that they were going to put a transponder on each bed and use the room to keep track of the beds. We would never have discovered this if we hadn't gone through the data model with them. The key bad idea, of course: nobody had a good answer to the question, what room is the hallway?
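Coming back to the purchases example for a moment, here is a minimal sketch — hypothetical table and column names — of those anomalies and the one-fact-per-row fix:

```sql
-- Unnormalized: price and purchase facts share a row. Deleting purchaser 1's
-- row also deletes the fact that the song exists (deletion anomaly); a new
-- song can't be added until someone buys it (insertion anomaly); a price
-- change must touch every matching row and misses misspelled duplicates
-- (update anomaly).
CREATE TABLE purchase_flat (
  purchaser_id INTEGER,
  song_title   VARCHAR(50),
  price        DECIMAL(4,2)
);

-- Normalized: the pricing fact is stored once; purchases refer to it.
CREATE TABLE song (
  song_id INTEGER PRIMARY KEY,
  title   VARCHAR(50) NOT NULL,
  price   DECIMAL(4,2) NOT NULL  -- change $1.99 to $1.29 in exactly one place
);

CREATE TABLE purchase (
  purchaser_id INTEGER NOT NULL,
  song_id      INTEGER NOT NULL REFERENCES song(song_id)
);

-- Joining reconstructs the original one-row-per-purchase view when needed.
SELECT p.purchaser_id, s.title, s.price
FROM purchase AS p
JOIN song AS s ON s.song_id = p.song_id;
```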
That hallway question, of course, became a really interesting challenge all around. Here's another way data modeling is extraordinarily helpful to your organization. This is an insurance example — defining, in this case, four specific terms that a large group of people, a work group beyond the physical work group, all get together to use. I've seen these posted around organizations, where it just says: look, this is the cheat sheet for how these things are related — and by the way, this is the corporate reference for the terms, so it's official, not just something off to the side.

Here's a quick data modeling example. What you see here is that there's nothing structural preventing an automobile from being associated with multiple customers. That might be an issue, depending on what you're trying to do. Here's another: pricing is not included in the catalog; the pricing is on each line item — which means that if you were a customer of this organization, you could be pretty sure you'd have to do some negotiating to get the pricing you wanted. Again: it's not possible to determine what part of an order an invoice pertains to, because of the way the data model was put together — and you'll hear people grumble about all of these, particularly line items being separated from the catalog. Here's another one for disposition — again a hospital example, but I'll just say there's got to be a relationship there: each admission should have a discharge. Okay, well, if that's the case, then death must be a valid disposition code. Boy, did we have some disagreements in the hospital on that one — a fascinating couple of sessions.

I'm going to introduce something here now: not just data models, but three types of data models. If somebody shows you a data model, your first question should be: what type is it? Is it conceptual, logical, or physical? Conceptual is the highest level of abstraction; it typically doesn't have a lot of attributes and details. Logical might still have many-to-many relationships — there are lots of arguments about that — but, broadly, it hides the implementation details. Physical means I'm getting ready to implement this in my cloud-based solution or my on-prem Oracle or whatever it is we're looking at. All models fit into a kind of framework of where things go. So the first thing is to differentiate your models between as-is and to-be: as-is is what I currently have; to-be is what I'd like it to be. I also mentioned before that validated versus unvalidated is absolutely critical to have, and on top of those we've now added the conceptual, logical, and physical models I've just described. All data modeling can occur within this framework. The way we do it is: conceptual is sort of a notebook or sketch of requirements; the logical actually has the model; and the physical is there as well. Every modeling transformation can be mapped onto this larger framework. We talk about forward engineering: what are we supposed to do, how are we supposed to do it, and what's actually built? That's terrific. But we also need to pay more attention to reverse engineering, because that's what's actually going on — the effort aimed at recovering rigorous knowledge, going from the physical as-is to the logical as-is.
And then, if I'm going to do it properly and revisit the requirements, I have to take my data model through all of these stages, which is problematic for a lot of people — it looks like a lot of work. I'll show you why it's critical in just a second, and the answer is normalization. Most organizations, when they're swapping data, simply forklift it from place A to place B, and while that seems cheap and easy — I can map it in my spreadsheet — it doesn't give you the ability to understand it. Remember, you first have to understand the existing system, learn its strengths and weaknesses, and use that information going forward. So people say: I go from my physical as-is up through the logical models and then back into a new physical — why do I have to do that? Well, it's important because I need a technology-independent view to understand how best to implement it. I could have implemented something based on rules or technology that was two years old and is no longer valid. And as I go forward, people ask why again — the answer is that when you're doing it properly, you're also incorporating additional requirements at this stage: business requirements that can't be communicated at the physical as-is level. Now that blend of orange and green — the next version of the data model I showed you before — becomes our technology-dependent design. And most importantly, you have now made the technology switch a whole lot easier for your users. Each of those cycles has a specific focus.

Just a couple of things here at the end as we finish up. It's hard to get people to want to do this; they don't understand why they're doing it. So don't tell them — but do at least keep them focused on the model's purpose. We're trying to understand the relationship between soda and customer here at PepsiCo, for example; we want to come out with a model that represents that relationship with a much better understanding than we have now. We need to understand the characteristics that differ: what do you mean it's the primary means of tracking a patient? How is that going to work if we push beds around and into a hallway — what room is the hallway? And finally, here's a mission right here: are we allowed to job-share? COVID has just hit and I need to be able to do job sharing really, really quickly — can our systems do that? A very quick reverse engineering exercise can get you the answer. But don't tell them you're doing data models. Just write down some stuff — this is what I'm doing, this is why — arrange it, and make the appropriate connections between your objects, and you've got what you need.

The data modeling process itself is fairly straightforward: identify the entities (again, go slowly — don't tell people what you're trying to do), identify a key for each entity, draw a rough draft of the entity-relationship model, then identify the data attributes, assign those attributes to the entities, and map them; there's a sketch of these steps below. And that'll be great — but your model should evolve. If it doesn't have some changes in it, it's probably wrong, and if you have no disagreement at all, people are not paying enough attention to what's going on. You may discover that you need to continue to refine your model as you get more mature with the process.
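Here is that sketch — a minimal, hypothetical rendering of the process using the soda-and-customer illustration: entities identified, keys assigned, a rough relationship drawn, attributes attached:

```sql
-- Steps 1-2: identify the entities and give each a key.
CREATE TABLE customer (
  customer_id INTEGER PRIMARY KEY,
  name        VARCHAR(100) NOT NULL  -- step 4: descriptive attributes
);

CREATE TABLE soda (
  soda_id INTEGER PRIMARY KEY,
  brand   VARCHAR(50) NOT NULL
);

-- Step 3: the rough-draft relationship, drawn as an associative table because
-- a customer buys many sodas and a soda is bought by many customers.
CREATE TABLE soda_purchase (
  customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
  soda_id      INTEGER NOT NULL REFERENCES soda(soda_id),
  purchased_on DATE    NOT NULL
);
```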
The amount of data collection should decrease over time while your analysis increases within that context. Project coordination requirements go down over time, with increasing amounts of target-system analysis, that is, what the to-be model should look like, and the modeling focus should shift between refinement and validation. That's critically important.

Do not start this out by yourself with nothing at all. There are so many resources out there; these are just some of the published books, and I'm happy to point you to others. In particular, Len Silverston's volumes one, two, and three, in the bottom right-hand corner: if you can still find a physical copy on Amazon, they come with CDs so you can drop the models right into your CASE tool and use them.

So we've spent quite a bit of time looking through these things. Again, why is data modeling required? It's a communication tool: between business and technical people, and also with our technology. Those three parties make it a little more difficult, a little more challenging, which means we have to be very precise about what we're doing, achieve the simplification goals we're after, focus on some very specific points of agreement, and build and deploy stable business models that support the organizational strategy.

Why is it required for data understanding? Well, first you have to understand what a model is. A model gives you the ability to understand something without actually having to build it. It gives you precise definitions and understandings. If you're following suboptimal data practices, you're accumulating data debt, and data modeling is considered a best practice against that. In fact, the important details of the model have to be perfect, just as your syntax has to be perfect when you're writing code. It should be considered a primary, iterative development process. (I'm so sorry, I hit the wrong button there. There we go.) It should really be an organizational capability to understand and document data structures. And if you've got interns in your organization, this is a great place to pull them in and put them to use.

How do you use data models effectively? First, as a standard communication form. It reduces those corrosive anomalies that just keep grating and grating on people as they work through things. It motivates you to focus on purpose statements: you're not defining what a bed is for its own sake, you want to describe why you actually require it. There's the goal of forward engineering, where you're going to develop or build, and if you're trying to understand, you're reverse engineering; both of those skills are going to be needed.

So we're just about out of time; here's a quick summary of what we're doing. The goal has got to be sharing; there can't be hidden disagreements. It's a simple way to exchange information. It provides a solid, lasting model. The model's characteristics should evolve; if they're not evolving, nobody's paying attention. Modeling is a problem-defining as well as a problem-solving activity, and you want to treat the model as a living document.
It's got to be available in a searchable manner, and adding color and diagrams to whatever you're producing is paramount. In other words, don't stick to just the boring parts.

So we're right at the top of the hour. We've got next steps coming up, but it's time to invite Matthew back on and welcome Shannon back to the channel.

Thank you so much for another great presentation. As always, if you have questions for Peter and for Matt, feel free to put them in the Q&A portion of your screen. And just to answer the most commonly asked questions, a reminder that I will send a follow-up email for this webinar by end of day Thursday, with links to the slides and to the recording, along with anything else requested. So, diving in here: do we need to do data modeling, all types of data models, with unstructured data, in graph or vector databases?

Absolutely. And I probably should have given a little more of a disclaimer to put what I'm talking about in context. The vast majority of this material is focused on what we call tabular data, the data you typically see in a spreadsheet. Matthew, I'm sure you're ready to address the other types of data, because nothing ever stays the same in IT. We always add to the complexity of what we're doing, and thank goodness we're able to. Matt, are you there?

Yeah, I'm here. And I agree with you, for sure. One of the things I mentioned in my little spiel at the beginning was the technical difficulty of making changes to relational data. That's one of the things we can address with NoSQL databases: graph, vector, and certainly JSON as well.
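A tiny sketch of the flexibility Matt is describing, with invented documents and field names: newer JSON documents simply carry a field that older ones never had, and nothing has to be altered or migrated.

```python
# Hypothetical documents: "country" was added to the model after "c1" was written.
customers = [
    {"id": "c1", "name": "Ann"},                    # older document, no country
    {"id": "c2", "name": "Bob", "country": "CA"},   # newer document
]

# Readers tolerate the missing field instead of requiring a schema migration.
for doc in customers:
    print(doc["name"], doc.get("country", "unknown"))
```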
Thanks, Matthew. All right. So: is there work on combining a clearly defined physical model, a reusable and shared logical model, and high-level conceptual modeling of business processes into a single, quote unquote, intelligence model? I'm looking into defining the different layers of an archival description model, and such work needs an ISO or similar reference.

I'm chuckling a little at that, because of course everybody wants the one solution, but I haven't seen anybody have any luck with it. Certainly, the more you mechanize and automate the process of data modeling, the more likely you are to be able to switch back and forth between models. I can remember something from Visible Systems, many, many moons back, where you could specify the logical model and link it directly to requirements-planning statements and other kinds of things. I don't want to give an advertisement for a competitor, but it's something most people would consider natural. Here's the problem, though: we are not teaching students in school that CASE tools even exist. That makes it especially tough for Matthew, who comes out and says, look, the traditional old world will only get you so far; to work in today's environment you've got to explore these other options. We are just not doing a good job of educating the kids about that. Matthew, what do you find in terms of customers? Conceptually, everybody likes easier, but do they understand what easier entails, and are they mature enough to implement it in a good fashion?

Yeah, it's definitely an issue industry-wide. And again, one of the symptoms I mentioned is that we're using different point solutions and just stacking complexity. Certainly in your presentation you showed some of the benefits of having a hub, a more central repository. One of the use cases, which we call customer 360, is where customers have dozens of disparate data systems, and putting together a picture of a single customer is impossible, so you have to gather that data into a single place to make it viewable as one piece.

It is extremely difficult. One thing I perhaps didn't emphasize in the story: I tend to say that a data modeling exercise should be designed to complement some other type of analysis that's going on. While the best practice used to be that organizations would maintain an enterprise data model, with good logical and physical data models of all of their systems, I don't find that that's actually the practice anymore. I can remember Larry English, back when he was with us, would walk up to the front desk at an IT shop, literally with a stopwatch, and ask: how long will it take you to find the data model for this? So we're doing as much reverse engineering now as we are searching libraries trying to find things. All of this points toward needing more focus here, and I find there's a big deficit, particularly at the leadership level, in understanding this. Leaders tend to think that if they bet on something new, they must do either one thing or the other, or that the new thing will do all the old things as well; sometimes that's the case, and sometimes it's not. You folks on this call are the best experts at deciding that, and these models are what you can use to persuade people that, say, the ability to insert a row is a capability we're going to need on an ongoing basis, and consequently an inflexible traditional database may not be the best way to do it. Thank you.
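A toy version of the customer 360 idea Matt described might look like the sketch below. The source systems, keys, and fields are all hypothetical, and a real consolidation involves matching, cleansing, and conflict rules that this skips.

```python
# Three hypothetical source systems, each holding a fragment of one customer.
crm     = {"cust-42": {"name": "Ann Lee", "email": "ann@example.com"}}
billing = {"cust-42": {"plan": "gold", "balance": 120.50}}
support = {"cust-42": {"open_tickets": 2}}

def customer_360(cust_id, *sources):
    """Fold every source's record for one customer into a single document."""
    view = {"id": cust_id}
    for source in sources:
        view.update(source.get(cust_id, {}))
    return view

print(customer_360("cust-42", crm, billing, support))
```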
Peter, you didn't mention this already, but I'm going to ask the question verbally anyway: can you recommend a couple of books to read on the subject, to reinforce the concepts you shared today?

The real treatises on these are published by a company called Technics Publications, which is the company I publish with as well. I would suggest just heading over to their website, and they'll be on site with us at EDW, right, Shannon? We'll be there for sure. Steve Hoberman is the owner of Technics Publications. Just walk up to him, he'll be the guy standing behind the booth, and tell him what you're looking for, or just write to him; he's very easy to find. If you really have trouble finding him, let me know and I'll connect you. But there are lots of different ways of getting into this. There's a really good architecture book by a woman named Abby Covert that's available on the Kindle platform, and many companies have a subscription, so anybody in the company can literally just download it. It's the best book I've found on information architecture for non-information-architects. But if you're already into it, you remember our colleague Graeme Simsion, Shannon? He wrote his doctoral dissertation on whether data modeling is a design activity or a documentation activity. So there are lots of places to get started. Again, I'll go back to that page that had the books on it, a couple of good ones there, but like I said, Len Silverston's are the main ones, because he has so many patterns.

I'll just tell a quick story. Shannon and I were both at a conference in San Diego a couple of years ago, and my talk happened to be on that day. In the Q&A session, just like we're doing right now, somebody asked: what if I was looking for a data model for a cash-based pharmacy system? And, you can't make this stuff up, Len happened to be walking by, and I grabbed him and pulled him in and said, hey Len, what page is that on? He said, that's volume two, page 220, or something like that, and was able to point them directly to it. So there are lots and lots of good references out there. David Hay's Data Model Patterns is a good book for starting to see how things are actually organized. Lots of opportunities, but I would definitely also talk to other people around you who've done this and find out what works for them. Thanks for the question. Matthew, got a favorite book on data modeling that you point people to?

Yeah, I don't necessarily want to highlight any competitors either, but there is a well-known book among developers called Domain-Driven Design, by Eric Evans. A lot of the same concepts you've presented here are in there, and the domain language is central: it's not just about the technical implementation, but about coming to a common understanding.

I'll have to look that one up; I haven't seen it. And there are lots of links going on in the chat. I put the links to Technics Publications in there as well, so I love it. Being curious and consuming lots of education is good.

All right: are there any tools you can recommend, online and free as a bonus, that can help generate models from existing spreadsheets?

I don't know of one right off the bat, but every semester, because our university budget for CASE tools is zero, I start the students out by looking for free CASE tools online. There are lists of them, and it's not hard to find a couple. One that I use, and wish were a little better, is Lucidchart. It's a very nice modeling tool, but it doesn't have an integrated data dictionary; that's usually missing from all of them, so we sometimes maintain the dictionary in a spreadsheet, because that's the way it works. Matthew, what do you do when people want to model your data capabilities? How do you approach that process?

Yeah, if we're talking physical modeling, I mentioned some tools for relationship diagrams in my session: we have an IDE plug-in, and there are third-party vendors that provide tooling for all kinds of databases, including Couchbase. Those are the physical tools that are out there. A super bunch of good recommendations there.
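For the data-dictionary gap Peter mentions, even a CSV kept next to the diagram can work. This sketch, with invented columns and entries, writes one in a spreadsheet-friendly form.

```python
import csv
import io

# A hypothetical two-entry data dictionary, maintained alongside the diagram.
rows = [
    {"entity": "Customer", "attribute": "customer_id",
     "definition": "Surrogate key; one per customer."},
    {"entity": "Customer", "attribute": "email",
     "definition": "Preferred contact address; must be unique."},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["entity", "attribute", "definition"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())  # paste-able into any spreadsheet tool
```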
So: how do I present high-level, tangible business value for data modeling to senior executives as part of the SDLC? We are trying to set up a data model review process. I'll tell you a quick story; Matthew, maybe you'll have one too.

I got invited to China last fall to speak to the Shanghai Data Exchange, and one of the things they wanted to know was exactly that question, which is a good one: how do you demonstrate value from this? Now, my solution here is not going to be a solution to your problem, but I guarantee you there is something you can talk about. The challenge we had in that instance was that we were trying to procure half a billion dollars from management so we could reengineer and get off an old COBOL mainframe-emulation system. Just to tell you how fraught with peril it was, how risky: the data model was used, first of all, to sell management, because the previous attempt had been a big bang. I'll give you one statistic: after they converted, their end-of-day job ran for 48 hours. You might think to yourself, well, that's not so bad, but that end-of-day job has to run every day, so no, it doesn't work. Management was reticent to invest any more money, and we had to show them a data model of how we were going to replace the system in parts. I have that as another presentation; I can't show it to you right now, but it was a model that was super, super useful. Of course, not all of your presentations are going to be to management, going after half a billion dollars.

So the next question comes up: how else can this be useful? What we found is the waste of human capital. Remember that 20 to 40 percent of IT dollars figure. I've seen two people sitting in a room for literally days trying to figure out a problem, neither of whom knew data modeling, and I showed them data modeling, because that was exactly the tool they needed. Their reaction was: why did neither of us know this? So it's not just that it will produce more sales or decrease your costs and things like that; it's really about saving and preserving your human capital. Your training departments will absolutely love you for having logical models of your systems, because people understand how things work from logical models. Remember Fred Brooks's quote at the beginning: show me your tables and I won't need your flowcharts. It's a very, very true concept. Again, Matthew, any stories on that?

Yeah, no specific customer stories come to mind, other than that we've got some customers who have gone from three or four systems down to one or two. The benefits really depend on what they're trying to accomplish and the industry they're in: it could be risk reduction, it could be better cost efficiency, it could just be faster time to market. And of course, as you mentioned, there's the additional overhead of having to maintain, upgrade, and patch multiple tools; if you can consolidate onto fewer point solutions, that can potentially give you those benefits.

Think about it: when, in your environment, do you actually get the time to go look at a system and say, how can I reduce its complexity? How can I improve this picture, remove that object down at the bottom that's causing this other person up at the top to be sleepless? Not often. But if you do have that as an option, take it.
Reducing the complexity of a system will generally result in a cleaner system, a more understandable system, a system that's less risky. The question, of course, is how you're going to understand the system at all if you don't have any data models, and I have seen that happen a lot. Good question, thank you both.

It's true, we hear that a lot from companies: lots of systems, no data models.

So: "communication maps between entities has always been my understanding of a canonical model. I'm very confused; I hear many differing thoughts. Can you expand on that?" I think the question is, what's a canonical model? And you're right, that has been an ongoing source of confusion. I'm going to flip back to the books page, actually. Matthew, you may have a completely different understanding of this, so feel free to jump in and go the other direction, but it's been an ongoing discussion. What Len was actually trying to do in his Data Model Resource Book was put together a bunch of canonical models; he called them universal models. But the idea is, well, let's step back for a second. If you've traveled internationally, you've probably seen the SAP ad in the airports: all the best companies run SAP. And that's probably a true statement. But if you think about it, if they're all running the same processes, what do they differentiate themselves on? The answer is data. And if I need a data model for a cash-based pharmacy and it's in volume two, page 220, why start from the terror of the blank screen? That's really what happens in most instances: organizations are terrified of that blank piece of paper. So don't start modeling from nothing; find something somebody else has done that's as close as possible to what you do, and evolve it toward where you're going, or, better still, reverse-engineer the last version you had. You'll get a lot of good information out of that real fast. So a canonical model is sort of the way things should be, all other things being equal. There are canonical models for doing research; there are canonical models for doing clinical trials, so the FDA knows how to connect with you and understands what processes and tests you're running. Some people like to say that a canonical model therefore prescribes what you're supposed to do, and I don't think you should take it quite that far. You've got to have some differentiator, and your value as an expert data person is tied to your organization, to your ability to help it realize the value that comes out of its data. I went on for a bit, Matthew, to give you time to come up with one.

I don't think I have much to add to that, other than that building a canonical model, building these kinds of logical and conceptual models, does take time, and organizations are busy. I often think of that cartoon of the cavemen with square wheels, where someone comes up and says, here's a round wheel, it's going to help you, and they answer, we don't have time for that, we're too busy. That sort of time can be freed up if you consolidate your models, consolidate your data stores, and consolidate the data pipelines; that frees up resources to work on these things, which can lead to more differentiation and more value.
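To give a flavor of what those universal models look like, here is a tiny, simplified slice of the Party pattern that appears, in much richer form, in Silverston's and Hay's books. The field names are reduced for the sketch and are not quoted from either book.

```python
from dataclasses import dataclass

@dataclass
class Party:
    """Supertype for anyone you deal with, person or organization."""
    party_id: int
    name: str

@dataclass
class Person(Party):
    birth_date: str = ""

@dataclass
class Organization(Party):
    tax_id: str = ""

@dataclass
class PartyRole:
    """Relates a party to the business without remodeling the core."""
    party_id: int
    role: str  # e.g. "customer", "supplier", "patient"

acme = Organization(party_id=1, name="Acme Co", tax_id="12-345")
roles = [PartyRole(party_id=1, role="supplier"),
         PartyRole(party_id=1, role="customer")]  # same party, two roles
print(acme, roles)
```

Starting from a published shape like this and evolving it is exactly the alternative to the blank page that Peter describes.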
If you try to build variation into something deliberately, you're more likely to get it than if you just assume it might show up. I'm preaching to the choir here, but that shared understanding Matthew is describing is what we're trying to achieve: getting everybody literally on the same sheet of paper. And by the way, the process of getting on the same sheet of paper does an awful lot for team building too, because both groups have ownership of the model and are attached to a good result. Don't forget, of course, the catalog in the middle of it, because you're going to need a catalog. Great question, thank you guys. Thank you.

So: how can I model data stored in a data lakehouse? I think you want to take that one.

I mean, it's a good question. I think that really is the essence of a data lakehouse, right? You have a huge amount of data in wildly different formats, and you're trying to get pictures of it in a standard way to better serve value. There's a lot more work in coming up with a conceptual model for an entire enterprise's worth of data, but maybe you can start with a small piece, go from there, see how it serves the applications and the reports you need, and keep building on that.

That's exactly where I was going to go. Most organizations are using this construct as a landing place, so there's all kinds of different data in it, and you probably don't have any interest in most of it. But clearly something has piqued your interest and made you say, I need to understand this chunk of the data. So make sure you do a context model and show what's in scope and what's out of scope. It's interesting; the reason I chuckle at the data lakehouse is that it has been, I think, the one true big success out of everything that's happened. I mean, have you actually seen anybody doing much with big data? I don't mean nobody's doing it; we've been doing big data for years, for centuries, so let's not kid ourselves that this latest focus on big data makes the difference. What is different is that we have new tools. Going beyond the traditional tabular data I've described here is a big venture for many, many organizations. As they go out and try to learn this, they end up in this context of trying to find out what's in the data lakehouse, and probably the best thing to do is to take the interesting piece out of the lakehouse, put it in something else, and see what it looks like outside.

Now let me pontificate on that for a quick second, because I think it's worthwhile. I mentioned the lakehouse has been very successful. The reason is that organizations haven't had a good place to drop things into, but it turns out that a data lakehouse requires shared understanding. In the picture I'm showing on the screen, these might be the users of the lakehouse. It's typically optimal for a work group, probably under 20 people, although we've seen them go up into the hundreds in some instances, particularly around something like a master data kind of play. The idea is that the work group shares an informal understanding of the metadata among themselves, and they don't need anything else. That has proven to be a very, very successful process in many organizations, and it's now considered good practice to have a data lakehouse as your landing place while you figure out what's going on. So if you've got some data that's interesting and requires more, the suggestion would be to take it out of that lakehouse environment, put it in something else, and use some modeling technology to take a look at it. The catch with the data lakehouse, of course, is that every attribute effectively has its own primary key, and that's very difficult to model; you just end up with a table recursively connected to itself, object and object ID. I can draw that whole data model in one entity. Yeah, that's one level, but it's not useful. What you're really saying is that somebody needs to understand some piece of it. Find that piece, not the whole thing, and attack that problem, as Matthew said, picking it up piece by piece. Great question, thanks for asking.
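Here is a minimal sketch of "take a chunk out of the lakehouse and see what it looks like": sample a few records and profile which fields and types actually occur. The sample records are invented; the output is raw material for a context model, not the model itself.

```python
from collections import defaultdict

# A hypothetical sample pulled out of the landing zone.
sample = [
    {"patient_id": 7, "admitted": "2024-01-03", "disposition": "discharged"},
    {"patient_id": 9, "admitted": "2024-01-04"},               # field absent
    {"patient_id": 9, "admitted": "2024-01-04", "disposition": None},
]

# Profile: which fields appear, and as which types?
profile = defaultdict(set)
for record in sample:
    for field_name, value in record.items():
        profile[field_name].add(type(value).__name__)

for field_name, seen_types in sorted(profile.items()):
    print(f"{field_name}: seen as {sorted(seen_types)}")
```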
I was wondering if this question would come up, and it sure did: are there AI tools capable of producing data models, similar to ChatGPT writing SQL code? There have been a couple of suggestions already, but is there anything you're aware of, or any warnings to be aware of? Matthew, do you want to take this one first? Have you used AI out there?

Yeah, absolutely. This is right in my wheelhouse. Capella is the cloud version of Couchbase, and we just recently released something called Capella iQ, which is exactly that: a generative AI assistant to help you write SQL queries. It can also write indexes for your Couchbase data. So absolutely, and I think this is a great use of generative AI. Of course, accuracy might be the question; people worry about hallucinations and things like that. Something like ChatGPT is good for general use and can write SQL queries on its own, but Capella iQ is right there with your data, so it can actually look at the structure of your data, the model if you will, and provide more accurate queries, results, and indexes, because it's right there in the thick of things.

Take it from the other perspective, too. I had students working on a class project for one of the federal agencies last semester, and they just got curious and started to see what ChatGPT could do for them. It went all the way and produced a reasonably accurate first draft, again, about 85 percent. So it has that level of understanding from a large language model perspective. It may not get the syntax perfect, although who knows, maybe it's gotten smarter about some of that. Ignore AI at your peril is the key here, and that's why Shannon's always laughing: it's going to put us all out of business. No. AI is not going to put us out of business. You're going to get put out of business by somebody who knows how to use AI better than you do; it's not going to be the AI by itself that does the trick. And Matthew, have you seen Informatica's tagline? They did a good one this year. I'm familiar with that one now: everything's ready for AI except your data. Not a bad one.
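A small sketch of the structure-versus-data point that comes up here and again in a later question: the prompt a SQL-writing assistant needs can be built from schema metadata alone, with no rows attached. The schema, the prompt wording, and the question are all invented, and the model call itself is omitted.

```python
# Hypothetical schema metadata; no actual data values appear anywhere below.
schema = {
    "customer": ["customer_id INTEGER PRIMARY KEY", "name TEXT", "email TEXT"],
    "orders":   ["order_id INTEGER PRIMARY KEY", "customer_id INTEGER",
                 "order_date TEXT", "total REAL"],
}

def schema_prompt(question: str) -> str:
    """Build an LLM prompt from table structure only: metadata, not rows."""
    ddl = "\n".join(
        f"CREATE TABLE {table} ({', '.join(columns)});"
        for table, columns in schema.items()
    )
    return f"Given this schema:\n{ddl}\n\nWrite a SQL query to: {question}"

print(schema_prompt("total order value per customer for 2024"))
# Whatever comes back, double-check it before running it.
```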
Okay, let me tie these last two questions together for a second, if you don't mind. One of the things we see with data warehouses, data lakes, data lakehouses, and so on is that they often lead to dashboards; executives love to see dashboards and get interesting reports out of them, but it often ends there. What we can do now, and what we're starting to do at Couchbase, is a concept called adaptive applications: we can take interesting reports from dashboards, maybe curated or built by humans, and feed those back into the end-user application, so those apps can use generative AI or other tools to give the user something they want without the user having to ask for it specifically. We're seeing those kinds of applications start to pop up.

It is much fun to be alive at this point in time and see what's going on. I just hope we don't screw it up. It's very, very true. And I'm certainly seeing a lot of data modeling jobs popping up as a result of AI.

All right, diving in; we've got about five minutes left. "I'm quite interested in the washing-machine model for improving data model quality that you mentioned, Peter. Is there any specific material, books or examples, where I can get more detail to guide the forward and reverse engineering?" There are, on my website, a couple of IBM articles describing the process. The washing-machine picture is just what it sounds like; it looks like a bunch of washing machines up there, but it's really a fuller explanation of that one slide I had about the ISO models. Those have been around for a while, and people absolutely agree they're there, but then you get into these holy wars about whether a canonical model exists at the conceptual level only or can exist at the logical level as well. If you want more reading, just start googling those terms and you'll find all sorts of discussions. TDAN, another part of DATAVERSITY, has a lot of really good, in-depth articles on this. I do find the washing-machine material tends to work pretty well: when organizations say, I'm going to do a data model, well, what type of data model is it, existing or aspirational? How do you put all the rest of the components in, and how can you use it to improve quality? I don't know whether you want to comment on any of that, Matthew, but that's the way I think about things: in general, we can get most of what we need done within the lines of that framework.

I don't think I have anything to add; I agree. In the sense of going forward and backward between physical and conceptual, I did like that approach you brought up in your presentation.

Yeah, and I'm sorry I didn't mention the website: it's anythingawesome.com, so you can navigate over there and grab a copy. In fact, Shannon, maybe we can attach it to the follow-up so everybody's able to get a copy. It's always nice when you write the articles yourself and nobody has to worry about copyright. I did put a link to your website in the chat as well, so people can look up that article.

Okay, we've got two minutes, so I'm going to slip in this last question; maybe we can get your elevator pitch.
"Can we perhaps point this AI tool at the data lake after obfuscating the PII? Is there an easy way to safeguard such assets from AI in data lakes?" There are lots of privacy frameworks, but of course your data lake is usually very far upstream in your data chain and has not been processed to that degree, so it's probably more of a process question than a technology question. Matthew, thoughts on that? It's always fun to point an AI at something and say, go forth and multiply, right?

Yeah, I'm just trying to understand the question: is it a concern about giving PII to a large language model, or is it more, what can we do with AI? I'm guessing they were trying to connect the last two questions, just like you did: somebody's got a data lake, and how do we model it? Well, maybe put AI in there and let it figure it out, right? Well, I think certainly if we're talking about building queries and ways to access the data, you don't necessarily have to send the raw data, just the structure of the data, the model, the metadata. But if you're trying to provide additional context to an AI, then you're going to need to involve something more. Vector databases and vector search are, I think, the current way of finding related concepts, via nearest-neighbor search over vector embeddings and such. So those are two different approaches where AI can help with a lot of data.

And the real key, again: when it gives you results, double-check them. Nothing wrong with that; it's still going to be faster than starting from that blank sheet of paper, but it's been known to be wrong before. And now there are techniques that deliberately poison AI training data, so people are going to be digesting poisoned data, and it'll come back out through the large language models and screw everything up. Oh, it's so much fun to be out there. Anyway, Shannon, we've reached the end of the time, haven't we?

We've reached the end of the time. Thank you both so much for this great presentation and conversation. Matt, thank you for joining us; it's always a pleasure to have you. And thanks to Couchbase for sponsoring today's webinar and helping to make these happen. And thanks to all of our attendees for being so engaged. Again, a reminder: I will send a follow-up email by end of day Thursday with links to the slides and links to the recording. Thanks, everybody. Have a great afternoon. Thank you, Matthew. Thanks, Peter. Thanks, Shannon. Thanks, Shannon.