Hello and welcome, my name is Shannon Kempe and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Architecture versus Data Modeling. It is the latest installment in the monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started: due to the large number of people that attend these sessions, everyone will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; click the icon in the bottom middle for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise send a link to the recording of this session as well as any additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also a founding director of Data Blueprint. He has written dozens of articles and 11 books, the most recent being Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named one of the top data management experts. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S.
Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started. Hello and welcome. Hi, Shannon, so good to talk to you as well. We're almost done with the year, but we've had a super, super year. And I understand we've got over a thousand folks that have enrolled for this one. So terrific, great to have it. Let's just jump right in. The topic of data architecture versus data modeling is a question that comes up, and one of the ways Shannon gets these topics is from you guys' questions. So if you have other questions, obviously send them in, and Shannon will produce webinars around the whole process. But I'm going to start out here with a little four-minute architecture lesson from Steve Jobs. We've been working on this for some time now, and we're really excited about it. We had one of our most important insights, and that was that the PC was going to become the digital hub for your digital life. What did that mean? Well, it meant that that's where you were going to put your digital photos. Where else were you going to put them? Your digital video off your digital camcorder and, of course, your music. Right? So you were going to acquire it on the device, or potentially on your Mac, and you were going to basically sync it to the Mac, and everything was going to work fine. And it did for the better part of 10 years, but it's broken down in the last few years. Why? Because the devices have changed. They now all have music. If I buy a song right on my iPhone, I want to get that song to my other devices. Right? I pick up my iPad and it doesn't have that song, so I have to sync with the Mac to get that song, but then they've deposited some photos on the Mac, so I have to sync the iPhone again with the Mac to get those photos. And keeping these devices in sync is driving us crazy. So we've got a great solution for this problem.
And we think the solution is our next big insight, which is we're going to demote the PC and the Mac to just be a device, just like an iPhone, an iPad, or an iPod touch. And we're going to move the digital hub, the center of your digital life, into the cloud, because all these new devices have communications built into them. They can all talk to the cloud whenever they want. And so now, if I get something on my iPhone, it's sent up to the cloud immediately. Let's say I take some pictures. The pictures are in the cloud completely automatically. And now everything's in sync with me not even having to think about it. I don't even have to take the devices out of my pocket. I don't have to be near my Mac or PC. People think the cloud is just a hard disk in the sky. You take a bunch of stuff and you put it in your Dropbox or your iDisk or whatever, and it transfers it up to the cloud and stores it, and then you drag whatever you want back out onto your other devices. We think it's way more than that: the cloud stores your content and automatically pushes it to all your other devices, all integrated with your apps. And so everything happens automatically, and there's nothing new to learn. Now, the reason I show you guys this architecture lesson is because my dad, who's 85 at this point, used to watch these things back when Steve was alive. And he understood this. He called me up after this event and said, I want that thing. I said, what thing? He said, the cloud. I said, why? He said, because I'm always putting pictures on my computer and I want them on a different device, and it looks like this will fix that. This is the important lesson of architecture. It should be something that communicates some aspect of what you're doing. In fact, organizations typically manage a number of different architectures, although I will say that it's really only about one in 10 organizations that do a good job of managing any one of these types of architectures: a business architecture, a security architecture.
Of course, we're talking data architectures at this point. Really key to this is to be careful that management doesn't think you're just sitting around having a nice committee meeting, and that you're actually producing useful things. Again, like my dad showed, architecture is about understanding things, understanding the functions of those things, and understanding how those things interact. Again, that's just a little cute thing in there in terms of how to do that. But let's take a look. We'll take a look at data maps, which are really the models; the models are maps of what the data is. We'll talk about why we need them and how they need to be used, and there are some social, political, and economic challenges around the use of data maps. We'll talk about engineering and architecture, which are really two sides of the same coin, but they must operate on standard, shared data of known quality. If they don't, they are not of any use whatsoever. We'll look at the view from the top, which really means forward engineering, building new things; the goal there is composition, building these things up. And then we'll look at the view from the bottom, which means, as often as not in today's environment, you are reverse engineering, where the goal is understanding. These are two different goals, and we apply them generally pretty much equally in today's environment. Working together, then, data architecture and data modeling allow us to get the functions that are required for effective data management. And a real key need for this is an understanding of simplicity. So I don't always run everything by my dad, but I run a lot of things by my dad to make sure that it does make sense to folks that are not as technical as we are. We'll finish, of course, the way we usually do, with takeaways and Q&A on the whole thing. So let's get started.
Starting out on the components of this: one of the things you've probably heard people talk about is a business glossary, and one of the questions I like to ask groups is, what does the number 42 mean? Now if you're in on the inside joke, the number 42 is the meaning of life, the universe, and everything, according to the Hitchhiker's Guide to the Galaxy, which is a really fun little fiction book that postulates that the world is an experiment, with the white mice and dolphins running the experiment. The white mice and dolphins decided they'd also like to find out the meaning of life, so they created a gigantic supercomputer. It runs for 2,300 centuries and comes back and says the answer to life, the universe, and everything is 42. Of course, the rest of you are going, what is 42? It was also Jackie Robinson's baseball jersey. Well, that's a different fact and a different meaning. 42 is the meaning of life. If you really want to get technical, 42 is my age 17 years ago. Well, why would you need these things? Well, that's data. Where you use data, though, is in response to a request. For example, in response to the request, is Peter old enough to consume adult beverages? The answer is, generally in most states, Peter is old enough to consume adult beverages. Now we've taken data and put it into another category, which is information. And that is: information is data provided in response to a request. Hello, can I see your driver's license? Why thank you, young server, for flattering an old person like me by asking for my ID as I buy my adult beverages. Of course, the real question from a business perspective is how do we end up utilizing this information? And this is where people get into knowledge, intelligence, and wisdom. And I've been using this definition since 1983, based on a definition that Dan Appleton did for us at the Defense Department. The key, of course, is that if we don't understand how the information is used, we really have no ability to leverage our information.
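To make Peter's 42 example concrete in code, here is a minimal sketch (an illustration added for this write-up, not from the webinar; the age threshold is an assumed U.S.-style rule). By itself, 42 is just data; it only becomes information when it answers a request.

```python
# By itself, the value is just data -- 42 could be the meaning of life,
# a jersey number, or an age. Context turns it into information.
datum = 42

# Hypothetical business rule (an assumption for illustration):
LEGAL_PURCHASE_AGE = 21

def old_enough_for_adult_beverages(age):
    """Answer the request 'is this person old enough?' --
    the same datum, interpreted as an age, now carries meaning."""
    return age >= LEGAL_PURCHASE_AGE

# Information: data provided in response to a request.
print(old_enough_for_adult_beverages(datum))  # True
```

The function is the "request"; without it, the datum on its own tells you nothing.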
So what you're seeing in front of you is yet another architectural representation. It shows that data is a necessary but insufficient prerequisite to understanding information, which is also a necessary but insufficient prerequisite to understanding how to use information intelligently at the strategic level. Again, these are kind of old definitions, but I'm just sort of laying the groundwork on this. So let's talk. First of all, data is a subject that is very complex and detailed. Data as an asset has to work perfectly at the most granular level. That's a real challenge. It can't be sort of right or close, right? It has got to be perfect at the most granular level. And as such, it is a complex and detailed subject that is taught inconsistently and oftentimes very poorly. Now that's an unfortunate occurrence, but it is true that as we've looked over how it is taught, many organizations are not able to really do this. And what this means is that in your organization, your work groups have likely figured out all this stuff on their own. Just consider the amount of time and effort that they've spent learning how to do this, whereas we could teach it to them in a nice, uniform fashion, and we would probably recommend to them, let me guess, a series of DATAVERSITY webinars around this. These maps are a necessary but insufficient prerequisite to understanding the organizational data architecture and fully leveraging the data assets that are out there. Maps are generally incomplete, however, without a purpose statement. And purpose statements are much more powerful than definitions; we'll talk about them in just a little bit. But what we want to do is add purpose statements to the models and then validate the resulting models that are there. Because these maps are required in order to share information about the data. Information about the data is generally what we refer to as metadata.
We're not gonna dive too far down that rabbit hole; data about data is sufficient to say about that. And if we're going to exchange information or data around these things, we have to have these maps. The architecture, then, is comprised of these data models, and data modeling is an engineering activity that is required to produce (there's a typo on that slide: produce, not product) the data maps that are necessary but insufficient prerequisites to leveraging the data assets out there. So how are these components expressed as architectures? Well, in an architectural diagram, we take some details and we organize them into a larger component. For example, I may take a combination of metal and wood and glass and make it into a door. And from then on, we refer to it as a door, even though clearly it has sub-components that are in there. This is pretty intricate. The larger components are then organized into models. And these models talk specifically about dependencies that occur within the components. I can't build the door until I know the size of the doorway; if I make the doorway standard, I don't need to worry about it. Again, these are important considerations. And finally, the models are organized into architectures. That is, the architectures have a number of different architectural components. And this brings to mind purposefulness. We are trying at this point to achieve some sort of goal, and the goal at an architectural level is generally aimed at understanding. In the data realm, it works exactly the same way. The attributes are organized into entities or objects. The attributes are characteristics of things. The entities and objects are the things whose information is managed in support of the strategy. And there's lots of examples of these; I'll give you a couple of them as we go through. The entities and objects then are organized into models.
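That layering, attributes into entities, entities into models, models into an architecture, can be sketched in code. This is an illustration added here, not from the slides; the entity names and the shape of the model and architecture objects are hypothetical.

```python
from dataclasses import dataclass

# Attributes are characteristics of things; an entity groups the
# attributes of one thing whose information we manage.
@dataclass
class Customer:              # an entity
    customer_id: int         # attribute
    name: str                # attribute

@dataclass
class SalesOrder:            # another entity
    order_id: int
    customer_id: int         # a dependency on Customer: an order
                             # cannot be understood without its customer

# A model organizes entities and records the dependencies among them.
sales_model = {
    "entities": [Customer, SalesOrder],
    "dependencies": [("SalesOrder.customer_id", "Customer.customer_id")],
}

# An architecture organizes models (subject areas) toward a purpose.
architecture = {
    "subject_areas": {"sales": sales_model},
    "purpose": "understand organization-wide data use",
}

print(sorted(architecture["subject_areas"]))  # ['sales']
```

Each level adds purpose: the entity groups attributes, the model captures dependencies, and the architecture says why the whole thing exists.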
Combinations of entities and attributes are structured to represent the information requirements, and poorly structured information or data constrains the organizational delivery capabilities. If I don't have good models, and people hear this all the time, I can't get the data out of the system even though I know it's in there. There are, again, lots of examples of this. But now we move to the architectural level, and the models are organized into architectures. Now, when we're building new systems, the architectures plan the development. But more often, the data managers don't understand the existing architectures because they already exist in the form of software packages or previously developed systems. We refer to everything that's in production as a legacy system. And it's really hard to come up with examples here. I'll show you a couple of pictures, but it's just a very, very large piece of information, very difficult for any one person to sit down and get a complete handle on. Let's talk a little bit about how data structures are organized to support the data strategy. First of all, consider the opposite question. If your systems were not explicitly designed to be integrated or otherwise work together, what are the chances they're going to happen to do so, especially keeping in mind that, as I said five minutes ago, your data must work at the most granular level perfectly? It is a big problem, and it's hit and miss in organizations. Even sometimes within the same software package, you have mistakes that are made around this. I've got hundreds and hundreds of documented examples of these. If they weren't designed to work together, what's the likelihood that they will just happen to work together? And the answer is: none, just about nothing, right? Your organization is likely spending 20 to 40% of its IT budget compensating for poor data structure integration. Get a handle on what your IT budget is.
And I think you'll end up seeing that that's quite a large amount of money very, very quickly. See, the data cannot be helpful if its structure is unknown; the data itself cannot be helpful if we don't understand the structure. This structure is important for us to understand. And there really are two different approaches to this. If we take a look, for example, one of them might be achieving efficiency and effectiveness goals. So we're trying to just make things faster, better, or cheaper. But the other example might be that we are getting the organization ready to be more dexterous, to prepare it for change. I don't want to be dour here as we're approaching the holiday season of 2018, but I'm pretty sure, without having to be a rocket scientist, that we're gonna be in a recession in the next five years. I'm not an economist, but I know that we've gone 11 years without a recession, and that's the longest time we've ever gone without a recession. So it's unlikely that this economic expansion that we're in at the moment will continue; it's just a question of when rather than if it comes up. And I want you to think about two restaurants here that are trying to do two different things. The first restaurant wants the experience to be so special that they have a different dish for every different item on the menu. The peas have a pea dish. The corn has a corn dish. The beef has a beef dish. The apple pie has an apple pie dish, and the cherry cobbler has a cherry cobbler dish. Which means if I'm working in that restaurant and I drop a dessert dish on my way to the customer, I have to go back and get the exact type of dish that matches the exact meal in order to maintain the customer experience there. That could be a problem.
However, that organization is clearly optimized on attempting to provide the best experience for customers, and it's likely that their customers will be patient enough to wait for you to go back and put the cherry pie in the cherry pie dish. However, if the goal of your organization is speed, then perhaps you would suggest the bottom example there, which is to say we don't care what dish we have. All the dishes are going to be the same. So when I drop a dish, I can grab the next one off the pile and simply move forward. Both answers are correct. The right answer depends on what we are trying to optimize for when we make those decisions. If the goal is to have specialization, then we're not going to optimize on speed. If the goal is speed, then we're not going to optimize on specialization. Now here's an example of a data architecture. I even forget where I got this particular example. I'm sure it's from one of my customers, and they probably would be real happy if you could actually read this stuff, but of course you can't. Now let's take a look and talk about why we need a data architecture, particularly in today's world that is largely dominated by software packages. We call them COTS packages, commercial off-the-shelf software packages. And if we're going to have the ability to interoperate among these software packages, the data architecture is the only thing that tells us how these packages work. It's also required in order to maintain self-correction or regeneration capabilities if we're going to do our own kind of development, and it is required to permit governance of data as an asset. By the way, a good definition for data governance is managing data with guidance. So that seems like a fairly reasonable way to do it. And if you don't have a map to your data, it's very difficult to even know what needs to be governed, much less govern it. Your data architecture is a prerequisite to meaningful data exchanges. Yes, you can move data around.
No problem; the question is, do you understand what is actually being moved, and are we doing the right things? It lowers the cost of organization-wide and extra-organizational data sharing, because if we have these maps, we can hand the maps to our business partners. By the way, if you hand this data map to your business competitors, they will also have a pretty good idea of how your organization works. So I contend that these data architectures are also a form of intellectual property that the organization should maintain careful and controlled distribution of. These data architectures permit rapid evolution of your data environment, depending on the changing needs, the different partners, time criticality, et cetera, et cetera. It is absolutely required for role-based security, and it decreases the cost of maintaining the various data inventories that occur in this environment. Let's take it a little bit more simply. The data architectures capture the meaning of the data that is running through the organization. And when I say the meaning, it's the business meaning, not necessarily the technical meaning, although those can be inferred as well. Your architecture is always going to be a living document, in that every time your organization changes, the architecture should be updated to reflect the content of this new information. It's a potential entry point for architectural engagements: when somebody's describing something, the data map is the first piece that they will typically reach for. These validated architectural components can be used to quickly and easily populate a business glossary. In this instance, I do recall enough about this example that the color coding had to do with various business subject areas. So the green might have been accounting, and the orange might have been logistics, and the purple is HR. Again, I'm making this up, but you get the idea. But it does represent a major collection of metadata within the overall organization.
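As a tiny sketch of that glossary-population idea: the terms, definitions, and subject areas below are invented for illustration, echoing Peter's guess at the color coding. Note that each entry carries a purpose statement as well as a definition, since Peter argues purpose statements are the more powerful of the two.

```python
# Validated architectural components -> business glossary entries.
# All terms and text here are hypothetical examples.
glossary = {
    "Invoice": {
        "subject_area": "accounting",   # "green" in the diagram
        "definition": "A request for payment issued to a customer.",
        "purpose": "Track amounts owed so revenue can be recognized.",
    },
    "Shipment": {
        "subject_area": "logistics",    # "orange"
        "definition": "Goods dispatched against a sales order.",
        "purpose": "Know what left the warehouse, when, and for whom.",
    },
}

# A glossary like this is queryable metadata, not just documentation.
for term, entry in sorted(glossary.items()):
    print(f"{term} [{entry['subject_area']}]: {entry['purpose']}")
```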
Each of these pieces of metadata then has a role to play in the overall organization, and understanding what those roles are is very important. The levels of abstraction, the levels of completeness, and the utility that is provided are kind of encompassed in this diagram. The models tend to face downward; they are detailed. The architecture is at a higher level of abstraction, and that really represents an integration opportunity for us. In the past, architecture has attempted to gain a complete, a perfect understanding of this. The old joke was, and it wasn't really much of a joke: yes, we can develop a data architecture for your entire enterprise, and it'll take us five years. And at the end, we'll have a really good idea of how your organization looked five years ago. Again, chuckle, chuckle, laugh, laugh, but it's kind of true. Instead, what we've moved to is a more timely process where we can re-architect components of your architecture as they are needed. Very rarely does an organization change its entire architecture; it changes it a piece at a time. Just like when you're remodeling your house, you don't change the entire house at once. You probably are gonna remodel the kitchen or the living room or whatever it is that you're going to do. So the focus has to change from being on an entire architecture to instead focus down onto a smaller, more useful component. And this component can be governed by frameworks; the most popular and the most useful that we've found over the years is, of course, John Zachman's enterprise architecture ontology. We could do an entire webinar on that, for just an entire day, and you still wouldn't get the full picture of it; but go out there on the web and Google Zachman Framework Version 3 and you'll see what I'm talking about.
It gives you a very nice picture, a way in which the blueprints about your organization's enterprise architecture can be organized, and it provides more immediate utility than we've been able to get in the past. Now, the data model is typically domain-specific in its focus. If I have a piece of software, Program A, then the software engineering effort is going to be focused on that program. And if I have a data modeling effort, that effort might only be originally focused on Program B, which means it might cover only a portion of all the data. But frankly, what's more likely to happen is that you have a database that is accessed by a family of programs, A, B, and C in this case, all accessing the data in the brown database. By the way, we need to change our icons around this. Those of you that remember know we used to use that brown icon of a series of whirling platters, because of course that's the way data used to be stored: on these things that looked like record players and spun really fast. Nowadays, of course, everything's flash drives. We don't really have good icons for those just yet. But the point is, ERP, COTS, all of these packages are generally marketed in the same fashion: one database will connect all of these programs, and they can interoperate. And a better use of data modeling is to cover this family of programs. This is what we would call a subject area. Again, if I go back to that previous slide with the enterprise data model on it, it would show you the pink area; that would be one subject area. Of course, then the question comes up: well, what if we have multiple subject areas in our organization? And there's nothing wrong with having multiple areas. But you do need to have somebody that's looking out over all of these, for a couple of reasons.
One, if we don't have somebody managing this overall proliferation of data uses in the organization, who's gonna tell us that 80% of the data in the brown database is also used in the green one? And again, if we don't know this, we're going to replicate the data. Same thing here: that the orange database is different from the green database in the following characteristics, whatever those happen to be. Now again, throwing all this together: the data architecture focus really has a much greater potential value than a data model. So an underutilized data model might cover a single program; a better utilized data model might be the one that supports all of the programs associated with application domain one; but our data architecture is concerned with a broader focus than either software or database architecture: with system-wide use of the data, organization-wide use of the data to say it another way, and with the problems of data interchange or data interface. And these architectural goals are much more strategic than operational in nature. I mentioned there were some social, political, and economic challenges associated with this. The first social challenge is that we don't teach knowledge workers anything about data. And my definition of a knowledge worker is somebody who deals with data. So I contend that 100% of knowledge workers need to have some basic data education that is currently lacking entirely from our educational systems. Another political aspect of what we're talking about here is that we don't teach IT professionals very much about data either. Typically in accredited university curriculums, and Virginia Commonwealth University has a dual-accredited program, both AACSB and ABET, our program gives them one course on how to build a new database. There are a couple of electives that you can take if you're interested, but the required course is that you learn how to build a new database.
Now, no offense to anybody in the building or in the profession, but if there's a skill we do not need any more of on planet Earth, it is how to build more new databases. Two reasons for this: one is that it's the only thing we teach them about data, so therefore they think that's the only thing they have to learn about data. Second, our leaders go through these same programs. And what they get is an impression that data is a technical skill that you only need when you're developing new databases. That is a huge issue. It means that, and we've got good tracking that shows this, data people have been pushed down and down into the bowels of IT from what originally started out as a leadership position back when the CIO function was created in the late 80s. Data is just not seen as useful. They say, I'm building a software package, I do not need a database; or, I already have a database and I don't need any data people involved; or, I'm installing SAP. Again, a fine piece of software, but people don't tend to think of it as what it really is, which is a really wonderful data integration environment. Take another look at a political aspect, if you will, of data, which is that until recently we didn't really even have a good definition of data management. There were a couple of really nice books out there. This is our DMBOK version one, the Data Management Body of Knowledge. We called it the DMBOK because the Project Management Institute called theirs the PMBOK, so we thought we'd catch on to the alliteration there. And this model is good enough to criticize, which I really like. Again, going back to George Box's famous quote: all models are wrong, but some models are useful. And this model is missing two important concepts. We constantly get questions at Data Blueprint that say, hey, the DMBOK says I must build a data warehouse. Well, no; what it says is that data warehousing is a part of data management, but it doesn't say that you must build it.
And second of all, it doesn't have any concept of dependencies, which is to say that we should probably implement data governance in this case before we do data warehousing, just so that we don't fill the data warehouse up with what most people call lift-and-shift data, which is, again, not a good way of doing this. We revised the DMBOK a little bit; we've got a version two that we've put out there now. You can read more about this, but notice that what we're talking about here is data modeling and design. That's a piece of what happens here. And unfortunately, because we've taken every IT student, every computer science student, every computer engineering student and told them that the only thing they need to know about data is how to build new databases, that gives us the wonderful Maslow quote: if the only tool you know is a hammer, you tend to see every problem as a nail. So the answer to all of your problems is build a new database. Now again, I don't have a problem with this; it keeps all of us in work. Guaranteed employment for life is what I tell people, and I think it's a true statement: if you continue to build new databases, yes, somebody's gonna have to get in there and untangle all of this mess. The reason this comes about is something I call the bad data decision spiral. Business decision makers and technical decision makers are not data knowledgeable. This leads to bad data decisions and poor treatment of data assets, resulting in poor quality data and poor organizational outcomes. And we need to get out of this, because most people don't understand that data failures don't occur the way failures do in the real world. Here's a wonderful example: the Tacoma Narrows Bridge. It's just a little north of where Shannon is in Portland. And it was a wonderful bridge. It opened on July 1st of 1940 and collapsed in a windstorm on November 7th of 1940. Believe me, those were not the original design specs.
This was the most dramatic failure in bridge engineering history. And the fix for this was to come back and say, hey, how do we do this? Now, when you're watching this little clip here, I want you to think of two things. First, if I take a soda can and I start pulling the tab at the top and rocking it back and forth, kind of like the bridge is doing here, you know what's gonna happen with that tab: sooner or later, it's going to pop off of there. And that's, of course, exactly what happened to this bridge. It was designed to work in wind that they understood was going to be very fierce. However, as you can see here, it was not designed to withstand the kind of wind that happened here. And of course, it fails dramatically. Now, the other thing I want you to think about here is: who would sit here and take pictures of this? And the answer is Safeco, the insurance company who was insuring the bridge. They wanted to see what the heck went wrong, and they went back and studied it and found out that there had been a very problematic build on this. Again, 20 to 40% of IT costs come into this, but that's because we have data structures that people really don't understand, and you end up dying a death by 1,000 cuts, because we've only educated a portion of the population, not the knowledge worker portion, but just the IT data people, about what a data structure is. Here you can see the wonderful computer science definition and an example of how customers relate to sales orders, products, sales lines, and all of these things that you can see. It contains the grammar, the rules for describing the constraints: whether it's a sequential or unique ordering, whether it's going to be arranged in a hierarchical or relational or a network fashion. These are all different characteristics of data structures. But data structures that are bad cost organizations between 20 and 40% of their IT budget on all of this.
Doing a poor job with data means that whatever you're doing is going to take longer, it's going to cost more, it is going to deliver less, and finally, it's going to present greater risk to the organization. I stole that, by the way, from Tom DeMarco, who did a wonderful job of describing this to all of us. So let's move on to the next section here. Along the way, we're going to talk more about engineering and architecture. And the first thing to understand is that all organizations have architectures. One of the things we get a lot at Data Blueprint is people saying, can you come build a data architecture for me? Our answer is yes, but you already have one. Wouldn't you rather understand the one that you have instead of building a brand new one? Of course, if you don't understand what you have, your architecture cannot be useful. And our architecture and engineering capabilities are constrained in the same fashion. Architecture is used to create and build systems that are too complex to be treated by engineering analysis alone. The architecture components include technical details only as the exception. The data engineers develop the technical designs for implementation, and they are going to be working with the very specific technical details on this. I looked for a number of years trying to find a way of illustrating this and finally found a BMW commercial where BMW says, look, you can't build from the top. Of course, we all agree with this. There's simply no way we're going to be able to go in and do this. I don't care if I am the Pharaoh's architect: if the Pharaoh says to me at the end of this project, oh, by the way, Peter, I forgot to tell you, we have a requirement for a swimming pool in the basement, and I happen to know that the pyramids are made of large rocks on top of shifting sand, there's no way I'm going to be able to add a swimming pool to the basement after it's built.
However, if the Pharaoh had just made up his mind prior to this, I might have been able to satisfy that particular request. Similarly, from an engineering perspective, I love this particular piece. Shannon and I see this when we're out in San Diego for the big conferences that we have out there on a regular basis. I'll give you a couple of attributes about it. This thing is taller than I am. It has a clutch, it was built in 1942, and it's still in regular use today. And you say, goodness, what is it? Well, the answer is it's a mixer that sat on a ship, in this case the USS Midway, that we put 4,000 soldiers on and sent off to help win World War II. In 1942, we were losing World War II very badly. And so we had 4,000 brave soldiers who got on this ship and went out, and they needed something every morning: breakfast. And I don't care how many of these KitchenAid devices you have, they're not going to be enough to actually come up with what you need. So again, engineering and architectural components. Now let's instead turn to a definition of bed. If you go to Wikipedia, it defines a bed as a piece of furniture which is used as a place to sleep or relax. I'm going to use this to talk about entities in this case and, more importantly, entities and relationships. So I might ask a question: what's the relationship of a bed to a room? And somebody may say, hmm, well, I would guess that beds are part of rooms, right? So let's take a look at how it could work out. A bed and a room could have a relationship between the two of them; seems great. Probably most of us, being native English speakers or at least familiar with the English language, would not think rooms are part of beds, right? I mean, it's just the definitions in there, but okay. We could also then go in and say beds are related to rooms, but in this case, we're going to put what we call a cardinality on them. More precision: many beds can be related to many rooms.
Now the crow's feet that you're seeing there in orange are just the "many" notation. Don't worry, we're not going to dive into a lot of these details, but I do want to show you a couple of them. Here's another, final example on all of this. I may say that many beds are associated with one room and each room can contain many beds. So I'm reading the relationship both ways around. Of course, the question is: what if beds can be moved? That could be a problem, in which case the middle layer is probably the better layer to relate these. Now, among these relationships that I'm talking about here, there are some very specific ones; we'll just run through them real quickly. This one says exactly one: you must have one. Every room must have at least one bed. Now, that's not necessarily a real rule, but you could have that rule in your organization. It might be particularly helpful if you were in a hotel, so you would have them like that. Or we might say that a bed can be contained in one or many rooms, all right? Or a room should eventually have a bed in it, but not right now; now we're introducing a timing dimension into it. Or it may have zero, or it may have many. Finally, we may say zero, one, or many, in terms of bringing them all together. Those represent a range of possible relationships between the entities, and each of these can go on either end of the equation. Now, these families of modeling notation variants have also been hugely problematic, because there are at least four different styles of doing this, not counting object orientation. Again, Peter Chen had the original one on the upper left. Charlie Bachman came out with one. And James Martin and Clive Finkelstein came out with Information Engineering.
The goal here is not to have a religious war between the adherents of the various modeling notations, but to pick one and use it as a standard in your organization, and make sure that it applies both to the modeling efforts that you do and to the architecture components, so that everybody only has to learn one of these. It is a shame that after 40 years of doing this we have not yet standardized on these, but many are moving towards the information engineering notation. So let's take a look again: what are these relationships? A relationship is a natural association between these entities. We can talk about ordinality and cardinality, using them to define mandatory or optional relationships and also minimum and maximum occurrences. A bed is placed in one and only one room, and a room contains zero or more beds. That's what that particular relationship says. On the other side of it, a bed is occupied by zero or more patients, and a patient occupies at least one or more beds. Now, at that point, you might start to get confused. Would a hospital let a patient just go randomly around the hospital and pick out beds? No, but different types of beds might be for different types of care facilities. So an intensive care unit bed might be different than a resting and recovery bed. And again, these are things that are going to make a difference in your organization. The accumulation of all of these things upward is what the architecture is. The data modeling has to occur at this level, and the data modeling has got to be correct or the architecture will not be. Now, we've got the entities there. Let's talk about the attributes. An organization might decide to characterize a bed by ID, description, status, sex to be assigned, and reservation reason. The decisions about how to manage each specific attribute have very direct consequences.
For example, if we use the above attributes, ID, description, status, sex to be assigned, and reservation reason, we may determine that female beds are all sold out, because all of the beds with the female sex-to-be-assigned value have been reserved. Now, again, reservation, right? We can see that as well. They may or may not be reserved. So do I have beds available or do I have beds reserved? It depends on what's going on in your organization. All beds may have a status; the status may be occupied or unoccupied. Many beds can be assigned to females in this case. The identifying characteristic may be required to be unique, so that I can uniquely identify a specific bed out of the entire population of all the beds that we have. And the description is unlikely to be the same for each bed if we want the description to usefully describe each one. So again, here's our entity: bed, with bed ID, bed description, bed status, sex to be assigned, and reservation reason. Now, most of the time our students are taught that they need to provide a definition. And here's a very big problem: most of our students are not getting any exposure to any CASE tools at all. So I'm showing you an example of a CASE tool that is being used on this. Sometimes they're taught that these tools theoretically exist, but for the most part, students are getting zero experience with CASE tools as they attempt to wind their way through school, which means when they come to you all as people who want to employ them, you have to retrain them, and they have to unlearn some of the bad thinking that they've learned. So here's our bed and our definition: something you sleep in. Very unhelpful. And the reason for that is because that is a definition. What we really want to do is instead concentrate on a purpose statement. The entity is bed. It is a principal data entity.
That means it's one of the major ones that we have, and the purpose statement describes why the organization is maintaining data about this important business concept. In this case, bed is a substructure within the room, a substructure of the facility location. It contains information about beds within rooms. That's the purpose of it. Now, I'm going to take it just a touch further on this. We've got a source, so we can link it back to some documentation. Again, here are the descriptions that we had before, the attributes that we had, and some associations. So the associations with other items: again, it's a very poor representation, but it says one room contains zero or many beds. Now, I'm going to add one little piece to this. This was a bit I did at the Department of Defense when we were working on building the Veterans Administration system, the one that is currently in use today. And one of the things that they were trying to deal with at this point was that, believe it or not, hospitals lose patients. I know that doesn't sound very encouraging, but it does happen on a regular basis. So they were going to put an RFID tag on each bed, and when a bed was in a room, they would be able to tell where that bed was in case they lost it. On the other hand, just following that logic a little bit, that means hallways also need to be rooms, as do elevators. Yes, they lose patients: patients leave in their beds in the elevator and ride up and down the elevator. They even have names for it. You'll have to ask your local hospital what they actually call that, but it's a pretty awful one. So clearly, putting an RFID tag on a bed was not going to be able to show anything other than that a bed was within a room that had an RFID reader in it. And if the bed is in a hallway, which end of the hallway is it at? Some of these hallways for these veterans hospitals are literally miles long.
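The bed example above, with its attributes and its one-room-to-many-beds relationship, might be sketched like this. It's a minimal illustration; the names and the simple constraint checks are assumptions, not the VA system's actual design:

```python
from dataclasses import dataclass, field
from typing import Optional

# Entity: bed, with the attributes discussed above.
@dataclass
class Bed:
    bed_id: str                      # must uniquely identify one bed
    description: str                 # unlikely to repeat across beds
    status: str                      # e.g. "occupied" or "unoccupied"
    sex_to_be_assigned: Optional[str] = None
    reservation_reason: Optional[str] = None

# Cardinality: a room contains zero or more beds; each bed is placed in
# exactly one room. (A real system would enforce bed-ID uniqueness across
# the whole population of beds, not just within one room.)
@dataclass
class Room:
    room_id: str
    beds: dict = field(default_factory=dict)

    def place(self, bed: Bed):
        if bed.bed_id in self.beds:
            raise ValueError("bed ID must be unique")
        self.beds[bed.bed_id] = bed
```

Note how each modeling decision, which attributes exist, which one is the key, which side of the relationship holds the collection, is written down explicitly; that is exactly what the purpose statement captures in prose.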
One other thing about the purpose statements is that they tell whether or not the entity has been validated. Now, it's important to have it validated, because if it's an unvalidated model, it remains a draft. And if it's a draft, it's not something that you're going to have as much faith in as the rest of the information that you have in your database, data collection, or whatever it is we're going to talk about. One more point on this before we dive into the view from the top and the bottom on all this. We'd like our models, of course, to be validated. We tend to talk about three different types of data models here. This is the ANSI/SPARC three-layer schema definition. There's a Wikipedia entry on this; I think I put it on the slides for you guys to take a look at if you want to look in more detail. We really do have a conceptual perspective on this. We think of this as something that has enough detail to be meaningful, but not so much that managers won't look at it. You could show it to managers, which means that independent people can look at the different perspectives of the data and still share access. Remember, a couple of slides back I had the application domain with programs A, B, and C. Each of those could be a different view into that same data set. That's important, because if we don't have that ability, it means we have to create a separate physical layer, a separate physical database, for each of those views. And that would make it even worse than the mess that we have right now. The mess that we have now really means that 80% of the data that you have in your organizations is redundant, obsolete, or trivial, because people are not practicing the things that we talk about here. And that leads to all sorts of confusion. The second layer, then, of these three is the logical layer.
It hides the physical storage details but still presents enough information about the data that people can look at it and see, from a business perspective, what's actually going on. So they can see the things that they want. So: a conceptual view, a logical view, and finally a physical view. The physical view then allows the organization to implement the logical perspective in something like Oracle or DB2 or Hadoop or whatever it is that we're going to use. Again, on a change to a new database technology, they should be able to change the database without affecting the users, if they have engineered this process correctly. Good data engineering, good data modeling will allow your architecture to be much more responsive to the overall process of pulling this together. So again, let's get into this next piece, which is: how do we use these things? And the idea here is that, at the moment, we first teach what we call forward engineering the system. We're building new stuff. And quite frankly, this is again an absolutely appalling situation in our colleges and universities, because you don't build new stuff as a fresh young graduate. You instead get put on existing stuff; you work on the existing packages that come into play here. This building-new-stuff focus simply doesn't work, but we still teach people this, and it is generally the only thing that we teach them. Oh, I see I've got a quick little typo on here. Hang on, let me get that off of the screen. There we go, sorry about that. Again, building new stuff. So what we do is we do some requirements, then we do some design, and then we do some implementation. Now, the requirements part of it typically produces a binder, some PowerPoints, and some requirements. And then the design aspect of it produces this conceptual model. These again correspond to the three-level ANSI schema pieces I was showing you a little bit before. The in-between step is the difference between the what and the how.
And the how is how you are going to do the implementation. So the model here to the right of the green arrow is the detailed plan for how you're going to implement the things that are described in the three-ring binder on the left-hand side. And as a result, we end up with the actual database. Now, it's important to tell students that this is what we do, but we know that we spend 80% of our dollars rebuilding existing stuff, working on our existing systems. And consequently this forward-only view is a very, very bad perspective. So we need to also introduce the topic of reverse engineering. And a couple of organizations are now starting to do this. As they are building these requirements, now they're starting to see: oh yeah, I need to know what's existing. Again, it's analogous to what I said a couple of slides back: you'd like me to build your information architecture? The answer is no, you probably don't want me to build an information architecture. You want me to surface your information architecture and make it usable so that you can get exposure to it. This is the process we call reverse engineering. And if we're spending 80% of our dollars here, we ought to be spending at least a comparable amount of time teaching students that this is the process, because unfortunately, each of your organizations is having to undo what we've done incorrectly at the college and university level. I'm going to introduce one additional concept here. This is our existing system, so we're looking at our legacy. By the way, I call everything that's in production a legacy system; there's no point in mincing words about any of this. You're going to be looking at your system, and what you're going to be doing is saying: I need to understand the plan from which it was built, the logical model from which the physical as-is was built.
And the process the purple arrow represents is reverse engineering. A definition: a structured technique aimed at recovering rigorous knowledge of the existing system. Pretty good. We may also need to follow the yellow arrow and go all the way out to the requirements. If I'm not going to change the requirements, then I only need to do the purple arrow's link, going from the physical as-is to the logical as-is. If I am going to change the requirements, I additionally need to go from the logical as-is to the conceptual as-is. Most programs don't distinguish between these cases, and consequently we have students that simply aren't as useful to you as we'd like them to be. Now let's put them all together. This is what we call re-engineering. And you'll notice there are two parts to the definition. First, reverse engineer the existing system to understand its strengths and weaknesses, because every system has some things that are good about it and some things that are bad. If you don't know which is which, how are you going to avoid making the same mistakes? Let me give you a very specific example. We take the database in the upper right-hand corner, the as-is implementation assets, the as-built system. And that database was organized such that the A's through J's are in one place, the K's through P's are in another place, and the Q's through Z's are in another physical structure. That might not be a very good way of organizing the data, because it was done only because, when the data set was originally built, a database could only contain 10 megabytes of data. That's a really dumb constraint to carry forward. You would never organize your data that way with the cost of data storage being pretty close to zero, although not at zero, and certainly not as close to zero as the cloud vendors would have you understand. So again: first, reverse engineer the existing system.
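As a minimal sketch of that first step, the purple arrow from the physical as-is to the logical as-is, you can recover a rough logical model from a database's own catalog. The table here is a stand-in for a legacy system; all the names are illustrative:

```python
import sqlite3

# Stand-in for a legacy physical database (names are illustrative only).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE bed (
        bed_id TEXT PRIMARY KEY,
        bed_status TEXT NOT NULL,
        room_id TEXT REFERENCES room(room_id)
    )
""")

def reverse_engineer(con, table):
    """Recover a rough logical as-is: attributes, keys, required fields."""
    model = {"entity": table, "attributes": [], "keys": [], "required": []}
    # SQLite's table_info pragma reports each column's name, declared type,
    # NOT NULL flag, default value, and primary-key position.
    for cid, name, coltype, notnull, default, pk in con.execute(
            f"PRAGMA table_info({table})"):
        model["attributes"].append((name, coltype))
        if pk:
            model["keys"].append(name)
        if notnull:
            model["required"].append(name)
    return model

model = reverse_engineer(con, "bed")
```

Catalog introspection like this is the semi-automated part; the subject matter experts mentioned later are still needed to recover what the system means, not just what it stores.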
Sometimes that reverse engineering needs to go all the way back out to the requirements; other times it only needs to go to the design stage. The purple arrow takes us from physical as-is to logical as-is; the yellow then takes us from logical as-is to conceptual as-is. If I'm going to go down the next arrow, I'm going to go to the requirements assets in the to-be. I've now bridged the gap between what I have and what I'd like to have, and then I can put out the rest of this. I only need to do the yellow loop if I'm going to change the requirements, but I need to do the purple loop in all cases. And only when I have reverse engineered first, understood the existing system, and used this information to inform the design of the new system should I then move forward with the re-implementation of the existing system. This is absolutely, mind-bogglingly not known out there, and I can tell you it's a huge problem. It's one of the reasons I say do not trust your IT folks to work on this by themselves, because they have not really been educated in these data and architecture concepts. So let's talk about the modeling process. We might want to start out by identifying entities, just sort of labeling them and putting generic labels on them. Then we might want to identify a key for each of those components. Those keys will give us the ability to identify a specific instance. If you're working in a big data technology, this is what's called the identity management subsystem. And then we might want to draw a rough map of those relationships. So again, I haven't given you any labels for these, so there's no context, but here's a first rough draft of a data model. Then we may want to lay out the data attributes and ask: where should the data attributes go? So we'll move the various attributes around to the various entities by mapping them all together. Sounds great.
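The rough-draft steps just described, identify entities, pick keys, map out relationships, then place attributes, can be sketched as a toy model-building script. All names here are illustrative assumptions:

```python
# A first rough draft of a data model, built in the order described:
# identify entities, pick a key for each, rough out relationships,
# then map attributes onto entities.
model = {"entities": {}, "relationships": []}

def identify_entity(name, key):
    # Step 1 and 2: label the entity and give it an identifying key.
    model["entities"][name] = {"key": key, "attributes": [key]}

def relate(parent, child, cardinality="one-to-many"):
    # Step 3: a rough map of the relationships, with cardinality.
    model["relationships"].append((parent, child, cardinality))

def map_attribute(entity, attribute):
    # Step 4: move attributes around to the entity they best belong to.
    model["entities"][entity]["attributes"].append(attribute)

identify_entity("room", "room_id")
identify_entity("bed", "bed_id")
relate("room", "bed")                 # one room contains many beds
map_attribute("bed", "bed_status")    # attributes drift to their best home
```

A first draft like this is exactly the kind of model that should evolve a lot, as the next point makes clear.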
Finally, we may in fact need to change the way they work so that it's a more useful model. Models should evolve a lot at first; the first draft that you do is almost never what you need to have. When you pull all of these models together, now you can start to actually use them. And it doesn't matter whether you're getting the information from a brand new implementation, which again represents only 20% of our spend in IT, or from an existing set of models, which represents 80% of the dollars that we spend in IT. A couple more pieces here just before we finish up: we may discover that there are other attributes, other links, that we need to put in place. And when we're doing modeling, the modeling activities should evolve over time. The first component is what I call evidence collection and analysis. In general, you're going to be going out and acquiring information; you can see that's the white space on this graph. In the preliminary activities you'll do a lot more gathering, but eventually you'll start to move things around, as I did on that last diagram. And as the modeling cycles proceed, we should be doing more analysis, not less. If we're finding lots and lots of collection activities when we think we should be doing modeling cycles, we know that something is off track. There's also an aspect of project coordination, where we need to make sure that we are able to gain access to the subject matter experts. We can reverse engineer to some degree in a semi-automated fashion, but almost always we're going to need some expertise, and we call these people the subject matter experts. Over time, that coordination should decline precipitously. Another component of this is what we call target system analysis. This is what the new system should actually look like.
And again, we want to do increasing amounts of this as the requirements for collecting and analyzing the data decline and as our focus on building the new system comes to fruition. Finally, our modeling focus should change, largely from refinement originally into a validation component at the end. And those are the things that we want to make sure we have a declining balance on. So let's summarize where we've taken our little journey in the last hour. Data models are really data maps: they are a map to understanding how to use the data. These data maps are critical to understanding your most complex, most poorly managed organizational assets out there, which are your data assets. And I say that in general because, as Shannon mentioned at the top, I've looked at literally thousands of data management practices out there. A data map tells how the data is to be utilized in the context of your information architecture. There are a series of challenges around that: we haven't socialized people, we haven't done a good job of explaining to people how to use these, and we haven't told them the economic costs of not doing data modeling and data architecture well over time. All of these pieces are critically important to pull together. Engineering and architecture are two sides of the same coin. The data architecture is comprised of a lot of data models. The data models, however, are the very detailed components that tell specifically, the data blueprints if you will, how organizations use and leverage their data in support of their organizational strategy. And in order to operate efficiently, organizations must operate on standard, shared data of known quality. If they don't, then you will end up spending too much, your project will take too long, it won't deliver the results that you want, and, more importantly, the risk to the organization will continue to rise.
The use of standard data items can only be achieved through data modeling, and of course the modeling is what makes up the architecture. When we look at it from the top perspective, we are teaching students how to build new systems. If there is a skill we do not need any more of on planet Earth, it is how to build new databases. That should be a specialized skill. Instead, what we should do is teach them how to operate on existing databases, how to re-engineer their existing databases. The goal of new-systems work is building new things, but our goal in reverse engineering is understanding the existing systems, and that is much more important, much more prevalent, than any other skill that we need out there in the real world. In most cases you simply cannot understand the existing data models and existing data architectures unless you have the cooperation of very expensive people. A good friend of mine told me this joke, and I've used it for years and years. Walk into an organization and say: who are the 10 people who are too busy, the ones you couldn't possibly spare for this project? Write their names down on a piece of paper. Of course they write the names down, and you turn around and say: those are the only people I can have on my project, because they are the ones who are going to understand what's going on. Only by working together can we have effective data management functions within this structural approach. We need to have simplicity in this in order to come up with all of the rest of these pieces. And now we're back at the top of the hour, and I will turn it back over to Shannon, and let's see what sort of questions you guys have. Peter, thank you so much for this great presentation, such a hot topic.
And just to answer the most commonly asked questions that have been coming in, just a reminder: I will send an email to all registrants by end of day Thursday for this webinar with links to the slides, links to the recording of the session, and anything else requested throughout. So, Peter, diving right in here. Can a description of data architecture be split into physical data architecture, i.e. databases, and logical data architecture, i.e. data models? I think the answer to that is yes. However, I would be cautious about that, because at many of the organizations we work with, the models end up looking something like this. And so if you can imagine maintaining this architecture in physical as-is, conceptual as-is, and logical as-is formats, it gets to be very, very difficult. There are a few modeling technologies out there that will do that for you automatically, and if that is an important concern, then I would definitely check out those vendors. One in particular that I'm familiar with is Visible Systems. I believe that both Erwin and Embarcadero have some aspects of that, but I think Visible was the first to put out a full-stack approach to that type of a process. So: architecture and modeling were a bit easier in a traditional waterfall environment; what's your opinion and approach in an Agile environment? Shannon says, uh-oh, be careful, Peter, because she knows I jump on this one. That's what happens when we've been working together for seven years. So let's be very, very clear. Agile is the best way we have come up with as a society to develop good quality software faster. There is no question about it; there's all kinds of empirical evidence. Data, however, does not fit into an Agile model. You'll see all sorts of people talking about Agile data warehousing and Agile big data and all of these things.
That's a description of it, but Agile in a software engineering context is a proven method of developing a better quality product faster. And if you are in the middle of an Agile sprint, which means you are on a project footing and you discover that your data is incorrect, that your data definitions are incorrect or any part of the data requirements are incorrect, you need to stop working on that Agile sprint because the only possible outcome is more small piles of data. And that is generally not what most organizations are attempting to achieve. We've got to separate and sequence data from the traditional pieces. And somebody mentioned the waterfall model, which is, of course, how all this started. There's a fascinating side story about where the waterfall model came from. It's really not a construct that occurred in IT. It was actually the creation of a writer at Look Magazine who was writing about this topic in 1959, the year I was born. So some of these concepts are pretty dated and not really current. Now I know what the question was trying to get at. Yes, if you are doing Agile data modeling and you have a good data architecture that is well known, your Agile is tremendously aided by that process. But if you're doing Agile software development and you do not have a good data architecture, you do not have good data models that are in there, the only result is more small piles of data. Yes, your software will be finished on time, but it will not produce the results that you want to have. And let me take this a step further, Shannon, because we're starting to get questions around AI and some of the neat new stuff that's coming on board. By the way, Shannon and I take a very skeptical view to all these because we're, let's just say, have seen it all before on this. It's not that these things are not good or new, but it's just that they have a use and we have to figure out how to layer them into our existing functions. 
Rarely, for example, have big data technologies replaced legacy systems on a major scale in organizations. Instead, what's happening is that big data technologies are complementing the existing set of technologies. And organizations that are succeeding with these are doing quite a good job of, in fact, complementing what they already have: I've got these capabilities here, and I want to add these other capabilities around them. So from an Agile perspective, yes, you can describe your results in data as being more agile, as being more flexible, but that is not the Agile method. Do not confuse the two. Agile software development is a very, very specific, focused activity designed to make software of higher quality, faster, and it has an excellent track record of doing this. But if you're trying to do the data at the same time with the same method, you will end up with more small piles of data and guaranteed employment for life for your data people. If that's your goal, go right ahead. If not, then I'd suggest separating and sequencing them. So this next question is also a question we have certainly heard before. What if your organization is saying data modeling is not important? Is it just documentation? I do hear that occasionally. And again, this is one of the things that we want to do here at Dataversity: to help everybody out with these arguments. So let me just take this slide here, which we were talking about a few minutes ago. The idea is that we're trying to understand this overall domain. We could absolutely understand this domain by reverse engineering each of the nine programs that are involved in it. But if I have a data flow diagram that describes this, I don't actually need the programs. Again, let me make just a quick example here. Pretend the screen is blank, which it is right now, and you have a single circle in the middle that's an IPO diagram, an input-process-output diagram.
Say my input to that process is dough and another input is water, and the process is called make pizza, and the thing that comes out the other side is pizza. So the inputs are dough and water and the output is pizza. I've actually got a miracle, because I don't know about you, but my idea of pizza is much more than just dough and water. It's probably going to have to have some mushrooms and some sausage and some sauce and some cheese. I'm getting hungry just talking about this whole thing. You can use these flow diagrams, these architectural diagrams, to tell you a lot more about what happens in the system at the macro level than you can by describing each of these individual programs. It's very, very important to understand the right tool for the right application in this. I hope that answered the question. If not, feel free to follow up with me on that. So Peter, do you think we need to have a corporate data model in order to have data virtualization, for example, a virtual data warehouse, in order to succeed? The short answer is no. Most organizations are not maintaining nearly this much information. However, I think the questioner is asking a slightly different question: of the virtualized data, do you need to know what it is before you put it out there as a virtual site? And the answer is absolutely yes. If you have no idea what is out there, how do you know whether it's correct? This is one of the really interesting aspects of the big data technologies movement that we've all sort of passed through in the last 10 years. By the way, we are in the post-big-data era, just in case you all weren't aware of that. It fell off the Gartner hype cycle, I think, in 2016. Most people haven't caught up with that yet, but it's just become a routine part of what we're doing. The key is, if you're going to do something with data, you have to have an understanding of it. Does that mean you have to have the entire corporate data model? 
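As a quick aside, the make-pizza IPO diagram can be written down as data, and checking it against what a pizza actually needs shows exactly where the "miracle" is. This is purely an illustrative sketch; the ingredient lists and names are invented, not from any real modeling tool.

```python
# A minimal sketch of an input-process-output (IPO) diagram as data.
# The point: the declared inputs alone don't account for everything
# the output requires, which the architecture diagram makes visible.

def missing_inputs(declared_inputs, required_inputs):
    """Return the required inputs that the IPO diagram fails to declare."""
    return sorted(set(required_inputs) - set(declared_inputs))

# The diagram as described: two inputs, one process, one output.
make_pizza = {
    "process": "make pizza",
    "inputs": ["dough", "water"],
    "output": "pizza",
}

# What a real pizza actually needs (illustrative).
required = ["dough", "water", "sauce", "cheese", "mushrooms", "sausage"]

gaps = missing_inputs(make_pizza["inputs"], required)
print(gaps)  # the "miracle" ingredients the diagram doesn't explain
```

Comparing the declared inputs against the real requirements surfaces the gap immediately, which is the kind of macro-level insight the flow diagram gives you without reading any program code.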
I've worked with dozens of organizations that just work on aspects of it. If I could, what I would do with this picture is blur out maybe two-thirds of it and say that we're only going to deal with the purple section or the orange section or the green section, and that if there's a part of this that's going to get virtualized, yeah, you'd better have a pretty good understanding of the goes-intos and the goes-outtas, as my friend Mike Corman says. So you've already touched on a couple of aspects of this question, but in combination, how is this process changing in agile methodology? And, the questioner adds, I feel that most of the time, business product owners who bring requirements are short-sighted. So there are several different types of requirements. Software requirements that describe what the software will do are absolutely critical for developing the software correctly. However, if you don't also have a definition of what the data requirements are in that context, it's going to be very difficult to get the software to behave as it should. Let me just give you a very short example of this. Say I have a piece of software that is doing payroll, and there's the possibility that somebody could exist in the payroll twice. The example that I'll give you here is the US military. I've done a lot of work with them over the years. And at one point when I was a federal government employee working for the Defense Department, the US Defense Department had about 30% of the workforce holding a second job within the Defense Department. We don't tend to pay our warfighters a lot of money. And consequently, many of them will take a second part-time job, a moonlighting job, to raise money for their kids' college funds and things like this. 
And if I build the software incorrectly to process the payroll, assuming one person gets one paycheck, then I will have to do special processing in order to put that second payroll check through for the same individual. If I want to have all of those individuals paid on a routine basis, I need to structurally alter that software or run it multiple times, both of which are problematic because that wasn't the way it was originally designed to work. It's absolutely crucial to understand that data requirements are different from software requirements. And this is something you can say after being in this business for almost 35 years now: I've worked with some companies, on and off, for that entire 35 years, and their data is exactly the same. I won't say exactly, but it's largely the same as it was 35 years ago. Your data requirements are much more stable. Evolving is the word we use, because they evolve gradually as opposed to changing dramatically. Going mobile doesn't mean you're going to change the data that people are accessing. You may add social, location, and mobile information to it, SoLoMo, we call it, but the basic things about your checking account or the order that you want to submit to IKEA are going to be exactly the same as they were years, even decades, ago. The data requirements do not evolve at the same rate as the software requirements. And it's important to separate and sequence these. I know I've said that before; it can't be said too many times, because so many organizations have yet to realize this, and they are still stuck in the trap that actually ends up hurting their development efforts and making their agile programs not nearly as successful as they would be otherwise. So Peter, do modern data catalogs answer, and to some extent replace, some of the modeling we do? 
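The payroll example a moment ago can be sketched in a few lines. This is purely illustrative; the IDs, positions, and amounts are invented. The point is only that whether one person can hold two positions is a data requirement, and getting the key structure wrong forces the special processing Peter describes.

```python
# Sketch of the payroll example: a model keyed by person ID silently
# overwrites a second job for the same individual, while keying by
# (person, position) captures the real data requirement.

def pay_by_person(jobs):
    """Wrong model: one row per person, so the second job clobbers the first."""
    payroll = {}
    for person_id, position, amount in jobs:
        payroll[person_id] = amount  # overwrite!
    return payroll

def pay_by_position(jobs):
    """Better model: one row per (person, position) pairing."""
    payroll = {}
    for person_id, position, amount in jobs:
        payroll[(person_id, position)] = amount
    return payroll

jobs = [
    ("E1", "analyst", 3000),
    ("E1", "night-shift clerk", 800),  # the moonlighting second job
    ("E2", "technician", 2500),
]

print(sum(pay_by_person(jobs).values()))    # 3300 -- E1's first job is lost
print(sum(pay_by_position(jobs).values()))  # 6300 -- both jobs are paid
```

No amount of fixing the software requirements rescues the first model; the key structure itself has to change, which is exactly why the data requirements need to be separated and sequenced ahead of the code.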
If you have a modern data catalog that does not give you the ability to define your data elements inside it, then I wouldn't even consider that to be a modern piece of software. So I think the answer is yes, but at the same time, there are a couple of problems. I mentioned one of them already, which is that we aren't even introducing these concepts to students. So the students don't know that they don't know. They come to your office when you hire them and you'll say, well, do you have any CASE tool experience? And they look at you the way your dog looks at you when you ask it a question: they just kind of tilt their head, and they don't have any clue. We used to teach them CASE tools. We have stopped teaching them CASE tools. They don't even come up as a technology. And so people don't know that they don't know. It's just an absolute degradation. We've got probably three generations of students whose education needs to be remedied with this knowledge. Now, the question was, if I've got a glossary in here, why wouldn't I also at the same time have the capability to understand the model, to be able to crank out and define the models and use the modeling technology that I showed on the next slide? And the idea around all of this is that, yes, the more integrated you can get, the better off you are. We're seeing a lot of very interesting movement around the business glossary as a data governance tool. Many people feel that it's an important part of data governance, and certainly having common definitions is. But better still is not just having the definitions; let's actually have the model that's behind them, and let's get people to use it. I'll give one quick example here. We did an audit for one of the hospital systems that was a Cerner system. Cerner is a very fine piece of software. It just happens to be the one that's going to replace the old legacy system that we designed for the VA 30 years ago. 
But this Cerner software had this capability built into it. You could right-click on any field and it would pop up the business glossary and show you the definition of that field. Now, we only did the audit on the very first entry in the entire system, which was called patient admit date. Actually, it was called admit date; that was the first one that was up there. And we found when we did the audit that it was being used 12 different ways around this hospital system, which had tens of thousands of employees. And while that was disconcerting, when we put a dollar value on the cost of the misinformation that was being processed by that system, it amounted to tens of millions of dollars annually, just because people were misusing one data field. So the glossary is a great thing to have, but if you can extend it, as I think the questioner did, to say let's not do just a glossary, let's actually include the model components as well, you get a much bigger bang for your buck on the whole process. Peter, instead of a corporate data model, I suggest the focus be on a canonical data model. Only the critical business aspects are modeled. Is that an acceptable statement? Absolutely. I'm going to guess that's one of my colleagues, maybe Dave out there. And yes, we are in agreement. Not all of it is required. 80% of your data is redundant, obsolete, or trivial. So taking the model to the nth degree is absolutely unhelpful. In fact, if you want to get really fancy with it, go to Dave McComb's pieces, and I know you guys run some Dave McComb seminars on this as well. Dave has some really excellent numbers where he shows the order-of-magnitude reduction in complexity you get by doing this, not just by reducing it to a canonical model, but by actually reducing it down to a conceptual model and a semantic model. This model would become much, much simpler given Dave's approach. 
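The arithmetic behind that claim can be sketched quickly. The numbers below are invented for illustration (they are not Dave McComb's actual figures); the 80% ROT (redundant, obsolete, trivial) fraction is the one Peter cites.

```python
# A rough illustration of how removing redundant, obsolete, or trivial
# (ROT) data elements shrinks the modeling surface, before any further
# conceptual or semantic abstraction reduces it again.

def complexity_reduction(total_elements, rot_fraction):
    """Return (elements kept, reduction factor) after removing ROT elements."""
    kept = round(total_elements * (1 - rot_fraction))
    return kept, total_elements / kept

# Hypothetical enterprise: 50,000 data elements, 80% of them ROT.
kept, factor = complexity_reduction(50_000, 0.80)
print(kept)    # elements left in the canonical model
print(factor)  # reduction factor from pruning ROT alone
```

Pruning ROT alone gives a 5x reduction in this invented case; collapsing the remainder into conceptual and semantic models is where the full order-of-magnitude-or-more reductions come from.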
And if you can reduce the complexity of something by one or two orders of magnitude, that is something absolutely worth paying for, because it reduces the friction in your organization as our knowledge workers try their best to navigate through these systems. And if you have any doubts about this at all, just watch an airline employee try to work the airline reservation system. I don't care which airline you're on; they all suck. The employees are falling all over themselves to try and get you some good service, and if we provided them with a decent system, they might actually be able to provide it. So the short answer to the question: we do not need all the detail, we just need to know the important stuff. And that's usually enough. Again, if I could gray parts of this picture out, I would do that to show you an example of that. Yeah, the questioner says they are not Dave but a follower, struggling to bring that concept in where they work. Again, I think the messaging is always the toughest part: how to bring everybody on board with this. And the only thing people will really listen to, Shannon, is money, right? I say that assuming money is your primary objective; if you're working in the nonprofit world, then it's service delivery, right? But you've got to focus on showing how doing more around those pieces will help you achieve the organizational objectives. If you don't, they're going to think of something else. Indeed. So, referencing specifically slide 17: data architectures are comprised of data models. What about data flows, et cetera? Again, a wonderful question, because we have simply gone three generations without teaching people that data flow diagrams even exist. 
So people will come to us at places like Enterprise Data World and go, oh my gosh, I've never heard of a data flow diagram, because it's not in the textbooks and the teachers are not teaching it. And consequently, they don't understand the utility of it. Of course, if you have this much information in your encyclopedia, glossary, CASE tool, whatever it is we're going to call the thing, then creating data flow diagrams out of it is kind of trivial: stuff that goes in, stuff that goes out, and it becomes a natural part of the documentation. But if we don't even teach the students that these models exist, they're never going to ask for them. In fact, it's so bad, Shannon, I'm going to quote Karen Lopez here, another one of our favorites. Karen was recently at a Microsoft conference, and I say recently, this was within the last 60 days, where a Microsoft VP stood up in the middle of the conference and said, I've just discovered this thing called a data model. It looks like it's pretty useful. I think we might want to try to incorporate more data modeling in what we do. Shannon and I were complaining about the user interface on some of these things before; it would certainly help out in that process. Sorry, I don't mean to get worked up about Microsoft, but data modeling has never been their strong suit. There's a question in here, too, about how to learn more about canonical data modeling, and who the person was, Dave, that you mentioned. Dave McComb. You run Dave McComb's stuff, right? Yep. Yep, he's got some great things at our conferences, and I'll put together some reference material. We recently published an article called Data Architecture Versus Data Modeling, and it's mentioned in there as well, but we'll get some more content going on that. Some great content. 
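The "kind of trivial" claim is easy to demonstrate: if a repository already records each process's inputs and outputs, the data-flow edges fall out mechanically. A minimal sketch, with invented process and data names:

```python
# Deriving data-flow edges from process metadata: connect any process
# that produces a piece of data to any process that consumes it.

processes = {
    "take order": {"in": ["customer request"], "out": ["order"]},
    "pick stock": {"in": ["order"],            "out": ["picked items"]},
    "ship":       {"in": ["picked items"],     "out": ["shipment"]},
}

def derive_flows(procs):
    """Return sorted edges (producer, data, consumer) implied by shared data."""
    edges = []
    for producer, meta in procs.items():
        for consumer, meta2 in procs.items():
            for data in meta["out"]:
                if data in meta2["in"]:
                    edges.append((producer, data, consumer))
    return sorted(edges)

for producer, data, consumer in derive_flows(processes):
    print(f"{producer} --[{data}]--> {consumer}")
```

Everything needed for the diagram was already sitting in the repository; the diagram is just a projection of it, which is why people who were never taught these models don't realize how cheaply they can be produced.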
So, just in case it was missed: what is the future of data architecture in this new age of extensive software solutions being purchased, where data models are practically invisible? It does not in any way lessen the need to understand the data. My favorite example of this is QuickBooks. QuickBooks software is used by many, many small businesses, including my own, and it is a very good piece of software. When they moved it to the cloud, as software as a service, they did not take the data model that the desktop version and the server version had and put it in the cloud. They instead put in a limited version. What happened was that it was more confusing, and it was harder to do certain things in QuickBooks in the cloud. And consequently, all accountants immediately came to the conclusion, perhaps not deservedly so, that QuickBooks in the cloud sucked. That's a really harsh thing. It only sucked because they could no longer do things they knew how to do, because Quicken made a very poor set of architectural choices. No, just because it's in the cloud does not mean you don't need to know what's going on. You need to know what's going on to determine whether or not it will work for you. In fact, let me be very explicit. It is now considered best practice, if you are buying software of any kind, to ask the vendor for a data model of the software. If the vendor doesn't provide you that model, you probably shouldn't be buying the software, period. Now, they may require you to submit an NDA. Nothing wrong with that. You happily sign the NDA, because you want to know whether their software is going to complement or impede your existing business practices. 
So it is very, very critical to understand those models in the cloud, especially since we're putting lots and lots of effort into it and everybody thinks cloud equals less cost. In fact, I've never seen an organization move to the cloud and actually achieve a lower total cost of operation. It does achieve other very good technology benefits, but it is not cheaper, and it should not be sold that way. Fabulous. And if you have any questions, please submit them in the Q&A section. I'm just trying to go through the chat here too to catch a couple of things. Do you have any recommendations, Peter, for incorporating new data modeling techniques like graph modeling? Yeah, it's not something we can really get into here, but it would certainly be a good topic for a webinar, and you've probably got some on the books already, Shannon; maybe you can point them to some of that. Graph modeling is a different approach, but it is very powerful, and it's very complementary with some of the big data technologies that we've been talking about as well. Again, I don't want to do it here because we don't really have slides, but absolutely, there's some really cool new stuff coming down the pike. What kind of tools are out there for data paths, data flow diagrams? If you just Google data modeling software freeware, you'll see a bunch of them that are out there. I don't take anything away from the Embarcaderos and the erwins of the world and the Visible Systems Corporations and all the rest of them; they are good software. But there are a lot of freebies out there. They're not nearly as powerful, and they're certainly not supported to the same extent, but if you want to prove the utility of it, go ahead and download some of these freebies, give it a try, kick the tires a bit, and see what happens. Then use that as the basis for deciding what, if any, further investment you should make in CASE tools and CASE tool technologies. 
I just can't imagine working without them, but that's because I've done it this way for 30 years. Can you imagine the complexity of this diagram, trying to keep it in your head? I'll tell you a quick story about this architecture. Not this one here, but one of the groups that I work with created a model like this, and somebody said, oh, that looks like a quilt. And they went, wow, it actually kind of does look like a quilt. So they called it the quilt diagram, and that quilt diagram became so useful that it was maintained in a single PowerPoint slide over a 20-year period. It's still in use today, in this case documenting the architecture of the entire healthcare system that it's being used by. Very, very interesting process. And there's a comment here going back to the previous question, Peter. It would seem that if an organization were to sell software as a service, they should be ready to provide the data model for that software. Wouldn't that be one more reason to have the data model ready? You bet, absolutely. All right, again, just trying to get through it all; there are so many great comments and such going on in the chat section. How's a catalog different from a data dictionary, from a data modeling perspective? Probably granularity. A catalog may be just a list of the various sources; if you think about it, it might be just the entity names, whereas the dictionary might actually describe the entities and the attributes. But we don't have standard definitions for these products, so it's really up to you. My preferred term for all of this is the data bank. Nobody uses that term, but you put it in a bank because you make more money on it in the long term. But nobody likes that term, so it doesn't matter what Peter thinks in this case. Could you respond to microservices owning their own data, seemingly opposing the view that data is shared? Microservices owning their own data? Correct, that's nothing close to my view. 
Yeah, so I don't support the concept of anybody owning data other than the organization. It's just too problematic. I'll tell a quick story here that I think is very powerful. Data Blueprint did the military suicide mitigation project for the US military back in 2010, when we discovered that more of our soldiers were harming themselves with their own hands than were being harmed by the bad guys. A very, very critical, meaningful project. And I had a bunch of people that I was working with, and we were trying to work off of a 30-by-30 matrix. I called it my council of colonels, and this group of people were all in a very large room, and they would get together and say, sir, you can use my data for these circumstances under these conditions, and, sir, you can use my data, and we were trying to map it all out. If you've ever tried to work off of a 30-by-30 matrix, it doesn't work. So I had a favor that I could ask of the Secretary of the Army, and the individual came into the room, and after the third person stood up and said, sir, you can use my data under these circumstances, he put his portfolio down on the desk in a very loud fashion that got everybody's attention in the room. He said, I see why you brought me into this room, Peter. I have an announcement to make, everybody listening. From now on, we're going to call it my data, and anybody that wants to tell me why they can't use my data to save my soldiers' lives, my office is open. Are there any questions? Now, the reason that was a powerful story is because the individual was probably not authorized to make that statement, but it was certainly the right thing to do, and in doing that he changed the dynamics of the project. 
Now, I say that because, very interestingly, I've told that story to more than 100 corporate CEOs, and not a single corporate CEO has been willing to take that step and say that the data belongs to Corporation X, whatever Corporation X happens to be. And that hurts your organization in enormous ways, because when people think they own the data (remember, data by definition is a shared resource), you have more non-technical problems associated with your organizational operation, and that is a huge, huge cost on organizations. We have a saying that culture is the biggest barrier to solving organizational data problems, and I firmly, firmly believe that. Now, I'm telling you this because there's been a more recent development in that area. I was on a visit with the Army's Chief Data Officer recently, and he told me he had just abolished data sharing agreements within the Army. His lawyers and his procedure people were horrified by this, and he said, I can't find it in statute or law where it says we should have a data sharing agreement; after all, we all work for the United States Army, so what's the big deal? Now, we'll see whether that comes about. Maybe ask me next time we do the seminar, and I can tell you whether he's had success with that or not, but it is an interesting process. I hope that's an interesting story for you. I do not like data ownership. Very interesting. Well, Peter, that does bring us to the end of today's presentation. Thank you so much for this topic and presentation; it's been fantastic. Thanks to all of our attendees, who are so engaged in everything we do. I love all the questions and everything that has come in throughout. Again, just a reminder, I will send a follow-up email by end of day Thursday with links to the slides and the recording of the presentation, and I'll try to get some information on additional research for you. Thanks, Peter. I'm going to start thinking about Enterprise Data World, right, Shannon? 
We are almost there, and we've got a couple of things coming up, including a really great topic, I think: myself and a couple of colleagues have put together How I Learned to Stop Worrying and Love My Data Warehouse. Love it, yes. Perfect. All right, great. Well, thanks, everybody. We'll look forward to next month, when we'll talk about the seven deadly data sins on December 11th. Thanks, Shannon. Thank you, bye all. Bye all.