 And here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Architecture versus Data Modeling: Compare and Contrast. It is the latest installment in a monthly series called DataEd Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of the screen for that feature. And to continue the conversation and networking after the webinar, just go to community.dataversity.net. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise send a link to the recording of this session as well as any additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and 11 books, working on the 12th now, I hear. And the most recent is Your Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as a top data management expert. 
Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the US Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get today's webinar started. Hello and welcome. Hello, Shannon. Hello, everybody. I am now the newly named Dr. Peter. Do they say Dr. Peter? That's all right. We'll get there eventually and it'll work. No problem at all. Yes, absolutely. It's a pleasure to be on with everybody today. And our topic is the differences between data architecture and data modeling. Shannon's mentioned the community several times; we get these topics from you guys. So when she sees that you're having questions about things, she puts on a webinar to try and make it better. And that's what we're doing with this one here. So let's first of all describe what we're going to do. We're going to talk about data maps, which really do represent models. And we're going to talk about the importance of modeling, and the importance of modeling data: why do we need the models and how are they used? We're going to talk about challenges to increased use of data models as well, because it is not widely known that this is something we should be doing when we're doing systems development activities. We'll break those challenges into social, political, and economic, which are always good. Then we're going to look at architecture. And you can't look at architecture without engineering; they really are two sides of the same data coin. But they must operate on standard, shared data of known quality, and that's very important in order to do this. When we are building systems, however, we tend to take two perspectives, and typically both of these perspectives are useful. From the top means we're doing forward engineering. 
We are building things, and the goal is to come up with a new product. However, in the data world, we are also oftentimes working from the bottom up, and this means doing something we call reverse engineering. Now interestingly, originally when we started this field of reverse engineering, people would ask, can you get arrested for that? And it's even less widely known than architecture and modeling. Of course, the key is we want both reverse and forward engineering to work together, and we'll talk about how those functions are required in order to come up with effective data management. Finally, we'll close with the need for simplicity, because part of our job is to make things no more complicated than they need to be, because that will help us. And of course, we'll finish up with the best part of the hour, which is the Q&A part that all of you guys jump into, because we have such a wonderful, engaged community out here. Let's get started. If we look at data, data as a subject is complex and detailed. It's fairly inconsistently taught and relatively poorly understood. And Shannon mentioned the next book that I'm working on; at EDW next month, just a couple of weeks out from now, we'll be rolling out the first version of the new data literacy material that we have. But we're not talking about literacy today, so let's get back to the material. The data maps that we put together are a necessary but insufficient prerequisite to a data architecture. In fact, the architecture is composed of the maps. If we're going to fully leverage our data assets, these maps are critically important to have. I do a lot of work with a lot of different companies, as Shannon's told you, and one of the things that amazes me time after time is that we'll see an organization trying to move forward in data without fully understanding their data. And that's just not possible to do. The maps are going to be incomplete, however, by themselves, without a lot of definitional material. 
And one of the things that Clive Finkelstein taught me many years ago is that we shouldn't be using definitions. We should be using purpose statements, because purpose statements incorporate something called motivation that definitions, which are generally contextless, do not. So we'll talk specifically about the remedy of adding purpose statements to your data model. And then also, how do we know the model is correct? That's when we have a validated model, so we'll talk about that as well. So the maps are required to share information. If I want to tell you how to get to my house in rural Virginia, a map would probably be required in addition to the GPS coordinates that will get you there. And the data architectures are composed of these data models. So let's start off with a definition. Data modeling is an engineering activity that is required to produce data maps that are a necessary but insufficient prerequisite to leveraging our data assets. Data maps are part of the DMBOK, the Data Management Body of Knowledge, that we have put together, and you can see I've circled that portion of it. It used to have some words out there where we would talk about analysis, database design, implementation, and additional data development. In our drive toward simplicity in DAMA, we cut out some of that detail, but it's all still there. So let's talk about some unique properties that data has. Data does not obey all of the laws of physics. It's not really visual. Some people will do visualizations, but you can't really see the data. You can see the effect of data, but not the actual data. In fact, we've been working with a group that's been doing a post-mortem on the Iowa caucus, and they clearly had some data problems there as well. Maybe I'll be able to introduce some examples from that, too. 
Doug Laney, whose book I'm showing in the bottom right-hand corner, really a terrific book, not on data modeling but on infonomics, calls data non-rivalrous, which means that the cost of providing an additional copy of your data is typically zero, which is why people don't tend to value it as much. It is non-depleting: it does not require replenishment. It is regenerative. It is nearly unlimited. It has low inventory and transmission costs, but it is more difficult to control and own. It's eco-friendly in general, and it's just about impossible to clean up if you spill it. So thanks, Doug, for that list; it's a great list. Data modeling, then, is the process of discovering, analyzing, and scoping data requirements. Do you understand what the data things are, what they do, and how they interact? Nice basic questions, which are really, really good. And representing this information in a precise form is the definition of a data model. Data models are a map to your use of critical business assets. They comprise and contain metadata that is essential for data consumers to be able to use the information, and they function as a kind of sheet-music language. It's important, if you're a musician, to be playing off the same sheet. In this case, from our data perspective, we all need to be working off of that as well. This data about data, which is a terrible definition of metadata, but it's the one everybody understands, is really essential to other business functions. For example, if I ask you for the sales forecast for the last four years, and you give me the sales forecast for the last four years, but the numbers have not been fully vetted, then that might not lead me to the same conclusions I would have reached with fully vetted information. The process is iterative and may include different components that we put into all of our data models, and those may exist at different levels. 
So I'm going to use a couple of words here that may not be familiar to some of you, but there's a tiny little diagram down in the bottom corner, and we will actually approach it from another aspect in just a little bit. We will typically tend to do a process of conceptual, logical, and physical modeling. Each of those models has a different purpose in representing those data requirements. So in the process of discovering, analyzing, and scoping data requirements, we may want to go in and take a list of specific organizations, persons, places, or things. Nouns, if you will, that do a very nice job of describing stuff. And we need to describe these things because those persons, places, or things may need to be created, read, updated, deleted, or perhaps archived. If you look at the first letters of that, it's a CRUDA diagram. We tend to call them CRUD diagrams, but the A is on there as well. And then when we finally get to the next piece of this, the bits and pieces that we have, the things that need to be updated, are called attributes. And the attributes are characteristics of the persons, places, or things that we're taking a look at. Let's take a look at how that actually works. An organization might describe the parts of the thing as attributes: ID, description, status, sex to be assigned, reservation reason. These are attributes of a thing that we're describing. And I've put them up there in a data model format, in that I labeled the thing and underlined it to show that that's its name. There are various notations for this; we're going to come back and say a word about that in a little bit. And we also have an ID, which is what's shown by the pound sign, so I can tell this thing from that thing. And the only way I can tell those things apart is if each of those things has a unique ID. By the way, if you get to a unique ID on everything that's in your entire database, you've got a data leak. 
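As a rough illustration, and this is my own sketch rather than anything shown in the webinar, the thing and its attributes described above might look like the following Python data class. The class and field names are hypothetical; the point is that the unique ID is what lets us tell one thing from another, even when every other attribute matches.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the "thing" entity from the webinar's example,
# with the attributes Peter lists: a unique ID plus descriptive characteristics.
@dataclass(frozen=True)
class Thing:
    thing_id: int                         # the unique ID (the "#" in the model notation)
    description: str
    status: str                           # e.g. "available" or "unavailable"
    sex: Optional[str] = None             # "sex to be assigned" -- may not be known yet
    reservation_reason: Optional[str] = None

# Two things with identical characteristics are still distinct things,
# but only because each carries its own unique ID:
a = Thing(1, "widget", "available", "F")
b = Thing(2, "widget", "available", "F")
assert a != b                             # distinguishable solely by thing_id
```

The unique-ID requirement is exactly what the model's pound sign expresses: without it, the two records above would be indistinguishable.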
That unique-ID point is another side conversation we won't have right at the moment. So decisions about managing information for each specific attribute have direct consequences for the rest of the system. The decision to use the above data attributes determines whether I can tell that I have female things that are able to be reserved. If I didn't have sex to be assigned on there, I couldn't tell whether the things were male things or female things. That's a very, very important characteristic. Also, these characteristics may be shared: all of the things may have a status. So the thing status that I showed in purple there, it may be available, it may be unavailable, different types of things. And many things can be assigned to females. These characteristics may be required to be unique, which gets us back to the thing ID. So again, this is a way of organizing information about the things that we have. Let's go back to our definition again and talk about representing and communicating these in a precise form called a data model. So if I have thing one and I have thing two and I describe a relationship between thing one and thing two, this precise requirement says that each thing two must be accompanied by a thing one. If there's one, there has to be another; they're paired. That's important for some structures. Let's look at a slightly different structure. The representations that we make between them define mandatory and optional relationships between the various things using minimum and maximum occurrences from one entity to another. We're going to get some examples of this in just a little bit, but these are the components, if you will, of all of the bits and pieces. And finally, the process is iterative because data models are developed in response to specific needs of the organization. Those organizational needs become instantiated and integrated into data models. 
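Before moving on, the mandatory relationship just described, each thing two must be accompanied by a thing one, can be sketched in code. This is my own hypothetical illustration, not notation from the webinar: the constructor enforces the minimum occurrence of one by refusing to create an unpaired thing two.

```python
# Hypothetical sketch of a mandatory one-to-many relationship:
# every ThingTwo must be accompanied by exactly one ThingOne.
class ThingOne:
    def __init__(self, name):
        self.name = name
        self.thing_twos = []              # one ThingOne may relate to many ThingTwos

class ThingTwo:
    def __init__(self, name, parent):
        if parent is None:
            # the model's minimum occurrence of 1 makes the parent mandatory
            raise ValueError("each thing two must be accompanied by a thing one")
        self.name = name
        self.parent = parent
        parent.thing_twos.append(self)

t1 = ThingOne("one")
t2 = ThingTwo("two", t1)                  # fine: the pair exists
try:
    ThingTwo("orphan", None)              # violates the mandatory relationship
except ValueError:
    pass                                  # the constraint rejects an unpaired thing two
```

An optional relationship would simply drop the `None` check; minimum and maximum occurrences are what distinguish the two cases in the model.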
And those data models authorize and articulate specific information systems requirements. Most people think that the process stops at that point: we've produced something, we're done. But actually it turns out we need a feedback loop in there to ask, have we got it right? Remember, IT delivers on time, within the original budget, and with full functionality only about 30% of the time. If my dentist were that bad, I would find another dentist. But of course the real problem is it doesn't stop; it continues on a regular basis, so that we can figure out how these data models need to evolve as the organizational needs change over time. Now, all of the changes involved in this still happen at a slower rate than changes to process models in organizations. One of the nice things about living a long time is that I've gotten to go back and visit clients that I worked with early in my career. And guess what? The basic data that they're managing has not changed if the company is still in business. If the basic data changed a lot, there's a good chance that that company is no longer in business. So again, our data models are this process of discovering, analyzing, and scoping the requirements: understanding the data things, the entities, and what they do (well, they keep track of information), and how they interact (those are the communications between them). We use precise modeling notation to make sure that everybody can read these models everywhere, and the process should be iterative. I mentioned that I was going to describe the conceptual, logical, and physical pieces next, so let me just try to do that very briefly. It's not an easy example to understand, but in almost all cases, you're going to need to have a data perspective that allows independent customized views. I don't want to have to change the guts of the system if I change the way the user interacts with it. 
And these layers allow us to implement these things in a way that prevents the users from having to deal with technology-based changes. The second layer down, then, is the logical level. This is where we talk about things with business people, because they will understand the business terms that we are using, and we can communicate correctly with them about this. The third layer is the physical layer, and this is what the database administrator is going to be using. The user will not understand it, because it's going to be talking specifically about how Oracle or Amazon or whatever actually does the implementation, the storage of the data there. So what this means is that if we build a new system in this proper fashion, when we change a database technology, like moving from a spinning disk to the cloud, the database administrator in a properly architected system should be able to effect this change without affecting the users. Now, all of these things that we put together are what we call data structures, and there's a good computer science definition for a data structure, which probably isn't really relevant here, so I won't read it to you. I'll just show you this example. We've got a customer, and a customer can have many sales orders, and a sales order will have lines, and each line on the sales order links back to a product, showing me which products were purchased by which customers on this particular sales order. The structure characteristics then include some grammar for the data objects, what they are allowed to do, the rules essentially, the constraints for those data objects: whether they occur in sequential order or not, whether they are forced to be unique or whether they just happen to be unique. We may end up with different structures of a hierarchical, relational, network, or other type of organization around all this, and we may look at what we're trying to implement in terms of trade-offs. 
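The layering idea above can be sketched in code. In this hypothetical illustration (the names and interfaces are my own assumptions, not from the webinar), the business-facing code talks only to a logical contract, while the physical storage behind it can be swapped without the calling code changing, which is exactly the insulation the three layers are meant to provide.

```python
from abc import ABC, abstractmethod

# The logical layer: the contract the business-facing code sees.
class CustomerStore(ABC):
    @abstractmethod
    def get(self, customer_id): ...
    @abstractmethod
    def put(self, customer_id, record): ...

# One physical implementation; a DBA could replace this with, say, a
# cloud-backed store by writing another CustomerStore subclass.
class InMemoryStore(CustomerStore):
    def __init__(self):
        self._rows = {}
    def get(self, customer_id):
        return self._rows.get(customer_id)
    def put(self, customer_id, record):
        self._rows[customer_id] = record

# The user-facing code below depends only on the logical contract,
# so swapping the physical layer never touches it.
store: CustomerStore = InMemoryStore()
store.put(42, {"name": "Acme"})
assert store.get(42)["name"] == "Acme"
```

The design choice here is the point of the conceptual/logical/physical split: the "spinning disk to cloud" move becomes a new subclass, not a rewrite of everything that uses customers.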
Architecture is always a process of describing trade-offs, and we may want to optimize around certain trade-offs as we're going forward with it. If we describe these bits, one of the other descriptions that we can do is a hierarchy. A hierarchy is an arrangement of items in which things are ranked as above or below one another. So we may have a mess above, but we can take our data architecture and arrange it into a hierarchy of a series of data maps, data models. In addition to that, we may also have some other types of arrangements that we can put together; for example, this particular group of three may represent one model view, a subgrouping that lets us look at the model from one perspective. Now, I mentioned there were some problems with all of this, and there are some real challenges around it. The first one is social, and that is that most people try to implement data modeling and architecture as a project. It is not a project. My colleague John Ladley was having a conversation with my class last night, and he pointed out that HR was not something that most people thought they needed to have 50 years ago, and yet there's no organization today that would be without a good HR manager. In fact, I want you to just imagine walking into your boss's office and saying, I don't think we need any more HR today; we've done all the HR we need. Well, I want you to apply that same concept to data as well. Data is a program, and it has to exist as a program in order to be successful. It cannot exist solely as a series of disconnected projects. Your data program, excuse me, must last at least as long as your HR program, so it's a very nice way of tying it to something that management has actually been through. And yet the problem that we have around all of that is that we don't teach knowledge workers, which includes most managers, anything at all about data. And still, all of them deal with it 100% of the time. 
There are also some political reasons that keep this from happening, and I like to show this in particular because, in this case, we don't tend to teach IT professionals an awful lot about data. We teach them how to build a new database. And quite frankly, if there is a skill we do not need more of on planet Earth, it is how to build new databases. When we teach both students and people who were formerly students and are now managers the idea that the only time you need a data model is when you're building a new database, then they will not be able to understand that if you're implementing an ERP, or combining systems, it might be useful to have some data people around. Because management says, I'm not building a new database, therefore I don't need data people. This is a really bad message that we have left with generations of students, who have been underserved by it. But let's go one piece further on this. For 30 years, the only thing we've taught people about data, in most cases (there are now some larger programs that get into other material), is that data means building new databases. And gosh, if the only tool you know is a hammer, you tend to see every problem as a nail, quoting Abraham Maslow. So when somebody comes to a data person and says, I've got a problem, they say, hey, no problem with that, I can build you a database. Now, for all of us on this call, this means that we have guaranteed employment forever. So it's not really a problem for us, but it doesn't serve our organizations in the way they should really be served. And this leads us to something I call the bad data decision spiral: business decision makers and technical decision makers are not data knowledgeable, so they make bad data decisions. And that results in poor treatment of the organizational data assets and poor data quality, which leads to poor outcomes. 
And what we've got to do, of course, is break this cycle in order to get further with it. Finally, I mentioned there were some economic reasons as well. Most people do not really understand how data errors occur. My colleague Tom Redman coined a phrase a while back called the hidden data factory, and I'd like to use this example to illustrate it. The Tacoma Narrows Bridge, called Galloping Gertie, was the world's third longest suspension bridge. It was a really nice bridge. It opened on July 1st of 1940 and collapsed on the 7th of November in 1940, probably not exactly where people wanted it to collapse. Now, you might imagine that bridges are not typically designed to do this kind of thing. By the way, the guy in the middle here is getting his dog out of his car, because he thought it might be fun to ride on this thing for a while. And then he decided maybe it wasn't a good day to do that, so he went and got his dog off, because, of course, you know what's going to happen. If I had a can of Coca-Cola and I started taking the little tab and swinging it back and forth, the same thing would happen to the Coca-Cola tab. It would break off; that's what's happening to the bridge. No, it was not designed to flex in that fashion; hence, it crashed. Now, the reason I show that is because most data errors are not that dramatic. Instead, the damage is insidious: it's lots of little things that happen. Organizations spend between 20 and 40% of their IT budget evolving data. This includes data migration, conversion, and improvement. And in each of those cases, you're going to need aspects of the data model in order to do it. When you do a poor job with data, your IT project takes longer, it costs more, it delivers less, and it presents greater risk to the organization than if you did it properly. Thank you, Tom DeMarco, for those wonderful words. So now let's compare that to architecture and engineering. Data model focus is generally very specific. 
How does this software program interact with this set of data? Sometimes we might do a data model that spans programs. In fact, the brown database that I'm showing there covers all three programs, A, B, and C. And that's getting toward better use of data models, because they are all using the same components. The ERPs of the world, the commercial off-the-shelf software packages, are also marketed as being similarly integrated. And this better use of data modeling works really well in a development situation. However, if I've got multiple development efforts going on simultaneously, who is going to keep track of all three of these groups of efforts to make sure that they are all literally singing off the same sheet of music? And the answer is the data architect. Data architecture has a greater potential value and is broader in focus than either database or software architectures, in that the analysis scope is the system-wide use of the data and the problems caused by data exchange. So hopefully you see the architecture has a much broader scope, but it is still nevertheless composed of each of the various bits and pieces. And of course, our DMBOK has this as the number one topic. The words that used to be there were enterprise data modeling; value chain analysis, which is an important technique for deciding how the data in your databases contributes to the bottom line of your organization; and of course related data architectures as well. So how are all these components, these individual bits and pieces, expressed as architectures? Well, first of all, we take the details, every little thing that we want to do, and we organize those into larger components. That's an intricate process, and it allows us to do some very, very savvy things with the data, very good business types of activities with the data. Excuse me. The larger components are then organized into various models. 
We've already seen how that works. That introduces dependencies. The minute you start making decisions about data models and using standard components that way, these data models constrain further activities. They eliminate certain possibilities and force you to do certain things certain ways. So the data model is actually much more important than the software: it's very easy to modify software compared to a database. And finally, our third component here: the models are organized into architectures that are comprised of the various architectural components. That indicates purposefulness, the motivation component that I mentioned earlier. If we focus on an architecture as being supposed to fix something, now we can actually get it to work. So how do data architectures come out of this? Well, I'm going to keep those three words, intricate, dependencies, and purposefulness, up there. And we'll look: the attributes are organized into the entities. I say slash objects because objects actually are very compatible structures here; if you're working in an object-oriented environment, there's nothing incompatible in what we're doing. And here's an example: the thing, and the thing ID, the thing description, the thing status, the sex to be assigned, and the thing reservation reason. Remember, if I didn't put the sex to be assigned in there, I couldn't differentiate between male things and female things, assuming that was what we were trying to do. Those entities and objects are organized into models. These combinations of entities and attributes are where the dependencies come up. When I start to say that one of these can be related to many of those things, whatever those things are, now I'm putting constraints or capabilities on the system. Oftentimes these are called business rules, but they don't have to be at that level. And then the organizations are organized into, excuse me, the models are organized into architectures. 
This is our purposefulness. This is the idea that we are trying, in this case, to do everything to get this particular thing right. And why can't I show you an example? Well, they're not much use, particularly in a format like we're using today. So while you may have something like this that represents an architecture, I'm just using it as an icon; it really doesn't tell us anything at all. So architectures are about things, about the function of those things individually, and about how those things interact as a system. Now, when we look at that overall, we get a better picture of what's happening. I'm going to show you a little quick... Oh, I've got my music on here still. Hang on, let me turn that off, because we actually want to listen to Steve Jobs right now. We're going to listen to a quick, short, four-minute architecture speech from Steve Jobs. You like everything so far? Well, I'll try not to blow it. I get to talk about iCloud. We've been working on this for some time now, and we're really excited about it. About 10 years ago, we had one of our most important insights, and that was that the PC was going to become the digital hub for your digital life. What did that mean? Well, it meant that that's where you were going to put your digital photos. Where else were you going to put them? Your digital video off your digital camcorder. And of course, your music, right? You were going to acquire it on the device or potentially on your Mac, and you were going to basically sync it to the Mac, and everything was going to work fine. And it did, for the better part of 10 years. But it's broken down in the last few years. Why? Because the devices have changed. They now all have music. They now all have photos. They now all have video. And so if I acquire a song, I buy it right on my iPhone, I want to get that to my other devices. I pick up my iPad, and it doesn't have that song on it. So I have to sync my iPhone to my Mac, then I have to sync my other devices to the Mac to get that song. 
But then they deposited some photos on the Mac, so I have to sync the iPhone again with the Mac to get those photos. And keeping these devices in sync is driving us crazy. We have a great solution for this problem. And we think the solution is our next big insight, which is we're going to demote the PC and the Mac to just be a device, just like an iPhone, an iPad, or an iPod touch. We're going to move the digital hub, the center of your digital life, into the cloud. Because all these new devices have communications built into them. They can all talk to the cloud whenever they want. And so now, if I get something on my iPhone, it's sent up to the cloud immediately. Let's say I take some pictures with it. Those pictures are in the cloud, and they are now pushed down to my devices completely automatically. Now everything's in sync, with me not even having to think about it. I don't even have to take the devices out of my pocket. I don't have to be near my Mac or PC. Some people think the cloud is just a hard disk in the sky: you buy the hard disk or whatever, it transfers things up to the cloud and stores them, and then you drag whatever you want back out onto your other devices. We think it's way more than that. And we call it iCloud. So where is your content? iCloud stores it and automatically pushes it to all your devices. It automatically uploads it, stores it, and automatically pushes it to all your other devices, completely integrated with your apps, and there's nothing new to learn. I show that little lesson there because people like my father would watch Steve Jobs do that. He would watch that and he'd say, that's exactly what's wrong with my thing; I can't get the photographs on this computer to go over here. And Steve's architecture lesson showed him there was something else. And when somebody like my dad can understand this kind of a process, we are making really, really good communication happen. My dad is not terribly tech savvy at 86 years old. 
So architectures are managed, and some organizations do more of it than others. My numbers show that about one in ten organizations attempts to manage one of these architectures. And if the only thing that's happening is that you have a bunch of people going to meetings and nothing is actually changing in your organization, you want to be very, very careful not to be perceived that way. In fact, the number one question that we get about architectures is people coming to us and saying, hey, can you build me an architecture? I don't need to. If you have an organization, you have an architecture. All organizations have data architectures. All organizations have process architectures, et cetera, et cetera. The only question is whether you understand it, and you can't understand it if it's not documented; if you don't understand it, it cannot be useful. So let's take a quick look at how we describe these architectures from an architectural perspective. First of all, I'm going to toss the number 42 out there. Some of you may know that 42 means the answer to life, the universe, and everything, because you've read Douglas Adams' Hitchhiker's Guide to the Galaxy. And most of you are going, I'm sorry, I didn't read that book; why are you telling me this? Well, 42 is a fact, and that fact has a meaning. It's kind of not data until you pair it with the meaning. So now that I've told you that 42 is the answer to life, the universe, and everything, you're much more well-educated than you were when you started this particular webinar. I could also tell you that 42 was my age 19 years ago, and you would not necessarily care, but it still nevertheless becomes important. How do we differentiate there between data and useful data? Well, we actually have to see how it's used. And we don't know how information is used absent some form of request. 
So when we add a request to that, we now say the difference between data and information is that data provided as information in response to a request is more useful than data that's just lying around not doing anything. And the amount of dark data that organizations have is somewhere around 80% of all their data. So you're really operating your organization on a subset, 20%, of your data. And the only argument I ever get from anybody is that it's not 80%, it's 85%. Now, hopefully you can see at this point that you can have data without information, but you cannot have information without data. And even that was not enough. We had to go one level further, because until we understand how people use that data, we really have no ability to value what's going on; while users are good at requesting things, they don't necessarily understand how those requests are used in the larger context. So what I'm showing you here is actually an architecture describing, in this case, how useful data can be converted to information in response to a request; but it really becomes useful to the organization once we have gotten to the point of understanding its strategic use. And these architectures are required to determine interoperability. In fact, if you have a commercial off-the-shelf software environment, something that's only packages, the only thing that tells you how it all fits together is, in fact, your data architecture. So it permits governance of the data, is a prerequisite to meaningful data exchange, lowers the cost of various data sharing activities, allows rapid evolution, which moves us toward an agile type of process in organizations, and decreases the cost of maintaining our data inventories. The architectures capture the business meaning of the data running the organization, and they are living documents that should be updated periodically.
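The data-to-information distinction can be sketched in a few lines of code. This is my own illustration, not anything shown in the webinar; the fact names and values are invented. A stored fact is just data; the same fact returned in response to a request becomes information, and facts nobody ever requests are the "dark data" described above.

```python
# Illustrative sketch: data vs. information (names are hypothetical).
facts = {"answer_to_everything": 42}  # data at rest

def respond(request: str):
    """Data provided in response to a request becomes information."""
    if request in facts:
        return f"{request} = {facts[request]}"
    return None  # nothing stored under that name; no information to give

print(respond("answer_to_everything"))
```

Anything in `facts` that never appears as a `request` argument stays data at rest: stored, paid for, and never informing anyone.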
It's an entry point for architectural engagements, and if we have a validated architecture, we can now use these components to populate a business glossary, which is the big buzzword that we're going to see at the upcoming DGIQ event we're holding in June; glossaries will be hugely featured in all of that. So it's also a major collection of metadata. As we look at these things, the upward-facing side, at the higher level of abstraction, is the architectures; the downward-facing side is the models; but where we actually are is right in the middle, working in both of them. So we weren't able to achieve a perfect understanding, but we can now focus on subsections of it and make those architectures focus in on specific things that we're trying to get. Let's take a look at an example. If I take a data structure and organize it in support of strategy, one of the things you might ask yourself is: if I didn't purposely organize it, what are the chances that it would just happen to work together? And the answer is: essentially none. Remember, this is how ERPs are sold, and they are more integrated than non-ERP type products. But at the same time, how do they work for your organization? It's unlikely that, if they weren't engineered specifically to work that way, they are going to be helpful. And of course they cannot be helpful if their structure is unknown, and you can't know the structure if the documentation doesn't exist. So, another component of this: let's take a restaurant that has specific goals, and I put those plates in the upper right-hand corner of this screen for you to take a look at. I just want you to imagine this is a restaurant that's so cool that they've got a separate dish for everything. There's literally a different dish for the apple cobbler versus the peach cobbler.
Now, if I happen to drop a dish, I need to go back in and find not just any dish, but in this case the apple pie dish if I was carrying apple pie, or the peach cobbler dish if I was carrying peach cobbler. That's going to hurt your efficiency. But in a restaurant like that, time is probably not the most important variable. If it were, I would re-architect my collection of plates and just take the next plate off the top of the stack, because any plate can be used for any dish, and that satisfies different objectives for the organization. All of these concepts, engineering and architecture, are interrelated. Architecture is about creating systems too complicated to be treated by engineering analysis alone, and engineering is required when the details are too complex to be handled architecturally. I've tried to show folks over the years how important architecture is, and I finally found an old BMW commercial here. By the way, if you want any of these videos, just send me a note and I'll be happy to send them out to you. But of course the answer is: you cannot architect after you have already implemented. And each of these structures that I'm showing you here clearly was not built from the top down. The Empire State Building did not start at the spike. The Taj Mahal and the Arch in St. Louis did not start at the top, and finally the pyramids, of course, did not start at the top. And most importantly, if I were the architect of the pyramid, which I clearly am not, and the Pharaoh said to me, Peter, you need to put a swimming pool in the basement there, I would remind the Pharaoh, graciously of course, that the pyramids are built of large rocks on top of shifting sand, and my chances of adding a swimming pool in the basement after I've built this large pile of rocks are not good.
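The plate example maps neatly onto data structures. Here is a minimal sketch of the two plate architectures, with all dish and plate names invented for illustration: dedicated plates require locating the one plate made for that exact dish, while a generic stack lets any plate serve any dish, so replacement is just a pop off the top.

```python
# Two plate 'architectures' (names are illustrative, not from the webinar).
dedicated = {"apple cobbler": ["apple-plate"],
             "peach cobbler": ["peach-plate"]}      # one special dish each
generic_stack = ["plate-1", "plate-2", "plate-3"]   # interchangeable plates

def replace_dedicated(dish: str) -> str:
    # Must find the plate made for this exact dish.
    return dedicated[dish].pop()

def replace_generic() -> str:
    # Any plate works for any dish: just take the top of the stack.
    return generic_stack.pop()
```

Neither design is wrong; they optimize for different goals, which is exactly the point about architecture serving organizational strategy.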
I'd probably be turning in my resignation to see if I could get out of the country without incurring the Pharaoh's wrath on that particular piece. Let's turn to another object that is really one of my favorite objects in the world, and we will all be gathering in the city where it lives. So, excuse me, if you're going to San Diego next month to EDW, you can go to the place where this is and look at it. Let me tell you what this is first. Obviously you're looking at it and going, a mixer, so what? Well, it's taller than I am. It has a clutch. It was built in 1942, it's cemented to the floor, and it's still in regular use today. Now you might ask yourself, hmm, okay, it's a super mixer; who would need a super mixer? And the answer is the 4,000 brave soldiers that we put on the USS Midway in 1942 to go out and fight against the Japanese, because we were losing World War II very badly. They didn't know how long the war was going to last, but they did know that those soldiers would need to have breakfast every morning, and this is the way you make breakfast every morning for 4,000 people, for a war that you don't know how long is going to last. And it doesn't matter how many of these little ones we might try to substitute for it; I can guarantee you one of these would not make breakfast for 4,000 people. Aside from the time constraints, which we wouldn't be able to overcome, the motors are not designed for that kind of use. And I've already told you, you can go to San Diego and eat food from this thing in six weeks if you want to join us at Enterprise Data World. Let's talk a little bit about definitions versus purpose statements. I mentioned Clive Finkelstein, one of my mentors, who taught me this particular piece. I don't see it occurring in textbooks yet, which is sort of frustrating. So if I define a bed as a piece of furniture used as a place to sleep or relax, that's very nice.
But remember, over here we were talking about beds and status and things like that. We actually need the bed ID to be able to tell us one particular bed versus all of the other beds. And when we were working on this project, the project that we did for the Veterans Administration, we would ask the question, as a data architecture component, a data model in this case: what's the relationship between a bed and a room? Well, the data maps at this level allow us to store facts. So a bed is related to a room, no problem. We can also do this, however: a bed may be related to a room, or, adding more precision, many beds can be related to many rooms. That's interesting; that tells us we're probably talking about a dorm room or something else. We can also do another one and say, hey, in this case a room can contain multiple beds, so beds are not transportable across various rooms. In this case many beds can be contained in each room, and each room can contain many beds, but you would typically have a bed only associated with a specific room; we would not be moving beds from room to room, given that we're responding to, oh, let's just say coronavirus or something equally fun. So if beds can be moved, that could be a problem. And these relationship cardinality and optionality pieces come in five specific flavors: exactly one; one or many; optionally one; zero, one, or many; and optionally one or many. Each of these represents different types of processing requirements at the most granular possible level of information representation for our systems.
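A common way to encode relationship cardinality and optionality, and this encoding is my own sketch rather than anything the webinar shows, is as (minimum, maximum) occurrence pairs; note that some of the spoken flavors share the same bounds, so four distinct pairs cover them. A count of related instances either satisfies a flavor's bounds or violates it.

```python
# Cardinality/optionality as (min, max) occurrence pairs.
# None means no upper bound. Names follow common modeling usage.
CARDINALITY = {
    "exactly one":        (1, 1),
    "one or many":        (1, None),
    "optionally one":     (0, 1),
    "zero, one, or many": (0, None),
}

def satisfies(count: int, flavor: str) -> bool:
    """Does this many related instances satisfy the flavor's bounds?"""
    lo, hi = CARDINALITY[flavor]
    return count >= lo and (hi is None or count <= hi)

assert not satisfies(0, "exactly one")       # e.g. a bed with no room
assert satisfies(0, "zero, one, or many")    # e.g. a room with no beds
```

Each pair implies different processing requirements: a (1, 1) relationship can be enforced with a mandatory foreign key, while (0, None) needs no such constraint.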
And while we have a lot of different ways of describing these, people love to get into definition wars over whether you should use Chen style or information engineering or Bachman style or James Martin style. Just pick one and move on; it probably won't be up to you to pick it anyway, it will probably be whatever your organization uses. But let's stop arguing those and focus in on the relationship. So from this point on I'm going to use information engineering, which is the most popular version of all of this. This represents a natural association between the entities. It defines the mandatory and optional cases using minimum and maximum occurrences. So a bed is placed in one and only one room, according to this data model, and a room contains zero or more beds. Now, if I had specified this to say a room contains one or more beds, that would mean every room must have a bed in it, and that may or may not be what we're trying to do with respect to our modeling of information requirements. A bed can be occupied by zero or more patients, and a patient occupies at least one bed. But if I go back in here and define bed as something you sleep in, which is a fairly standard definition, it doesn't give us that motivation. So the purpose statement allows us to expand on this: in this case, why the organization is maintaining information about this specific business concept. This is a piece from the Veterans Administration data models that we did many, many years ago, where we looked and saw these attributes, the status, the sex to be assigned, the reservation reason, and the associations that we have here: one room contains zero or more beds.
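The rule "a bed is placed in one and only one room; a room contains zero or more beds" can be enforced directly in a relational schema. A minimal sketch using SQLite follows; the table and column names are my own, not the VA model's. The NOT NULL foreign key is the "exactly one room" side, and since nothing forces a room to contain beds, the "zero or more" side comes for free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE room (room_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE bed (
    bed_id  INTEGER PRIMARY KEY,
    room_id INTEGER NOT NULL REFERENCES room(room_id)  -- exactly one room
)""")
conn.execute("INSERT INTO room VALUES (101)")     # a room with no beds is fine
conn.execute("INSERT INTO bed VALUES (1, 101)")   # a bed placed in room 101
try:
    conn.execute("INSERT INTO bed VALUES (2, NULL)")  # a bed in no room...
except sqlite3.IntegrityError:
    print("rejected: every bed must be placed in a room")
```

Changing the room side to "one or more beds" would need an application-level or deferred check, which is exactly why the modeler has to decide which flavor the business actually means.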
And one of the things we heard about this was kind of interesting: the folks that were doing this said, hey, we're going to put RFIDs on the beds, because, believe it or not, hospitals will lose patients on a fairly regular basis, and if we had that information, where was that bed and which room was it in, it would probably be easier to keep track of it. Now this was experimental, and the VA had said they could go ahead and try this as an experiment, but here's an interesting little thought experiment. If a bed is always associated with at least one room, then what is the definition of a hall? It also has to be a room, and if I said the bed is in the hall, which room is it outside of? I don't have that information from this particular data model. Also, guess what else has to be a room if we're going to avoid losing patients in a hospital: an elevator, because yes, patients get left on elevators on a somewhat regular basis. One other component for all of these, in addition to the purpose statements, is that we need to make sure that we have our models validated, because if a model is not validated, it is in draft format. And I urge all of you that are being constrained in many ways in your organizations to do more with less: write the word draft on all your models that are drafts. Until they are validated, they are draft models, and you might say to people, would you really want to build on something that was not necessarily in final form? So we'd like to get our models all the way to validated, but in many cases we don't have that as an option. So let's talk about how these two work together. Forward engineering is a process that we need to go through and understand, specifically, what we're doing.
So the process of forward engineering is describing the requirements, the what; the design assets, the how; and then of course we have the implementation. Our requirements are typically in the form of a three-ring binder or some sort of bound thing. Our design assets are typically models; they're going to describe the implementation piece; and then of course we have our actual system that is out there. Now, you'll notice I've got stuff up there representing a theme we're going to develop here. In order to fully understand our architectures, we have to reverse engineer the existing components. Building new stuff, in this case new databases, is pretty straightforward, although again we teach it very poorly, because we talk about the requirements, the what; the how to build it; and then the actual implementation, and we give the illusion that you can build your data and your software architecture using the same project. If you build your data models project by project, without a guiding program, you will not succeed, because the only way this can be successful in this type of format is if your systems development lifecycle is focused on building a system that shares no data outside of that system. It's a huge, huge problem, and we've taught people this incorrectly for generations. Very, very big challenge to work on that. Separate and sequence the two of those. On the other hand, reverse engineering is also kind of interesting, and I love this particular quote from Fred Brooks, who's still with us. By the way, this is a key piece that we don't teach undergraduates anymore either; I think The Mythical Man-Month should be mandatory reading for everybody who's working in IT. His insight was that, while you can do some things in parallel, you can't have a baby by having nine women be pregnant for one month apiece; that does not equal a baby.
Now that's an important side note, but he also had a very important observation about data representation and data models. The idea is that data representation is really the essence of programming. If you show me your flowchart, which describes how the program flows, and hide the tables, I will continue to be mystified. In fact, I had this up as a poster at Data Blueprint. But if you show me your tables, I usually won't need your flowchart, because the answer will be obvious. Data is a much more stable organizational component than any of the software pieces. So let's take the same model we were looking at before, for forward engineering, and look at reverse engineering. Now again, you start off with the existing system, up there on the top, but in this case we're trying to go backwards from it, because we want to enhance it somehow, or do something with the data that's in the existing system. Whoops, I hit the backwards arrow, sorry about that, guys; let me jump back through these slides real quick. There we go, reverse engineering. What we are in all likelihood going to do is look at the physical system and try to derive the design. Why? Because we'd be interested in understanding the existing design. Let me give you an illustration from perhaps not-too-distant history, where you might be approaching a check-in place. This is a check-in place that runs off of a database, so the A's through D's line up in the first queue, the E's through H's in another queue, the I's through L's in another, and so on. The original data design was created that way because we had 10-megabyte hard drives on our IBM PC/ATs, and we could only handle a quarter of all the customers in each of those databases. Is there any reason on earth to replicate that design and use it in the new system? No, none whatsoever.
That is why we do reverse engineering: so we can understand the strengths and the weaknesses of the existing system. Reverse engineering is a structured technique aimed at recovering rigorous knowledge of the existing system to leverage enhancement efforts. As we do that process, we also need to think about whether or not it is important to go back into the requirements. If we're not changing the requirements, we can skip the yellow arrow; it is not required. However, if we are changing the requirements, it is necessary to reverse engineer the design to figure out what the actual requirements are as well. Let's see how this all works together. Here is our re-engineering platform, where I've taken the top and the bottom of the charts from before. First, reverse engineer the existing system. All systems have strengths and weaknesses. If we don't understand them, we run the risk of building the new system with the weaknesses of the old system, or not leveraging the strengths of the old system. You need to first reverse engineer the existing system to understand its strengths and weaknesses. Only when you have reverse engineered the system do you use this information to inform the design of the new system. If you cannot demonstrate to me where you are using information from your existing system, you are not really re-engineering your systems. That is a very clear, observable behavior that we need to have. Remember also, I said you do not need to do the yellow piece of this if you are not changing your requirements; you can go straight from the existing system as-is, through the existing design, to the new design and the new system as built. Only once you have that good information should you implement your systems. And yet, what happens in 9 out of the 10 data migration projects that I have been involved in over the past 35 years? Management's original plan is to go directly from the as-is system to the new system.
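Reverse engineering an existing database's as-built design usually starts with its catalog rather than with guesswork. Here is a small sketch against SQLite's own metadata; the "existing system" schema is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for an 'existing system' whose design we did not create.
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                           name        TEXT NOT NULL);
    CREATE TABLE account  (account_id  INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id));
""")
# Recover the as-built design from the catalog instead of guessing at it.
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"):
    print(table)
    for cid, col, ctype, notnull, default, pk in conn.execute(
            f"PRAGMA table_info({table})"):
        print(f"  {col} {ctype}" + (" PRIMARY KEY" if pk else ""))
```

The recovered structure tells you what the old system actually does, which is the input you need before deciding which of its strengths to carry forward and which weaknesses, like the quarter-of-the-customers-per-disk split above, to leave behind.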
Just show me what fields you have in the old system so I can map them directly into the new system, which means you are not considering your data models, you are not considering your data architecture, you are not considering the reasons things were created in that fashion. And I have seen companies lose their stuff over all that, which is generally not a good thing to have happen. If you want to hear a specific example, give me a ring; I will be glad to tell you some stories. Let's talk now about the process of building some data models, which, remember, are architectural components. First of all, identify all of the entities; just a label in a box. This is a class of business things about which we are going to create, read, or update information. Very important to have this at an abstract level. Next, identify a key for each of the entities or objects. Now remember, a key is what allows us to differentiate one object from another object. It was originally supposed to be that the Social Security number system in the United States would allow us to individually identify a person. Three things have happened to make that no longer the case. One, we made it against the law. Two, we didn't do very good quality control when we issued those numbers, so there are more than 5 million duplicate Social Security numbers around. In fact, there are so many duplicates that the entire country of Finland could sneak into the United States adopting these duplicate Social Security numbers, and we wouldn't even notice they were here, except that the IQ of this country would go up by a bit. Did he really say that? Okay. Anyway, the third reason is identity theft, which is why you would not use these as well. Then, draw a rough map of the entity relationships. This thing is connected to that thing, and that thing is connected to those things, and those things are connected to these things.
It's not that everything is connected to everything else; if it is, then you're probably in a data lake type of situation. If it's all individual pieces, we're trying to figure out what those structures are so we can build that structure in a way that complements our organizational strategy. That's what we should be looking at at this stage. Next step: identify the data attributes. What are the details associated with a bed, a person, whatever it is we're trying to describe? Then map those attributes to each of the entities. Now, that sounds pretty easy, but actually this is an iterative, cyclical process. One of the best things you can do is to develop this model and then share it with some of your business partners or your application developers and ask, how would this work and how would it not work? If they look at this model and say everything looks fine, sorry, you're dead, because you're not going to get any valuable feedback from that. However, if they look at the model and say, it shouldn't go that way, we should really do it this way, that's actually very useful information. So we always want to reward our users for coming up with new information and new ways of describing this, and we may in fact discover that there were some relationships that we didn't have on here in the first place. It takes time to do data modeling, and I've already mentioned that it's iterative; we should see some changes in the modeling cycles that we use over time. Primarily we'll be starting out by collecting information and doing analysis, but eventually our modeling cycles will largely focus on analysis activities. There are always going to be some coordination difficulties associated with it, and that's an issue, but there's not much we can do about that; hopefully they will decline over time.
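The modeling steps just walked through, identify entities, pick a key for each, rough-map the relationships, then attach the remaining attributes, can be captured in a small working structure. This is a sketch only; the entity, key, and attribute names are illustrative, not from the VA models.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str                    # step 1: a label in a box
    key: str                     # step 2: differentiates one instance from another
    attributes: list = field(default_factory=list)  # step 4: the details

# Steps 1, 2, and 4 captured per entity:
entities = [
    Entity("Room", "room_id", ["floor", "ward"]),
    Entity("Bed", "bed_id", ["status", "reservation_reason"]),
]
# Step 3: a rough map of how the entities connect.
relationships = [("Room", "contains zero or more", "Bed")]
```

The point of keeping the model this lightweight early on is exactly the iteration described above: it is cheap to show a business partner, and cheap to change when they tell you a relationship is wrong or missing.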
We should be doing analysis of the target system, in particular whether it's a new software package, an ERP, or anything else, and our modeling cycles should have enough time to allow us to take the original models that we refined and start to validate them. Because remember, if a model is not validated it is still in draft mode, and I would not want to fly on an airplane that still had the word draft stamped all over its various parts. So let's do a couple of quick takeaways on all of this. Again, our attributes are organized into entities and objects. This is an intricate process; it means we are going to be making decisions about very detailed stuff. This is why most people who are not data people have zero interest in data modeling. But we eventually come up with a candidate idea for how we should describe information about, in this case, a bed in a hospital context. Those entities and objects are organized into models, and these models start to codify dependencies. If I encode the wrong dependencies in my models, I will have a very good solution to the wrong set of requirements. It is very, very critical that we iterate in cycles as we do that, and I showed you an example of that with the data, information, and intelligence diagram. And finally, all of those models are organized into an architecture, and the architecture is developed for a purpose, and that purpose should be reflective of your organizational strategy. That's a different webinar that we do, but again, we just don't have the ability to show examples of all of those bits and pieces. So we are back here at the top of the hour, and it is now time for you guys to ask some questions and tell me how I can make this presentation better and more useful to everybody. Shannon, back over to you.
Peter, thank you so much, as always. And just to answer the most commonly asked questions, a reminder: I will send a follow-up email to all registrants by end of day Thursday for this webinar, with links to the slides and links to the recording of the presentation. So, diving in here, Peter: there is a proliferation of the use of UML, particularly in government, for example. Can you speak to the merits and challenges of these modeling methods versus UML versus other methods? UML has done a marvelous job of trying to make what they do compatible with what we do. The Object Management Group, who are responsible for that, has done work to make sure that anything you are doing as an object can be modeled as a data model. It is one of the best success stories in there. I am going to push back a little bit on the proliferation of it. The questioner made it sound like we are seeing increases. What we are seeing overall is a decrease in UML usage all the way around the world, because objects are on the same hype cycle that we have to deal with for everything else in IT. Interestingly enough, the book that we used to teach systems analysis and design 10 years ago was all about UML, and you know what, we have actually gone back to data flow diagrams, which are not part of UML. But I do want to acknowledge the really good work that Richard Soley and a bunch of others over at OMG have done to make sure that everything we are doing in objects actually maps onto the data, and everything we do in data can be mapped onto the objects. If you have very specific questions about it, let me know, but there is absolutely no incompatibility between the two methods. How do we embed the skill and appreciation for data modeling on the business side? For a while we were getting enterprise architecture into MBA programs. Can you hear me? Can you all hear me? Oh, good. You guys can hear me. Peter cannot. The problem is on Peter's side. We lost his audio. Let me get him back for y'all.
We lost his audio. In the meantime, y'all feel free to send more questions in the Q&A there. I'll text him. There we go. I was just texting; I guess that fire in the next building or something was maybe causing some problems here, but no, your phone line dropped. Anyway, always entertaining. What was the question? And yes, I said y'all. It's just so much easier. I love that. My sister lives in the South now, so, you know, it's just the best word ever. Everybody can say it. All right, so moving on here to the next question, Peter. How do we embed the skill and appreciation for data modeling on the business side? For a while, we were getting enterprise architecture into MBA programs. Fantastic question. And yes, we were, and there are still some good things happening. What I've found works best is to take a simple data model and show the business people how they can use that data model to express a problem. For example, the example I just gave here is probably the easiest one to stay with. We were in the Veterans Hospital and we said, look, do you want to make sure that you can tell male-occupied beds from female-occupied beds? They said, actually, that's a big challenge for us; yes, we want to do that. So we showed them how to use data modeling notation to express that requirement. And they looked at that and went, well, I could have written 100 words, and they may or may not have been right. And everybody knows that you can read the same words and come up with different interpretations; that's why the process of lawyering exists. So the picture actually codifies that and crystallizes it into a very specific piece. I would suggest, if you're having trouble in your organizations, and this is data literacy we're talking about here, which I've already said is the subject of the next book.
But if you take a business problem that you're working on with somebody on the business side and say, look, I know we could talk about this for a while, but let me just draw you something very quick and dirty on the back of a napkin, and come up with something that says, hey, look at this: if you have one of these, you need to have ten of those; or, the way our system is set up, you can have one and only one of these. Then the business people will actually look at it and go, oh, yeah, and if we'd had this documentation in the first place, we wouldn't have gotten in trouble. So the example I'm showing here says: the reason you're having trouble keeping track of this is that you're trying to go directly from patients to rooms. And patients-to-rooms doesn't have a connection, which leads to lots of misinformation and redundant drawing of things. But if we connect patients to rooms through beds, which is actually an intersecting entity if you look at it properly, you'll have a much better sense of what's going on. And people will take these diagrams and use them in many, many ways that we as data people haven't even thought of. My favorite example of this: I spent about five years with Deutsche Bank; they're a phenomenal organization. The CIO that I had there, a really great individual, would always say, I never understand what you guys do with all these data models, but when I go to Hong Kong and I see the data model up there on the wall and people using it, I know that what you're producing is useful, and I need more of that, because my people tell me they need more of this information. Again, a system cannot be useful if its capabilities are unknown. So while he didn't appreciate the how of what we did, he understood the what, and was very much able to tell people that, no, the work they're doing is important.
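The "connect patients to rooms through beds" point can be shown in a few lines. This is my own sketch with invented names: record bed-to-room and patient-to-bed once each, and patient-to-room is derived through the intersecting entity rather than stored, so there is no separate patient-room link to drift out of sync.

```python
# Bed is the intersecting entity between patient and room (names invented).
bed_room = {"bed-1": "room-101", "bed-2": "room-101", "bed-3": "room-102"}
patient_bed = {"Ann": "bed-1", "Bob": "bed-3"}

def room_of(patient: str) -> str:
    # Derive the room via the bed; never store patient->room directly.
    return bed_room[patient_bed[patient]]

print(room_of("Ann"))
```

If a bed is moved to another room, updating the one `bed_room` entry is enough; every patient lookup stays correct, which is exactly what a direct patient-to-room link would break.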
The data people need to be brought into their process so that they are able to complete the data models that they need to have. Again, it's one of those things: you don't want to go invade them, but if you just have a once-a-month session where people can come to you and you can teach them about data modeling and other types of activities, it's a very, very productive exercise. Some of these models have lasted. I was in San Antonio not too long ago, last year, and I hadn't been there for a while. And there was a fellow there who said, hey, look at this, it's the data model that you produced last time you were here, Peter, which was 20 years ago, and I'm still using the same data model. It's just amazing how little these things change; once you get a good model, it's very, very useful. Social engineering. So, Peter, as we look at multi-sourcing, and particularly trading out service providers, can you speak to the complications of trying to get various systems, with their own embedded data models, to interoperate? Of course, we're pretending that they all have data models. Well, of course all the systems do have data models, in the sense that if they are using data, there is an architecture that supports it. Again, where's my slide? Here we go. All systems have the models, the architectures. The only question is, if you don't understand it, you simply can't use it. And given that that's an issue for most organizations, you really have to show them how these things are useful. And if you can't show that there's utility on the other end of it, the manager has a fair question; they might say, you know, I'm not sure why we should do that. So you've got to show them that these models have value, and show them what value they derive out of that. And you can't do it just once. It's a continuous process, because management is being bombarded by all kinds of other activities back and forth.
So we always have to get our message up there as well. In fact, I've got a case study where we couldn't actually solve the problem until we got the board of directors involved in it. And boy, if there's a tough audience to get that understood with, that's a really tough one, because those people's time is very, very scarce. I'm not sure. I might have wandered off track. Did I answer the question, Shannon, or is that a... I believe so, yes. And I'll let the questioner add to it if he desires. But, you know, the next question, Peter, is one we get quite a bit: data and business departments, thoughts on best practices with regard to interaction, communicating, quote unquote, with as opposed to at? All too often, there appears to be a disconnect. Absolutely. And of course, the with versus at is a very apt comment, because we oftentimes will try to talk down to the business, or simply say, this is complicated, let us go away and do it and we'll give you something. And then of course we show them something like this, and they think, what am I supposed to do with that? It's not something that is easily fixable if there is a disconnect there. But one of the first things that can happen is that, in Peter's world, the data people belong to the business. They do not belong to IT. And so, you know, think about it for just a minute. Why do we call it a business rule? Because we're over in IT and we want to go talk to the business. If we were already in the business, we wouldn't call it a business rule. It would be the way things happen, and we would know that information. We wouldn't have to go and discover it. This model that I'm showing here, this architecture, is describing a whole bunch of business practices in the organization. Things such as: you must register and give us your home address before I can give you a price on your healthcare. That's absolutely backwards, isn't it?
We should actually price the stuff and then figure out where you live, but unfortunately there are some issues that get in the way of that. So, first thing is, if you can, get your data people out of IT and make them part of the business. Second thing is that continuous process of people working with the groups on a regular basis, because when you have this kind of information, pretty soon you will be seen as somebody who has some expertise in this. Let's change the model here for just a second. I know you're looking at the same picture, but let's pretend this was a model of an ERP that was about to be implemented in your organization. If you have this data model, and by the way, most ERP vendors do not want to give you their data models. I made literally millions of dollars over the years reverse engineering the various ERPs that are out there, because this information is so valuable. If I have this information and nobody else has it or cares about it, then people are going to start coming to me and asking questions. What does the green represent in this diagram? I'm working in the orange area. Do you have more detail about the one that's over here on the left, or whatever it is that we're trying to do? All of these things, when we start to write them down, we can now argue about whether it's right or not. But if we don't write it down first, we can never argue about whether it's right or not. It just becomes talk over alcohol, which sometimes happens too. So is the architecture a consequence of there being many models in an organization, making the architecture a database of databases? That's a way to think about it, although most organizations are not well documented enough that you could actually use a database to access that information; some are, though. Here's another piece that's just astounding to me. We stopped teaching CASE tools about 20 years ago.
So the students come out of our classes at good universities and do not know that these tools exist. So, the database of databases, the collection of data models. Let's pretend that the one we're looking at here, again, is an architecture for an organization, but that everything that is blue or violet or purple is not done yet. We know it exists, but we're not done yet. So the only things we know are the green, the orange, and the sort of blueish teal up on the top there. Even that is more information about the organization's data than most organizations have about their sole non-depleting, non-degrading, durable, strategic resource: their data. And so this model can be useful even though it is incomplete. And one of the things you might look at as a group that is trying to do more with your data in your organization, or let's go crazy and say we're going to become digital, which is what everybody wants to do today. By the way, you can't be digital without data. It just doesn't work. So your data modeling becomes even more important in a digital transformation than it is outside of one. We don't have to have the thing perfect. And that, I think, was where we got in trouble: when we tried to strive for perfection. We can make good decisions about this data model even if it's only half or a quarter complete, because some of this information is still much more useful than what we've had in the past. So it's really important to understand that we've got a lot of new tools, new techniques, and new approaches that we didn't used to have, and it really is a different world. Data model validation: what's the best way to achieve this? All right. So let me tell you how not to do it: Iowa. You don't hand an app to somebody the day before you go live, say, do you know how to use this app, and expect that things are going to go well?
And of course, if you've read the papers, you know that Iowa was a disaster. The purpose of testing anything is not to prove that it works. The purpose of testing something is to break it. And if you pay people to break something, they will break it. If you pay people to tell you it's fine, they will tell you it's fine, but it's not going to be tested. The whole purpose of testing is to attempt to break something. So the consequence for data modeling is that you're showing information to people in ways in which you hope they will find errors and improve it. It's tough. One of the hardest parts of some of the classes that I teach is that I say, you're going to develop a data model, and the person sitting next to you is going to tear your data model limb from limb. Why? Because otherwise you won't fully understand the problem of interacting with your business partners. Can you in fact come up with an improved data model? Remember the iteration process that we had around this. So it's very, very critical to understand that the purpose of testing something is in fact to break it. And if you incentivize people to break something, then they will break it. If you incentivize people to tell you everything is fine, then they will tell you everything is fine. I'm looking for my iteration slide on here. There we go. That's the one. Again, remember what we've got here: most people think that we do this just once. But in fact, the way we do it is that we make sure we have a feedback loop, and once we have the feedback loop, we understand that this does not ever stop until the organization itself stops evolving. You will no longer need data models and data architecture in your organization when you no longer need an HR function. And by the way, the evolution of that is exactly the same, as I was describing earlier.
Again, if we don't have this part right... let's say that we're doing a data model not for the purpose of enhancing the business, but maybe for fixing a problem. If we diagnose the problem correctly, then by changing the data model that's at the heart of it, we can make some fundamental changes to the business in a way that allows it to do things it wasn't able to do before. New capabilities. And these new capabilities are critically important for the organization to be able to move forward. But we're not done, right? We're going to continue to use this. The only time we're not going to need this task is when the organization no longer needs an HR function. And Peter, is there an ideal stage in development for modelers and architects to interface with data governance? Continuously. Data governance people are not experts in what you're expert in. And so, in the same way we were describing the social engineering tricks you need to involve the business people, the data governance people, in many cases, are also not data knowledgeable. I have a couple of things on governance that I like to throw out in particular. One of them is that the language of data governance should be metadata. I have worked with so many groups over the years who have been doing data governance and haven't actually realized they weren't talking about the same thing, because they didn't use this type of articulation technique to ask, are we talking about the same thing or are we talking about two different things here? And there are lots and lots of examples of this. So it's very, very critical to make sure that your governance group understands the nuts and bolts of data. In many cases, these people do not. They come to this from a policy perspective, which is valid and good, but they need additional information in order to properly govern the data that they're working with.
And if you have questions for Peter, feel free to submit them in the bottom right-hand corner of your screen. We've got some time left. So, Peter, how do we reverse engineer our DBMS that has no relations within table objects? That is such a great question. That is actually how I started my career. So we have some really interesting things, and I'm working with a woman named Dina Bitton, who invented most of these techniques in response to a contract that she did for the Department of Defense. If your relational database, or your data lake, or anything else that you're working with in your physical as-is world does not have relationships defined between the tables, it is still possible to construct a logical, third normal form model in a semi-automated fashion. Hopefully that's exciting to some people, and the rest of you are going, I'm not sure what Peter just said. So: we have techniques and processes that we can apply to physical piles of data to derive from those physical piles of data a logical model. Remember we talked about the conceptual, logical, and physical structures; the logical model is that middle layer that talks to the business. This works in a semi-automated fashion. Nobody likes "semi-automated," because everybody wants it to be fully automated. It cannot be fully automated as long as it requires subjective interpretation about what's actually happening within the model. You need humans to define the terms, because there's no way artificial intelligence will ever be able to come up with this type of a process. This process of data discovery from physical as-is data sets can allow you to determine a third normal form version of the model, which is the pristine version from a data modeling perspective. Yes, there are additional categories of normalization beyond third normal form, but third normal form is a great place to be brought to by automation. You do not need those relationships to be declared.
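As a rough illustration of what this kind of semi-automated discovery does (a toy sketch, not the actual DoD-derived technique Peter mentions), a tool can propose a relationship wherever every value in one table's column also appears in a unique column of another table, an inclusion dependency. A human then confirms or rejects each proposal, which is why the process stays only semi-automated.

```python
def candidate_keys(table):
    """Columns whose values are unique across all rows: candidate keys."""
    cols = table[0].keys()
    return {c for c in cols
            if len({row[c] for row in table}) == len(table)}

def inferred_links(tables):
    """Yield (child_table, child_col, parent_table, parent_col) proposals
    where every child value appears among a parent candidate key's values."""
    for pt, prows in tables.items():
        for pk in candidate_keys(prows):
            parent_vals = {r[pk] for r in prows}
            for ct, crows in tables.items():
                if ct == pt:
                    continue
                for cc in crows[0]:
                    if {r[cc] for r in crows} <= parent_vals:
                        yield (ct, cc, pt, pk)

# Toy as-is data with no declared relationships:
tables = {
    "room": [{"room_id": 101}, {"room_id": 102}],
    "bed":  [{"bed_id": 1, "room_id": 101},
             {"bed_id": 2, "room_id": 101},
             {"bed_id": 3, "room_id": 102}],
}
print(list(inferred_links(tables)))
# → [('bed', 'room_id', 'room', 'room_id')]
```

Real tools add value-distribution and naming heuristics on top of this, but the core idea is the same: the relationships are inferred from the data itself, not from declared constraints.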
It simply looks at how the data works and infers the relationships that are there. That's, again, probably a more detailed session, Shannon, but if anybody's interested, there's lots of material on it that we can share. That work was done courtesy of the Defense Department and provided to all of us free by the U.S. government, and it's an important asset that we have. Again, not many people know about this, but it is absolutely possible to do that sort of thing, and it's very, very useful and, in most cases, a lot cheaper than other approaches. Peter, are there any industry-recognized certifications or courses to demonstrate high-level skills in data modeling and data architecture? The CDMP is the best place to start. DAMA International is working on an additional series of exams that we will be running at EDW. If you're interested in that, go to the CDMP at dama.org. And just to let everybody know, we're actually going to release a CDMP prep course in mid-March. So we're excited about that. In time for the conference. That's fantastic. I know. Well, unfortunately not enough time for people to take it before they get to the conference, but... I think we'll be running it at the conference. Yes, you're running the exam at EDW, and actually at all of our conferences. So... But, yes, that's a great question, and there are a lot of different education sources out there as well; they're not certifications, but there's a good amount out there. So, Peter, what are your thoughts on modeling around NoSQL databases, like Neo4j, a graph database, or a document store? A great question. And again, remember, our activities here are driven by a business purpose. So there is some business purpose that says, I need to take this and implement whatever it is that I'm doing. In most cases when you're moving into NoSQL, and non-relational is really the word that we use for these, there is a business purpose behind it.
For example, most people are not aware that relational databases are actually one of the slowest forms of database out there. Two others that we've used for years and years are network and hierarchical databases, and we don't even teach students that these exist. So there have been about 30 years of people who are really surprised to find out that there's something literally 10 times faster than a relational database. And the reason is that we actually invented those first; relational came along later. We've also discovered that there are times when it's not appropriate to go through what I'm showing you on the screen, and in those cases a NoSQL database can be a very good way of managing certain types of data that will get you answers faster than going through a traditional development process. Typically, people overhyped those capabilities and said they would solve all of our problems. Well, I've never seen anybody run their payroll using a NoSQL database. It just doesn't make any sense whatsoever. And I've said that before, and somebody will prove me wrong, which is great. But we now have new capabilities, and these new capabilities do not require some of these constraints. That doesn't mean that models are not useful for them. Again, the purpose of the model is to reduce the imprecision and uncertainty around aspects of the data. And if we can use the model to reduce that uncertainty and imprecision, then we have added value to the process. These NoSQL databases are really good at initial discovery and at archive. And if you think about it for just a minute, if you have mail in Gmail that is archived, it's out there archived in a NoSQL database. It comes back to you in a different format with a slightly longer lead time. But we're sure glad that Google never forgets anything; at least most of us are.
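To illustrate the trade-off with a toy example (mine, not Peter's): here is the same fact stored relationally versus as a self-contained document, the shape a store like the Gmail archive favors. The document form gives fast whole-record retrieval; the relational form makes the shared structure, the kind of thing a data model captures, explicit.

```python
# The same message, two physical shapes (names are illustrative):
relational = {
    "message": [{"msg_id": 1, "sender_id": 10, "subject": "hello"}],
    "sender":  [{"sender_id": 10, "email": "a@example.com"}],
}
document = {  # everything about the message travels together
    "msg_id": 1,
    "subject": "hello",
    "sender": {"email": "a@example.com"},
}

# The relational form needs a join-like lookup to reassemble the record;
# the document form already holds it in one piece:
msg = relational["message"][0]
sender = next(s for s in relational["sender"]
              if s["sender_id"] == msg["sender_id"])
assert sender["email"] == document["sender"]["email"]
```

Either way, a model of the data still reduces imprecision: the document store just moves the structure from declared constraints into the shape of the records themselves.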
And what would you recommend to those companies that are about to start integrating their data systems with the cloud? Oh, I have three rules for the cloud that most people ignore completely. If you are moving your data to the cloud, it represents an inflection point. So the first rule is that data in the cloud should be of higher quality than data that is not in the cloud. Most people would generally agree with that, but how are you going to actually achieve it if you don't make the data better quality on the way in? So your data in the cloud should be at least as good as the data outside the cloud, and in most cases it should be of better quality. And that's not going to happen through happenstance. It only happens through conscious action. The second rule for data in the cloud is that data in the cloud should, by definition, be more shareable. And for that I'm going to go back to the slide where I showed the ANSI stack, where the conceptual, logical, and physical models are, because data in the cloud can operate exactly the same way. The example I used was: if you have data that's in a database and you move it to the cloud, it can be more useful at all of those levels. So, first rule, data should be of higher quality in the cloud. Second rule, data in the cloud should be more shareable. What do I mean by more shareable? Shareable is a specific, objective criterion. It exists at the conceptual and logical levels, because that's where you get your business reuse. So if we don't put data in the cloud in a way that makes it more accessible, we are actually going against the cloud's purpose in most cases. So data in the cloud should be of higher quality, and it should be more shareable by definition. And the third piece is: if you do the first two of those, then data in the cloud should be of smaller volume than data outside the cloud.
The reason for that is that you have cleaned it and made it more shareable, and by making it more shareable, you're allowing disparate piles of data that weren't previously integratable to be integrated. So those are Peter's rules for data in the cloud. Data in the cloud should be cleaner than data outside the cloud. Data in the cloud should be more shareable than data outside the cloud. And data in the cloud should be of smaller volume than data outside the cloud. There's not a company on earth that has ever saved money just by going to the cloud; by itself, it doesn't change anything, it just changes the pockets where the money goes. I get real riled up on hype. Tell us how you really feel. Yeah. Is there any specific course to understand data architecture in the relational, big data, and NoSQL worlds? I'm sure there's got to be courses out there somewhere. I have not done one. Shannon, do you have one in other parts of the university? Yeah, we do have a Modeling Graph Databases course that we just released with Thomas Frisendal, which is a new course on training.dataversity.net. We do have that. A lot of articles and blogs as well, of course, that are probably good for figuring it out. Yeah, we're just building our modeling queue of courses. Perfect. Well, that brings us to the half hour, and we're out of questions. Peter, thank you so much for another fantastic webinar. Thanks to all of our attendees for being so engaged in everything we do. We just love it. I got a nice education. Hope all y'all will be in San Diego in March to meet us at EDW. Thanks, Peter. Thanks, everybody. Hope all y'all have a good day. Thank y'all. We'll see you next month. Bye.