Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Data Architecture versus Data Modeling. It is the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording and will likewise send a link to the recording of this session, as well as any additional information requested throughout the webinar. Now, let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader; you may already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the data profession. Peter is also a founding director of Data Blueprint. He has written dozens of articles and eleven books; the most recent is Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise, and Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart. And with that, let me turn everything over to Peter to get today's webinar started.

Hello and welcome, and welcome to you, Shannon. We have a really cold, wet, rainy day here on the East Coast; I hope everybody else is warm and dry wherever you are. But let's jump right in here. One of the things that you end up seeing a lot when you do 500 data management practices, as I have, is that organizations manage a lot of different architectures. So let's start off by just talking about a couple of the different types of architectures that people manage in organizations. Business architectures are a fairly common one, with a lot of strength out of Europe around business process architectures. Systems architectures. Security architectures. Technical architectures. And of course, what we're concerned with here, data and information architectures. Now, really key with this is that management tends not to understand what these functions in your organizations do, your architecture functions. They're not sure what happens there, and they see them as a bunch of technical committees where people are having good conversations about topics that they are probably responsible for but don't really understand. And that's an issue. So one of the things we'll talk about today is how to help you articulate that value to management. But let's also put some numbers on this.
One in 10 organizations manages one or more of these architectures formally. So first of all, if your organization is doing this, or you would like to do this, you're in good company; most people talking about these topics are trying to do this. But believe me, there are a lot more of your colleagues out there trying to convince management that this is important, something they ought to pay attention to, as opposed to actually doing it. So those are important numbers. Now, what is architecture? This is probably not the way you want to explain it in an elevator situation. You know, your boss gets on at the top floor and you're both sort of standing there, and the boss looks over and reads your name badge and says, oh, so Peter, what is it you do for me here? You do architecture, right? What is architecture about? And I say, well, boss, architecture is about things, and about what those things do, and how, when those things are doing their thing, they interact with each other. A very simple explanation, but probably not one that helps the boss understand what it is we're actually trying to do here, which is to help our business process, system, security, technical, and data and information architectures be managed better all the way around. So let's take a quick look now and walk our way through the agenda. Again, key is that models, the data maps, are the things that bridge the gap between the data and the architecture. So we'll talk about why we need them, how they are used, and what challenges we have in organizations that are attempting to do better with these tools. We'll talk about engineering too; you can't talk about architecture without talking about engineering as well. They are two sides of the same data coin. Because if you don't have standard, shared data of known quality, then it's very, very difficult to do either architecture or engineering, and standard, shared data of known quality is what gives you the ability to do both of those functions. From the top, most people think of architecture when you're building something new. We call this officially forward engineering because we're building something going forward. The idea is that we're composing things, we're building something new, we're creating a new capability. But it's also important to understand that architecture and modeling are crucial for looking from the bottom up as well. That means reverse engineering, which is really focused on understanding how things work together. For example, this is the Tuesday after BB&T announced it was buying SunTrust. They're going to be looking at SunTrust over the next couple of months and saying, hmm, how does this work? There's very little forward engineering going to be involved in that, but a lot of reverse engineering. And of course, to get any of this to work well, SunTrust and BB&T, the new organization, whatever it's going to be called, will have to take the best things of SunTrust and bring them forward, and take the things about SunTrust that weren't exactly superior and leave them behind. And that is the process of reverse engineering. And this will be all about data architecture and data modeling. It's crucial that these functions work together, not just in a merger but in general, because you've got to simplify these environments. They're getting so incredibly complex that it's just very, very difficult to work. So anyway, let's jump in. I'll give you a quick little glossary lesson here.
First of all, many people have heard me ask groups this: what does the number 42 mean? I'll give you the answer. It's the meaning of life, the universe and everything, and it comes from a book called The Hitchhiker's Guide to the Galaxy by Douglas Adams, where they ask the question and it turns out the answer was 42. There's a longer explanation, but just bear with me for a second, because what I've done in your mind here is connect a fact, 42, with a meaning, in this case the meaning of life. I'll give you another equally useless combination of fact and meaning: 42 is my age 18 years ago. So again, not something you can do much with, but that's how we make data. We combine facts and give them meaning. To get to information, we have to go one level of abstraction higher and start to work in the context of what data is requested. And then finally, to get to where data is of strategic use to us in our organizations, we have to go one level higher and say it's not just the data that's requested, but what data is actually used and how is it used? It turns out that's pretty different, and there's a lot of evolution that goes on between these. The point in telling you this is that you can hopefully see an architecture in this diagram as well. Occasionally I get organizations that just want to build an information architecture: I don't want to do data and intelligence, or I'd like to manage my data and information architectures separately. That's just more confusing than not. These are necessary but insufficient conditions for strategic use of data, information and intelligence. You need to have a good data foundation; the steps up that architecture are then very, very easy to take. But architecture and data as a subject is complex and detailed, and it's taught inconsistently around the world. As a result, it's poorly understood by everybody outside of a couple of us in the profession. Maps, data maps, are necessary but insufficient prerequisites to data architecture, and these are the models. They allow you to fully manage the data assets that you have. Now, these maps are incomplete, however, without what we call purpose statements. Most people try to use definitions, but Clive Finkelstein taught me a long time ago that the use of purpose statements is a much more powerful, completely different way of addressing it, because it incorporates motivation. We'll come back to that a little bit, but they are more powerful than definitions. Adding these purpose statements and validating the models makes them really, really credible. That's another component that's often left out: the models are not validated, they're simply created, and a validated model is much, much more worthwhile than a created model. Anyway, data architectures are composed of these data models, and the link between them is that the architecture is too big to take in all at once, so models are a subsection or view onto the architecture. These are necessary prerequisites, but they are insufficient prerequisites to leveraging your data assets. Sometimes you can get lucky, but since this is largely an engineering-based discipline, it's much more important to be accurate. So let's start at the very bottom with the definition of a data architecture. Here is a wonderful computer science definition, an organization of information, blah, blah, blah; I'm not going to read it to you, but it does talk about the characteristics of how data is organized. In some sense, you can think of it as providing a grammar for data objects.
So there are rules that can be worked out for them: constraints, ordering, preferences, whether something is, must be, or might be nice to have as unique. Is there an ordering? Are they hierarchical, relational or network? Balance, optimality. All of these things are important, because particularly as you go faster, you need to be engineered better, and these architectural and engineering components do not happen by accident. Now, again, what are the components? Well, the details that we have are organized into larger components, and that's pretty intricate stuff. Those larger components are then organized into data models. So we take the components, we organize them, and that creates dependencies among and between various aspects of them. And finally, we pull all of those things, the models, together into an architecture. And that is purposefulness; that's where we start to address what really needs to happen, how we are going to work this definition of a set of data in support of organizational strategies. At the structure level exactly the same thing occurs: attributes go into entities and objects. The attributes are characteristics of specific business things about which we create, read, update and delete information. That's our old CRUD that we talk about. Entities and objects are the things whose information is managed in support of strategy. If you have a thing that you're managing and it doesn't support strategy, you have a legitimate right to ask the question, why are we managing it? Let me give you a good example. There are examples of these things out there, you can Google them on the internet, and here's one for bed; we'll come back to it in a little bit. Bed has got four things that characterize it: description, status, sex to be assigned, and reservation reason. So attributes are organized into entities, entities into models. When these combinations are pulled together badly, the structure of the data constrains the ability of the organization to deliver information. And again, I'm giving you an example of a model here; before, I talked about the difference between data, information and intelligence, we did that just a little bit ago. These are examples because they're kind of nice, we can comprehend them. We'll see a problem with that, because when you try to build architectures, the architectures are used to plan development, but more often managers, and data managers in particular, don't know what the existing architectures are, particularly in new situations. They can't make sense of them and can't make use of them in support of strategy, and this is a huge problem. It's very difficult to find examples, because they tend to be big and have lots of parts. So here's a data architecture, just a random one; I forget where I found it, whether it's clipart or exactly where it came from, but it's complex. It would take us the rest of the webinar to walk all the way through it. It's just not comprehensible in that form. And most importantly, people really don't want to see them that way. They want to look at a portion of it, and they want that portion to be well integrated with the rest of their perspectives on the organization as well as their understanding of how the organization works. So data architectures determine interoperability. They're required to enable correction and generation.
The better defined the architecture is, the better you're going to be able to automate one or more functions of the architecture. The architecture permits the governance of data as an asset. You can't do data governance unless you understand the architecture of the thing you're governing. It's like saying you're going to protect a building without knowing whether the building is an outhouse or a dormitory; there are huge differences. It's a prerequisite to meaningful data exchange. It lowers the cost of organizational sharing, internal and external. And it's important to understand that your organization is already spending 20 to 40 percent of its IT dollar doing this type of data exchange, which in many cases is non-value-added and unnecessary. So your organization has a large chunk of IT budget that's stuck in bad data architectures. These architectures permit evolution, so when we merge with another company, again this whole BB&T and SunTrust thing is going to be interesting. It's required for role-based security implementation; if you're going to implement role-based security, and if you're not, I question what you are implementing. It decreases the cost of maintaining your data inventory. These architectures capture the business meaning of the data. They are, however, a living document. So if you print them, that's yesterday's news already; they should be evolving on a slow but steady basis. In fact, data architecture is a program for your organization. So make sure that your people understand that it's a part of your data program, and your data program needs to exist a long time. It's a potential entry point for architecture engagement: somebody says, I'm going to start working on the orange stuff, you know where to look, and the architecture is the map that helps you get there. Again, once you get these architectural components validated, they can be used to start populating the business glossary, representing therefore a major collection of metadata all around. So these data structures are organized into an architecture. Then the question is, how do data structures support strategy? And the first thing you have to do is turn the question back around the other direction: were your systems explicitly designed to be integrated or otherwise work together? If they were not, what do you think the likelihood is that they will just happen to work together? The answer, of course, is slim to none. Again, 20 to 40 percent of your IT budget is spent compensating for poor data structure integration. And these structures can't be helpful as long as their structure is unknown; hence, again, the importance of models as part of architecture. There are two interesting aspects to this, too. Let's just take two different strategies. One strategy is to achieve effectiveness and efficiency goals. If you're making a play for consolidation and a simpler set of systems across a number of delivery platforms and things like that, then these effectiveness and efficiency goals, low maintenance, are probably better. However, you may instead be looking to enhance organizational dexterity, and in that case it's a different type of architecture, a different type of approach. Let me illustrate that with these two sets of dishes. On the top set of dishes, we're just going to make up a story that this restaurant has a focus on providing the exact dish for the exact plate. So in other words, the apple pie has a different dish from the peach cobbler.
And if you're trying to deliver this food service to the customers and you drop a plate with the dish on top of it, you're going to have to go back and find not only the new piece of apple pie but the apple pie dish, which is going to be a harder search than if you optimize for speed and simply grab the next plate off the top of the existing stack of plates and use it. Again, these data structures need to be organized in a purposeful way so that your organization can benefit from the assets that you have in them, and those data structures will then be useful as far as getting the organization ready to support strategy. Again, the models face downward, they are detailed; the architectures are at a higher level of abstraction. Think of the Zachman Framework: the architecture lives in the top couple of levels, the models are further down, and of course he shows that as well. We had a challenge in the past, and I'll tell a story on myself here, but I was part of a group at the Defense Department that in the late 1980s and early 90s decided that we could architect the entire Department of Defense at a useful level with a controlled vocabulary of about 5,000 words. Needless to say, we didn't succeed, or you would have heard about it, because it would have been quite an achievement. That said, we did produce the first level of federal enterprise architecture, which was the DoD enterprise reference model, and it was an integrated process and data model that we used. This gave us the ability in the Defense Department to focus on architectural components, and most people aren't aware, but the government is actually slightly ahead of the rest of the world at this point, certainly of private industry, in terms of its maturity. Over the break, President Trump signed a law that mandated the use of CDOs in federal agencies. We'll do a webinar on that at some point, but it's quite an interesting environment right now. Anyway, key here again is that models are downward facing, architecture upward facing. We've got this ability to take a look at things. So let's look at the scope of a data model. You might make a data model for a specific program. You might also make a data model for a different program, which on its own would be really underutilized, but ideally when you're doing this you have a family of programs. They're mostly related, there's some overlap between them, but they all access common data, and so the goal is to say, okay, let's not manage the data on a program-by-program basis but manage it at the family-of-programs level. And this family of programs is served by a database, sitting in brown up there in the right-hand corner. That database hopefully has somebody taking care of standardized definitions across it, the same way we were trying to do it for the Defense Department, and the same way that ERP and COTS packages are marketed as similarly integrated. Sometimes they are, sometimes they aren't. The best use of the modeling effort under these circumstances is to model all of the data for that particular scope. That works great, but sometimes you end up with other parts of the organization, or they come to you in different ways, or you're simply not allowed to scope things this far. They hopefully have the same type of governance at the same type of level; oftentimes they don't, and then the architecture is the opportunity to take on and understand how data flows throughout all of the systems. The focus is much broader than either software architecture or database architecture.
The analysis scope is on system-wide or organization-wide use of data, and it's a much more productive way of doing it. This helps us address problems and challenges that are caused by data interchange or interface problems, and the operational goals can be focused more on supporting the organizational strategy. However, there are some barriers to this. I divide them up into social, political and economic. The first one I've already alluded to, a social barrier, is that this work of maintaining your architecture and your models has to be treated specifically as a program. If you don't, then again, projects start and projects end, and your data architecture is going to be around at least as long as your HR program for the organization. That is the thing to tie it to in the minds of your executives. The reason we're doing data models is because these data models will be useful to this organization until we stop collecting this type of data. Usually the type of data that you're working with is the type of data that the organization wants to manage, and consequently wants to do a good job with. So tie this very, very strongly in the minds of your executives, so that they understand the purpose of data architecture and modeling is to provide a programmatic set of activities. They don't have to be huge, but they do have to be large enough to be useful; that's a different topic, we can get into that as well. The reason we have to do all of this is because knowledge workers really end up knowing nothing, and being taught nothing by college and university programs, about data. And of course the definition of a knowledge worker is someone who works with data. So socially we've got some things that we've got to go back and address. And there's a political dimension to this as well. This is where I'm calling the baby ugly. I'm sorry, but we've spent 30 years teaching IT professionals that the only thing they need to learn about data is how to build a new database. If there is a skill we do not need any more of on planet earth, it is how to build new databases. I'm not saying it's a bad skill. However, it's the only thing we've taught IT professionals about data for 30 years, and consequently, why are we surprised there are so many new databases out there? More importantly than that, and I think that's pretty huge, IT professionals go through these programs, and our managers have gone through these programs, and our leaders. And from a 10-credit concentration, which is typical of what you get in many programs over the last 30 years, where only one tenth of it is devoted to data, they get the idea that that's about how much time they should spend on data when they get out into the real world, and that data is a technical skill you only need when you're developing brand new databases. And that's of course insane. The other problem is that because we've taught everybody how to build new databases, they look at every problem and see it the same way a hammer looks around and sees everything as a nail, to paraphrase Abraham Maslow slightly. So let's take a minute. Everybody is familiar, I hope, at this point with the DMBOK, the DAMA Guide to the Data Management Body of Knowledge. We've got version two out. And I like to say it is now good enough to criticize, in the spirit of George Box, who said all models are wrong but some models are useful. This has been a heck of a useful model, and we've done great, but we're missing a couple of important concepts.
We need to have the concepts of optionality and dependency in this articulation of it. One of the reasons for that is that from an architectural perspective, people look at this and say, oh, can I start anywhere on the wheel? And the answer is, well, there are some things that are more important than others, that you should do early, and then there are other things that are simply dependent or optional. For example, not everybody needs a document and content management program. But of course all organizations have purchased SharePoint. Sorry, couldn't resist. Anyway, moving on, this is version two of the DMBOK, and you can see that data architecture and data modeling are critical components, right next to each other, for everybody's use. We see them as highly integrated and related, but I've just run through a couple of problems with you, which is that business decision makers and technical decision makers are not data knowledgeable, and therefore they make bad decisions, which results in poor treatment of organizational assets as well as poor quality data, and of course then some poor outcomes. And those outcomes don't get corrected, so we continue to fall down this spiral. The reason for this is that the decisions made around data often end up causing problems somewhere else. Now, I wish I could tie data directly to the Galloping Gertie story, I can't, but you can already see the parallels here. This is a very fine bridge built in Tacoma in 1940. It opened on July 1st and collapsed on November 7th. Obviously that was a little bit shorter than the planned life. And they did take away some lessons from this; it was a dramatic failure. One thing to think about just very briefly, and this is a little choppy that you're looking at, but if you just Google Galloping Gertie you can see the real film out there on YouTube. Just a quick side note: it turns out that the insurance company who was insuring this bridge decided to set up these cameras after they saw the bridge exhibiting some of this behavior and said maybe there's something to be learned from this. As a result, they don't design bridges this way anymore. They made some very significant improvements, because as you can see here, the bridge failed catastrophically. Now, the reason I'm showing you this is that most organizational data failures don't manifest as catastrophic failures. Instead they are insidious, and again, for the third time, they take up 20 to 40% of IT budgets migrating, converting and improving data all the way through. Doing a poor job with data takes longer. Doing a poor job with data costs more. Doing a poor job with data delivers less functionality than you would have normally, and presents greater risks to the organization than if they would instead crawl, walk and run their way to where they need to go. So let's take this a little bit further now. Those are the challenges. Now let's talk quickly about architecture and engineering, and the most important thing to understand about architecture is that everybody has one. It's just a bit that people forget. People call us up at Data Blueprint all the time and say, hey, can you come here and build a data architecture? Yeah, we could, but the only time you build a brand new data architecture is when you're going into a brand new greenfield. All organizations have architectures. So the question isn't, can you build me an architecture? The question is, can we help you understand and document, and therefore make more useful, your architecture?
And the way we do that is through a series of data models. We could be comprehensive and derive every data model from everything in your systems, but instead it's more important to focus on the things that obviously have greater importance, so that you go for a depth-versus-breadth type of approach, and this is backwards from how we have traditionally approached the subject. In addition to that, of course, we need to focus on the relationship between architecture and engineering. As I told you on the way in, architecture is used to create and build systems that are too complex to be treated by engineering analysis alone, and engineers develop the technical specifications for implementation of those systems. They have to work together; it's a virtuous cycle, if you will. And again, the currency that we're talking about here, in all cases, is data models. Now, I've tried for years to explain to management that you really can't architect after you've implemented something, you have to re-architect, and that's very different from building from the top down. However, BMW came up with a great commercial that did this for us. Even if I were the architect of the pyramids and had built this, and the Pharaoh said to me, I'm sorry, Peter, I need to have a swimming pool in the basement of this, I would simply throw up my hands and give up, because at that point the pyramid is built of shifting large stones on top of shifting sand, and the chance of putting in a swimming pool after I have already built the pyramid is not likely. Now let's talk about engineering as well; that was architecture, of course, purposeful and oriented. Here's another purposeful item. There are a couple of attributes about it: it's taller than I am, it has a clutch, it was built in 1942, and here's a little interesting one, it's cemented to the floor. My goodness, why would you do something like that? Well, by the way, it's still in regular use today, but the answer is because you are putting it on a warship and sending it out in the middle of the Second World War, when we were losing, and asking 4,000 brave warfighters to get on this ship and change the way the war was going. They did, of course, and one thing they needed every morning was breakfast. Hence you want to make sure that if you're going to make breakfast for 4,000 people, again not simultaneously but in stages, nevertheless every morning, you need to have something more than one of these. While these are wonderful machines, KitchenAid makes a fantastic appliance here, they are not going to have a duty cycle that is going to let you make pancakes for 4,000 people a day, every day, since 1942. It's just not what it's designed for, whereas this thing was designed to last and to feed lots of people. We don't know for how long, but it certainly would be really bad to have those folks out in the middle of the ocean and all of a sudden have no food for them. Anyway, you get the picture. Let's now dive in a little further and talk specifically about the components of data models and architectures, and one of them has to do with relating facts about business things. I mentioned that earlier on. So here we have facts about bed. I go to Wikipedia and it says a bed is a piece of furniture that is usually used as a place to relax or sleep. That's a definition.
It's got a little bit of purpose in there; it says, well, here's what you might want to use it for. But just saying a bed is a piece of furniture wouldn't be very useful, and if I want to sleep I go to a bed, or maybe the couch, or relax on either one. Regardless, there are some concepts in here. Let's talk about the concepts; in this case I'm showing you the picture of a bed in a room. The way to represent that from a data model perspective is to look at these two components. Now, I said earlier that the models are incomplete until we put the purpose statements in; I'm leaving some of the purpose statements out of these quick representations, so these are incomplete models, but nevertheless we have bed and room. Now that's nice, it tells us that we're concerned with beds and rooms, but it doesn't tell us what the relationship between those two objects is. At the entity level we're talking about storing facts. So if I have beds and rooms, I may say that the bed is related to a room. Very nice. It's not as precise perhaps as we'd like, but that's good, right? That gives us something: beds and rooms are related. Well, here's another version of this that is a little bit further refined, and you'll see what I've done: I've added the orange notation there to say that many beds can be related to many rooms. All right, that's also interesting. It tells us information. It may still not be as precise as we want to get for the specifics of this instance, but nevertheless it does accurately model information that's out there. Let's go one more level down. In this case, when I add the qualifiers onto the relationships, I'm adding them in a way that speaks a little bit differently, giving us still better information: many beds may be contained in each room, and each room may contain many beds. So this is a one-to-many, in the sense that one room has many beds in it. That's a different specification from saying beds are related to rooms, and it's certainly different from saying there are many beds and there are many rooms. In this case we're looking specifically at which rooms contain which beds. So if I'm looking for bed number one two three, and I have an information system that says it is in room X Y Z, then I have a fact stored about that, and that's kind of useful, and we can build entire assumptions into this architecture based on the fact that a room can contain one or more beds. But what if you can move the bed? Then I have a completely different situation. Hold that thought, we'll come back to it in just a second. Now let's talk about the specifications on the relationships between the entities. At this point, on this end, it's exactly one, and that means it's mandatory. I can also put down one or many. That literally covers the entire range of possibilities when I add to it zero or one, optionally one; zero or many, optional; and one or many. So there are five types of relationships that we can have there, and those specifications are of critical importance, and of no interest to anybody outside of this particular webinar. This is, by the way, the language of information engineering. There's also a version created by Dr.
Peter Chen many years ago, for which he was widely acclaimed. Charlie Bachman had a style as well, and James Martin had a style as well, and the Bachman style ended up largely mirroring the object notation here. But when people look at this diagram they start talking about the notations and what's good and what's bad about all of them, and look, just pick one. If you're arguing about this stuff, particularly in public, people are going to think you are PhD professors and not really useful to their IT world. Let's get back to our models and the natural associations between the entities. We talked about an association between the entities, which was that a room can contain many beds. These natural associations are important because we want to represent these concepts in the database as naturally as they correspond to the world, after we've been through the process of normalization. We're not going to talk about that on this webinar, that's in the modeling webinar that we do, but let's take a look at the relationships that are in here, again, the natural associations. Here I have room and bed, we saw that before, one room, many beds, but now I've added a patient. So the relationship in this case could be that patients can be in any bed, or patients can be in multiple beds, according to the state of the model. Now, this defines the mandatory and optional relationships, what we talk about as optionality and cardinality, and it's very critical to understand that with respect to relationships, because again these are the specifications. So a bed is placed in one, and in this case only one, room, because beds aren't mobile outside of the room for this given scenario, and a room contains zero or more beds. Now obviously there's a limit to the number of beds that we can put in a room, but we're not going to try and capture that in the data model, because that's not information that is being requested by the way the system is being defined at this point in time. Similarly for the other relationship: a bed is occupied by zero or more patients, and a patient occupies at least one bed, one or more beds. So this is already fairly complex in terms of the environment. Let's drop down below the entities and look at the attributes, because the attributes are characteristics of things. So I'm characterizing the parts of a bed as ID, description, status, sex to be assigned, and reservation reason. The decisions about how to manage this information have direct consequences. In this case, with the attributes ID and sex to be assigned in the description above, the organization can assign certain beds to only females, and they couldn't do that if they hadn't designed it into their architecture in the first place. If it doesn't appear in the models, then by definition it's not part of the architecture, and the opposite holds as well. These characteristics can be shared, so every bed may have a status, and many beds can be assigned to females. The characteristics can be required to be unique: it wouldn't do very well to have all the beds in this hospital labeled bed number one, you wouldn't be able to tell them apart. You probably notice when you walk into large buildings that the elevators, the stairs, the doors are numbered so that people can find a unique identifier for them. Description is unlikely to be the same for each bed.
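To make the bed, room and patient example concrete, here is a minimal sketch of how those entities, their keys and attributes, and the optionality and cardinality rules we just walked through might be expressed as physical tables. It is purely illustrative: the table and column names are hypothetical, and it uses SQLite through Python only because that is easy to run, not because it reflects any real hospital system.

```python
import sqlite3

# A minimal, illustrative sketch of the bed/room/patient model discussed above.
# Table and column names are hypothetical, not taken from any real system.
ddl = """
CREATE TABLE room (
    room_id TEXT PRIMARY KEY                 -- unique key: tells one room from another
);

CREATE TABLE bed (
    bed_id              TEXT PRIMARY KEY,    -- unique key for each bed
    description         TEXT,                -- free-text characteristic, unlikely to repeat
    status              TEXT,                -- shared characteristic (e.g. available / occupied)
    sex_to_be_assigned  TEXT,                -- lets the organization assign a bed to females only
    reservation_reason  TEXT,
    room_id             TEXT NOT NULL        -- a bed is placed in exactly one room (mandatory)
        REFERENCES room (room_id)
);
-- A room contains zero or more beds: many bed rows may point at the same
-- room_id, and a room row may have no beds pointing at it at all.

CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY
);

-- A bed is occupied by zero or more patients and a patient occupies one or
-- more beds: a many-to-many, resolved here with an association table.
CREATE TABLE bed_assignment (
    bed_id     TEXT NOT NULL REFERENCES bed (bed_id),
    patient_id TEXT NOT NULL REFERENCES patient (patient_id),
    PRIMARY KEY (bed_id, patient_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationships above
conn.executescript(ddl)

# A business question this structure CAN answer: which beds are in room XYZ?
conn.execute("INSERT INTO room VALUES ('XYZ')")
conn.execute("INSERT INTO bed VALUES ('123', 'adult bed', 'available', 'F', NULL, 'XYZ')")
print(conn.execute("SELECT bed_id FROM bed WHERE room_id = 'XYZ'").fetchall())
# Note there is nowhere to record how many beds would FIT in a room;
# that question was never designed in, just as discussed above.
```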
The description is probably a free-text field where somebody wants to note that this is a bed for a child, and this is a bed for an adult, and this is a bed for somebody who's incontinent, perhaps, or something along those lines. Now, again, keep going here: the bed ID we can use to keep track of that specific bed, and the problem, as I mentioned a few minutes back, is that the way we've mostly been taught, we define a bed as something you sleep in. Okay, well, that's nice, but it's just not helpful. Let me show you how the purpose statement in this case became very useful. This is an example from the Department of Defense, back when I was working with them in the late 80s and early 90s. Why is the organization maintaining information about this business concept? The purpose statement: it's a substructure within a room, which is a substructure of the facility location, and it contains information about beds within rooms. Each of these words, or many of these words, are controlled vocabulary terms that appear in the glossary, so we can go back and look up the official definitions. The purpose statement here also incorporates the source; we got this out of a specific manual. Then there are the attributes, the list of the characteristics of the entity, and the association. So again, in this case the association is that zero, one or more beds can be associated with a room. You can have a room that has no beds. You can have a room that has one bed. You can have a room that has many beds. But probably most people don't ask the question, how many beds can you fit in that room? That's not capable of being determined from the way this data is being architected. From an architectural perspective, it's important to think in terms of kind of three different levels. Oh, I'm sorry, I forgot one thing here, I almost blew it: the entity is not validated until it's validated. You'll notice it's a draft before that, and then I hit the button and it's validated. I forget to say that a couple of times, but it's important to go back and note the difference between those, because it is critical to do this. You've added a layer of quality and an additional approach to making your data models and architecture that much more useful to your organization. The ANSI/SPARC three-schema architecture is a good conceptual way to think about the types of data models that can be used to make up an architecture. One, conceptual, allowing independent customized views. Two, logical, which hides the details of the physical storage away from the users but gives them the things in their natural form, as we were talking about a minute ago. And finally, physical, which is the way the database figures this should be stored in its most efficient fashion. This is where normalization can give you impressive results. So again, just keep this in mind: physical, logical and conceptual. Most models are at the physical or logical level; conceptual is more rarely encountered in the world. Let's talk about going forward, building, how we teach people how to do things: forward engineering. The idea here is that we're building new stuff, in this case a new database. We start off with requirements. The requirements always describe the what, what should happen. That is separate from the skill of design, which is actually missing from the last 30 years of education; many curricula have combined analysis and design into a single activity, and they are not the same. The easiest way to double-check that is to simply check and see whether you're actually doing design.
Design by definition means you are comparing one alternative to another alternative, and if you do not have comparative feedback from one to the other, then you are not doing design. You cannot have done design if you didn't compare two ways of implementing the whats that are in that first piece out there. We don't do that. We don't teach it. It's sad, but oh well. The requirements are kept in a three-ring binder. The design, however, is the model, the architecture that we've been talking about, and the physical implementation is the physical database itself. Those are good components, great ideas, and they give us the ability to do forward engineering. And yet when we go back to teach students how this works, they get this thing called the SDLC: the what, the how, and then of course the build part. That happens only 20 percent of the time. We spend 80 percent of our IT dollars working on things that already exist, and that is an incredible number, and we don't teach students anything about how to do this. Remember, they get one course on data. So is it surprising there's confusion as to what is a data model and what is a data architecture? Okay, I'll stop babbling. Anyway, Fred Brooks. Fred Brooks is also a piece that's missing from the curriculum, but he does a great job, did a great job, of describing the mythical man-month, which is to say that nine parallel efforts of one month each cannot produce a baby. I know that's sort of obvious, but people don't understand that that's the way it goes in projects as well. He has two other quotes that are wonderful. Data representation is the essence of programming. And he says, show me your flow chart and conceal your tables, and I shall continue to be mystified; show me your tables, and I won't need your flow chart, it will be obvious. And the reason is that the flow chart is just the documentation. So we're going to talk for a minute about reverse engineering, in this case going backwards. We're still looking at existing systems in this case, and there's no problem with having a focus on existing systems. But of course everybody always does all the documentation, right? When you get your system for the first time, it comes with a perfect data map that shows you exactly what's happening. No? Gee, don't worry, you weren't the only one getting worried; that is a rare occurrence out there in the world. There are organizations that do a good job with documentation, but I find, in software packages in particular, and in organizations whose leadership has a relatively poor understanding of why it's important to understand your data architectures, that there are a lot of issues around this. And so we end up doing reverse engineering, which is taking the existing database and sometimes simply connecting it to a CASE tool and having it pop out what the physical as-is model looks like, as a design asset. That's the happy case. It can take a few seconds. It tends not to always work out that way, but gosh, it is worth a try. If you're unable to connect up to the original schema, there are other ways: you can read the catalogs of your Oracle installation or whatever package you're using, and the ERP models are also fairly well documented out there, not freely available, but well documented, after many years of people reverse engineering SAP and J.D. Edwards and PeopleSoft and all those great things. They haven't changed much, by the way; it's really hard to change a package once it's been fielded.
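As a rough illustration of that happy case, here is a small sketch of reading a database's own catalog to pop out an as-is physical model as a design asset. It assumes a hypothetical SQLite file named legacy.db purely for convenience; a real effort against Oracle, SQL Server or an ERP package would read that platform's catalog or data dictionary views instead, but the general idea is the same.

```python
import sqlite3

# Toy reverse engineering: read a database's own catalog and print the
# as-is physical model (entities, attributes, keys, relationships).
# "legacy.db" is a hypothetical example file, not a real package.
conn = sqlite3.connect("legacy.db")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    print(f"ENTITY {table}")
    # PRAGMA table_info lists each column: name, declared type, and key role
    for _, name, col_type, notnull, _, pk in conn.execute(f"PRAGMA table_info({table})"):
        key = " (key)" if pk else ""
        required = " NOT NULL" if notnull else ""
        print(f"  attribute: {name} {col_type}{required}{key}")
    # Foreign keys reveal the relationships that hold the architecture together
    for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
        print(f"  relationship: {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
```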
But there are techniques that you can use to reverse engineer these data assets that you have and come up with the design asset, that purple arrow that's on the diagram. It is really, really key that you do that. It's often important, however, to go a little bit further backwards. Instead of stopping there, it's often important to go a little further and not look just at the transition from physical as-is to logical as-is, but in fact go back to requirements and understand that those requirements can be very, very different in terms of what needs to happen. I'll give you a very specific example. Say the database on the right-hand side sits on a 10 megabyte drive and contains all the customers whose names start with the letters A through D; another database next to it kept track of the ones that were E through G, and the ones after that covered H through whatever. In other words, you're partitioning this. The reason you're partitioning the data that way is, as I already mentioned, that it's a 10 megabyte drive. You might say, wow, who would ever work with a database that was only 10 megabytes large? And the answer was, in this case, the IBM PC could only address a 10 megabyte hard drive at all, so all databases were actually less than 10 megabytes large. There were data design problems that occurred around those issues as well; that's why we ended up with things like Y2K and all sorts of other fun things. Just to put in a little plug, if you're going to be in Boston next month for Enterprise Data World, David, Eddie and I are going to be doing a little talk on the brain drain around these issues. But think about this for a minute. I have the database for A through D and the one for E through G, and so on, a bunch of them. I take that design and I reverse engineer it to the design level, and I say, okay, great, they put all the A's through D's together and the next one put the E's, F's and G's together, right? Great. Why? Well, the answer was a technical constraint. The actual requirement behind that might require us to traverse that yellow arrow and go back and say, what are we in fact using that data for? And the answer would be: quickly and easily accessing customer data, not just for us internally but for our customers to access other aspects of this particular piece; maybe it's a membership database that we're letting members gain access to. The key to this is to understand that most of the time this does not occur when people do system merges. They look strictly at the technical requirements of the package, which are very, very limiting. So our good friend Elliot Chikofsky came up with this definition a while back: reverse engineering is a structured technique aimed at recovering rigorous knowledge of the existing system to leverage enhancement efforts. And if you're going to re-engineer something, which is typically the case in these exercises, it is absolutely critical to take what is good about the existing database and bring it forward into the new design. That's the purpose of reverse engineering. But another purpose is to understand what didn't work well, or maybe what worked well when it was created but will not work well in today's environment. Let's just say it's a SoLoMo component that's problematic, social, location, mobile; those flavors need to get hooked up in there conceptually. Anyway, bad data design. We don't want to bring those pieces of it forward.
And yet so many times I see organizations moving physical to physical, or worse still, you hear the organization say, we'll just work with the data, while they're paying no attention to what's good and bad about the data and making no attempt to separate one from the other. Let's take a look now, in an integrated fashion, at how these two actually need to work together in virtually any instance of doing data architecture and data modeling. So here are the two parts of the diagram I was showing you before. At the top, notice, is reverse engineering, and on the bottom is forward engineering. So there's our reverse engineering again. First, reverse engineer the existing system to understand its strengths and its weaknesses. If you do not, how are you going to avoid bringing the weaknesses forward into your design, and what price are you going to pay for not keeping the good things about that design and carrying them forward? In everything there is some good and some bad, particularly in technology designs, and knowing that information is absolutely crucial. Now, sometimes you're not changing the requirements, so you don't need to do the yellow loop that I'm showing there, but sometimes you go back and say, I'm going to change the requirements significantly here, in which case you need to go all the way back through that reverse engineering process, from the existing system to the as-is requirements. You can't call what you're doing re-engineering if you don't use the information that you gain from the reverse engineering to inform the design of the thing you are now creating, the new database. So it may be that we are going to go all the way back and use those three yellow arrows because I need to change the requirements, or I may need to just redesign the actual implementation of the system, such as re-hosting from a set of online spinning disks to a new cloud-based component. If you don't understand the good things about the existing system and take them forward, and understand the bad things about the existing system and make sure they're not brought forward, you're likely to make the same mistakes. In fact, I think that's the definition of making the same mistake. Only when you have that new information incorporated into your design can you then move forward to re-implement the existing system. I wrote a whole book on that, believe it or not. You're welcome to take a look at it out there, I think, on Google Books. And it's actually interesting, just a quick side story, but I was ordered to write that book by the Department of Defense. So let's look at a process for doing data modeling, and that really starts by identifying the entities that we have. Those entities are business things, business objects about which we create, read, update or delete information. Seems like a pretty straightforward thing. Many people think of them as buckets, and that's an okay way to think of them, but what you want to make sure is that those buckets contain things that look as much like the factual information that you're trying to use for the organization as possible. The closer the correspondence between the two, the closer your logical model maps to your physical world, not the physical implementation of your system but the physical world, the easier communication about this will be. So we start by identifying the entities and simply saying we're going to collect things on beds and rooms and patients, okay? There are other things you can throw in there.
That's a good start in terms of doing your modeling. The next step is to identify a key for each entity that needs one. Now, this is a very, very simplified perspective, but if you're going to try to identify the difference between myself and my brother, it turns out our social security numbers are only one digit apart, and mine for some reason is higher than his even though he's younger than I am. I have no idea why, and we've never figured it out, but George Mason, the place where we both attended and have fond memories of, did manage to get us mixed up in their database, such that he almost got my degree and I almost got his. I don't think either of us would have been happy with that outcome. Now, the point is, if you're going to identify things, you need to be able to tell one thing from another thing. If you can't tell that thing from another thing, how do you know where that thing is? I'll give you a very specific example. I was doing some work for one of the telcos on the West Coast, and they were a pretty interesting group, had a lot of fun with them, but they had a lot of phone numbers that were in their system in multiple places. So again, I'll give you my mobile, 804-382-5957. If I've got three entries of that and I make a long distance call overseas to a foreign country, it helps to know which one of those I should apply the long distance charges to, hence the requirement for keys all the way around. It then makes sense to draw a rough draft of what that entity-relationship data model should look like: these things are related to these things, and these things are related to these things. You don't want everything to be related to everything else; that doesn't work. But in many cases you can find out by guessing and by talking to people about the way they use things. By the way, if you're working from a written exercise, you read out the nouns and put the nouns up as candidate entities, and you read out the verbs and use them as candidate processes. That doesn't mean you're going to end up there, but it gets you into that direction, into that mode of thinking, into that mindset. Then it's important to identify the attributes. So what are the attributes of a bed? We've already looked at several of them, and we assign those attributes to the entities. Another great way to make sure this works for everybody is to make sure that the attributes will answer the business questions that people are asking. Remember, we identified three of them earlier on: what beds are in what room, does a room have a bed in it, and is that bed in that room? But we can't, from that design, answer the question of how many beds fit in that room, because we didn't design that in. So again, I'll go through this just very, very briefly. Step one, identify your entities all the way around, put names on them, look at the nouns that are being used by the people that you're working with, the subject matter experts. Step two, identify a key for each of those, so that you have the ability to identify one record as distinct from another record within each of the data sets that are out there. Step three, connect the data sets, draw lines between them; that gives you a rough idea of what that data model might look like. Step four, list out all the data, the attributes, that are there, and step five, assign each of those attributes to the various pieces. If you're going to match on keys, then you have to make sure that those data sets share a common key.
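Here is a small, hypothetical sketch of that last point: when two data sets share a common key, records can be matched back together, and when the key was never captured, no amount of cleverness after the fact reliably restores the link. The names and values are made up purely for illustration.

```python
# Illustrative only: matching two data sets back together requires a shared key.
employees = {
    "E-1001": {"name": "Pat Smith"},
    "E-1002": {"name": "Lee Jones"},
}

# Purchases captured WITH the employee key: every charge can be attributed.
purchases_with_key = [
    {"purchase_id": "P-1", "amount": 42.00, "employee_id": "E-1001"},
    {"purchase_id": "P-2", "amount": 17.50, "employee_id": "E-1002"},
]
for p in purchases_with_key:
    owner = employees[p["employee_id"]]["name"]
    print(f'{p["purchase_id"]} attributed to {owner}')

# Purchases captured WITHOUT the key: nothing ties them back to an employee,
# so attribution becomes guesswork (or predictive modeling after the fact).
purchases_without_key = [
    {"purchase_id": "P-3", "amount": 99.99, "cardholder_text": "P SMITH / VISA"},
]
for p in purchases_without_key:
    print(f'{p["purchase_id"]} cannot be reliably attributed; only free text remains')
```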
If the data sets do not share a common key, there is no ability that you will ever have to go back and re-identify these things. You may ask, how could you design a system that would actually not be able to be used? Well, I saw a purchasing system, a statewide purchasing system for one of the groups that I was working with, and they had no ability to take things that were purchased on an employee charge card and match them back up with the employee, because somebody hadn't done this kind of data modeling process. Consequently, the data architecture for this product was fundamentally flawed. We actually ended up using predictive modeling to try to fix it, and we got more than half of them matched, but my goodness, the rest couldn't be attributed anywhere. I can't wait until the Inspector General gets hold of that particular group. How could you imagine designing a system where you weren't able to match purchases back up with the originator of the purchase? Just call me crazy, it's a real puzzle. Anyway, we've got five steps here, and we've gone through them. It is entirely appropriate for the model to evolve as your thinking evolves. So do not assume that the first way that you draw this is going to be the last way that your model will end up, because I can tell you for a fact, it should evolve. If it doesn't evolve, you probably have a problem. We actually use the model evolution, the velocity at which the model evolves, to determine when it has reached the point of diminishing returns, and you can plot that out very, very well. It's kind of like predictive analytics figuring out when a customer burns out on your service; it's a very, very interesting thing to do, and again, very much beyond the scope of this. But your model should evolve. So again: build it, identify the entities, get them up there, make the keys show up, connect the dots so that your entities are related to each other, don't forget the cardinalities I was telling you about, get the attributes listed, assign the attributes, and then expect your model to evolve. If it does not evolve, you're probably not doing something right. Now, as you're looking at how this works, when you're dealing with data modeling and data architecture, there are a couple of characteristics that are good to observe over time, because just as your data model should evolve, and if it doesn't it's really the exception, you should also be taking a look at how you and your team are using your time as you go through these modeling and architecture activities. At first, you should be doing more evidence collection and analysis. But once you get past that point, well, you can see you never completely finish, but you should reach a tipping point where you're doing far less collection and much more modeling and analysis. Just a quick note on the terms for this final chart that we're doing here. Preliminary activities are really pre-project types of activities that are really good to go through before you decide to settle on a final budget for the project. Doing these preliminary activities can help you with the scope and the resource requirements that you're going to have for each of these. Similarly, when you finish all of this, there are some wrap-up activities. This is to make sure that you have time built into the project and program so that you're able to go and, in fact, do the documentation that, had it been done in the first place, would have saved you all of this work.
That's a long way of getting back to that one. Well, let's also talk about project coordination requirements. You will need to have access to subject matter experts, and oftentimes gaining access to them is the most difficult aspect of the challenge. If you can't gain access to the subject matter experts, you are going to be making guesses about what's actually going on, and they will not be well-informed guesses. You should expect these project coordination requirements to decline across the life of the effort. Similarly, what we call target system analysis, the focus on what the new system should be, should start out pretty low. One of the things we tell people as they're getting better about doing data is: postpone your technology decisions as long as you can. It is almost never the wrong choice to postpone technology. When we see organizations that have bought the technology first, it's as if they're designing their brand new data warehouse around the tool. So let's just pretend, on row three there, that you're doing a data warehouse. You should be doing a lot more analysis up front and not doing your target system analysis until well after you've started the project; you can see here, again, that it should increase over time. It's not exactly a binary thing, and there's no single right way to do it. But if you don't notice this shift toward saying, look, we had really better ask a lot of questions before we go out and try to solve the problem, you're probably going to be spending your money in the wrong place. This is a test we use on IV&V types of activities to make sure that happens. And I mentioned also the validated model. All models are unvalidated until they are validated; until then, they are in draft mode, and if you mark them as draft, management will be much more reluctant to use them, because the project's clearly not over. If my house plans were still in draft, I would certainly not be living in the house; nobody would issue me an occupancy certificate for a draft house. So the modeling cycle should focus initially on refinement of the existing model, that evolution I was describing, but that refinement should decline over time, and your modeling team's activities should shift toward validation. In this way, you will have a better understanding of how your data model plays its particular role in your overall data architecture. So let's do a quick summary and then we'll get to the Q&A section. Again, data maps are models. We need them in order to provide the link between the data architecture and the physical components. There are a number of challenges to be had, social, political, and economic, mainly economic, because people don't understand how the data problems that the tools we've talked about in this webinar address are hurting their organizations. Most organizations are not skilled at architecture and engineering. The two have to operate in tandem, and, in the case of data architecture, they must operate on standard, shared data of known quality. The view from the top, forward engineering, is where we build new stuff; the neglected science, reverse engineering, comes back from the other direction. Only by working together, striving to continually simplify the environment we have and increase our knowledge of it, can we in fact put data modeling to good use as a part of data architecture. I think I ran a minute over, but Shannon, it's the Q&A section now, right? Well, let me mention the upcoming things we've got too.
Next month, the webinar is going to be on reference and master data, and then we're off to Enterprise Data World, where I've got two talks I'll be doing with a couple of different colleagues: How I Learned to Stop Worrying and Love My Data Warehouse, and The Data Management Brain Drain. And in April, we'll come back to technologies around data management. Anyway, Shannon, back over to you. Peter, thank you so much for another fantastic presentation. I just love it, and I love all the engagement in the chat and Q&A we've had already. If you do have questions for Peter, submit them via the Q&A section in the bottom right-hand corner of your screen, and to answer the most commonly asked questions, just a reminder, I will send a follow-up email by end of day Thursday to all registrants with links to the slides and a link to the recording of this session. Peter, diving in here: what are the steps to validating the architecture? Okay, so hopefully you got from the talk that validating the architecture is a case of validating some of the models. Again, I would never put a label on the architecture that says it's 100% perfect; it's a living document by definition. But your steps for validating the architecture are to choose some data models that are representative of the scope of the architectural component you're attempting to validate, and then go about the process of doing IV&V work on those. Now, there is a good way of validating data models. I don't have time to get into it here, but let me just give you the equivalent from software testing. Many people will ask the question, what is the purpose of testing software? In other words, can I hand it to some colleagues and say, could you guys try and break this? And they try, and they say, hey, it looks pretty good to us. No, the only way to properly test software is to pay people to find bugs. If you pay people to find bugs, guess what? They will find bugs. If you pay them to validate the architecture, they will find errors in your architecture that you can then use to improve it. Once people understand that their feedback is welcome and that what they're doing is helping the overall effort, it works, but it's kind of awkward at first. I can remember some of the first times I was showing data models and architectures to general officers and senior executives in the government, and they're going, well, that's not right, and that's not right, and that's not right. And I pretty quickly learned to say, thank you, ma'am; thank you, yes sir; that's exactly right, because it was. Everything they show you is an improvement, and you're getting a gift of their time to work on your project and help the thing get better. So yes, it's a welcome process, but it's a bit different. A great question; sorry, I didn't mean to get too lost in that. Maybe, Shannon, we can do one of these on validating data models, that might be fun. That's a great webinar topic, I love it. And no worries, it's a great question to get lost in. From slide 27: the mindset still prevails that data is something in service to an application that automates a specific business process or processes, rather than an enterprise asset. Would you agree? Absolutely, and thanks for setting that up. I'm pretty sure it's one of my friends out there serving me a softball.
Maybe they'll come back at us, but we do have to understand the perception problem. The business looks to IT and says, you guys know what you're doing with data, right? And if the IT workers have only been taught this tiny little sense that data is a thing that gets stored in a database, and that everything you do about data is about building a new database, guess what they're going to do? They're going to go off and build new databases. Now, the good thing is that it's guaranteed employment for each and every one of us on this call, and that's a wonderful thing, but gosh, we could put our resources to much better use. It's an unnecessary, self-inflicted wound, exactly the same as Y2K was an unnecessary, self-inflicted wound. By the way, not a hoax, and that's part of the pitch for Dave's and my session at EDW. Again, come by, it'll be a fun session. And Peter, I have to throw at you here the most important question of the day. You can tell by my tone I'm setting this up. So why weren't you wearing the Dataversity T-shirt for your picture in front of the pyramid? Oh goodness, the Dataversity T-shirt. I can't say the answer to that one. Love that that question came in, it's awesome. Although we are switching it up this year; we're not sure yet what it'll be at the conference. I know, but we have some fun stuff, don't worry. So moving on here to more important questions, certainly: isn't a bed an entity unto itself when you're talking about... So, great question. In other words, looking at the context here, and without going too deep into the bed example: bed could be an entity, especially the way we've defined it here, in the sense that as a principal data entity it represents a concept, and the concept of the bed, in this case, for this system, consists of description, status, sex to be assigned, and reservation reason. Now, I didn't say this was a good description; it's one that came from one of the systems that I have that is in the public domain, so I don't have to get any clearance to show the slides. But bed could also be an attribute. I think that's probably what the questioner was getting at: how do you go about deciding between the two? Well, those are good questions of data design. I don't think a one-hour webinar purports to make anybody an expert in data modeling, but there are a lot of really, really good webinars around this that can help you take a look at these things. So a bed can be either an attribute or an entity depending on what data you are modeling and what the purpose of that data modeling is. Remember, the definition in this case, for these attributes, was that we need to understand whether a room has a bed in it or not, and whether a bed is in a room, but not the number of beds that can fit into the room. Nothing about our data design would help us answer that question. Although, using the data, we might be able to ask the system, what is the average number of beds in each room, or maybe the maximum number of beds in any room, and that might give us information that leads us toward the answer, but it would not be a stored fact about the capacity of each of the rooms. I think I gave enough on that one, Shannon; I may have messed it up. Hey, well, we can certainly get back to it if there's more that needs to be said or more inquiries into that specific topic, but no, I think it's a great answer.
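Purely as an illustration of that entity-versus-attribute choice, here is a small sketch of the two options side by side. The column names follow the slide Peter describes, and the bed_capacity remark is a hypothetical addition, not part of the original design.

```python
# Two illustrative ways to model "bed"; the attribute names follow the slide
# Peter describes, and the capacity field is a hypothetical addition.

# Option 1: bed as an entity in its own right, with its own key and attributes.
bed_as_entity = {
    "Bed": ["bed_id", "room_id", "description", "status",
            "sex_to_be_assigned", "reservation_reason"],
    "Room": ["room_id", "floor"],
}

# Option 2: bed reduced to attributes on the room, which answers
# "does this room have a bed?" but can no longer describe individual beds.
bed_as_attribute = {
    "Room": ["room_id", "floor", "has_bed", "bed_status"],
}

# Neither option stores how many beds *fit* in a room; that would require
# something like a deliberate "bed_capacity" attribute on Room.
for design_name, design in [("bed as entity", bed_as_entity),
                            ("bed as attribute", bed_as_attribute)]:
    can_answer = any("bed_capacity" in cols for cols in design.values())
    print(f"{design_name}: can answer 'how many beds fit?' -> {can_answer}")
```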
So, Peter, any thoughts on whether it's better to use the more traditional "table name plus ID" format or the more modern plain "ID" format for primary key naming conventions, and why? The question is a good one. Notice on this particular slide that the attributes are written with every letter capitalized. That was the standard for that organization, and when people saw that style of writing, they knew what it meant. In more modern times, we might change the font, so that every time you saw that font, or maybe an underline, you knew it was a term that was in the dictionary. The key is that the scope of your development efforts is the scope across which you need that particular, we'll call it a meta standard, a standard about how to do data modeling. The answer is not that one form is good or bad; the question is how widely you can easily apply the standard in the organization. It reminds me of the painful exercise that every PhD student has to go through, which is that, for some reason or other, they require you to take your dissertation to the library, where somebody picks up a ruler and measures the margins in your document. And by the way, if you don't match what the margins are supposed to be, you don't graduate with your PhD. Now, it doesn't seem like it ought to be one of the most important things, but by golly, it is a necessary but not sufficient step. So for organizations, the key is: the better you standardize the modeling around this, the more likely you are to benefit from the application of such standards. I tell groups, your organizations have a ton of hidden information factories. Thank you, Tom Redman, for coining that term for us all. Those hidden information factories are groups of individuals who are compensating for bad data designs and bad data architectures in your systems. With those bad data designs, it's costing them their own time and effort, and, more importantly, they have to maintain knowledge about things they really had no interest in, probably never knew they needed to know, and certainly did not all learn in the same standard fashion. Consequently, you have mismatches in methods across the organization. Again, great question, and yes, keep the standards you use to the ones that work within your organization, but let's not get to the point of measuring the margins on these things, which is a non-value-add step. So, what are the most useful artifacts, documentation components, for example data models, that you find you keep using across different projects and organizations? In other words, what are the common tools in your toolbox? I would say that's actually two different questions, Shannon, so let me try both of them. The first is: what is the representation we're using to describe these things? And again, it is just phenomenal to me that today we do not even teach kids that these things called CASE tools exist. I mean, they're not in the textbooks; they're not getting any exposure to them. They walk into organizations such as mine at some point and say, oh my gosh, what is this? And we say, well, that's a CASE tool. And they say, well, why didn't they teach me about that in school? And the answer is because the vendor charged at least a penny.
And if it costs a penny, most universities do not have the additional budget to go out and buy a CASE tool, and they don't see the value of it anyway. So they don't even teach the kids that these exist. Now, the first part of the question was, what is the representation for each of these things? And the representation is that logical data model: it maps all the way to the left to the textual requirements, and it maps all the way to the right to tell you what the implementation is, how the data is organized on the flash drive it's stored on, the old spinning disk it's on, whatever you put it on. This representation serves exactly the same function as the blueprints for the building each of us is standing in right now. Now, I assume each of you is in a building and that nobody's standing outside, I may be wrong, but somewhere there has to exist a document, or in this case the builder can't get an occupancy permit, that tells you within that wall is an electrical conduit, within that wall is a water line, within that wall is a structural element, and don't cut the structural element or the roof will fall in on your head. These are extremely useful things, and yet we don't seem to understand how to carry that transformation down to this level. Shannon, for the second half of the question, can you ask me that part again? I think that was more about a tools thing. Yeah, it really was; in other words, it was just reconditioning the question: what are the common tools in your toolbox that you find apply to all of your customers? So what we're seeing now is that there's a nice resurgence in the CASE tool market, primarily because of the lack of university coverage around this. The CASE tool vendors have actually gotten together, I won't say as a group, but certainly they all collectively seem to have rekindled their energy around these things, and that's a really welcome sign. They've gone out and incorporated things like social features into the modeling tools and reverse engineering into the modeling tools. So the set of tools is very rich, and the great thing is that there are a bunch of freebies out there; there's been a dedicated community around a number of these tools for many years. I am not recommending that you buy any particular one, or build your own, or whatever, but certainly downloading one and trying it for free is an absolute no-brainer. And again, Shannon, as we move into the April webinar, I know we're going to touch on data management technologies, so we'll probably hit a little bit of that as well. Indeed, I love it. So, we get this question a bit, Peter: how would you estimate the time for a data model if that information is requested for a project budget? So how would I ask somebody for a data model? No, how would you estimate the time needed to create a data model within a project plan? Oh gosh, Shannon, believe it or not, I've got an academic paper on that, and if anybody really wants the answer, I'll be happy to dust off the paper and send them off to it. They won't like the answer, but let me just give you a bit on the topic. It's an approach, and I think a very good approach; obviously, it was good enough to get published. But if somebody says to your organization, I need to take the data out of System X and move it into System Y, and somebody else says, I can do that for $100, you should tell them to do it and make sure they're bonded, and that if they don't deliver, they have to give you their firstborn or pay some other stiff penalty.
Because anybody who tells you how long it's going to take without knowing what shape the data is in is blowing smoke. It's just nonsensical. So, consequently, when you look at how organizations are attempting to estimate how long these things take, the wrong way to approach it is to say, okay, I want you to sit down in front of the computer and time yourself doing five of these. Although, to be fair, there are reasons people try it that way. The example that I love to relate is one that was in, I think, Marissa Mayer's book, where she said they were sitting around drinking one night, and somebody said, hey, how long would it take to digitize the entire world's books? And they said, well, we have no idea, let's just start. And they spent the next 12 hours putting books on copy machines, seeing how fast they could go, and then extrapolated from there. They said it was a very useful estimate, even if the beer had something to do with that perspective; nevertheless, they did get some empiricism into it. Now, come back to your question of how long this data modeling process takes. You're not going to find out by doing it once or twice. The best way is to have your data modeling team keep track of some meta-tasks that say, we're doing these things, and as we're doing them, they're taking some notional amount of time. Again, a stopwatch is absolutely the wrong way to do it, but coming back after a year and looking at how long things have taken turns out to be a very useful way of arriving at those estimates. The organizations doing that at the moment, again, remember, this is one in ten, so it's not a lot, don't give us a good sample size, but we are seeing organizations get better at this. The starting place for all of this, though, believe it or not, and Shannon, I don't know if you know this or not, is something out there called the DAMA statistic. It turns out, Don Michael told me the last time I was in Minneapolis, that it was based on some very specific work, and basically what it says is an hour per attribute. So here's one way to think about it: add up the number of attributes in your data model, multiply them by an hour apiece, and that's your first estimate. Now, Don also said that wasn't the reason the number got quoted; it got quoted for a completely different reason, but still it became, if you will, the DAMA statistic of one hour per attribute. You can ask the question, is two hours too many, is one hour enough? Well, the point is, do it yourself. Control your own measures and take a look. If you're a data modeler who does this, you can do it yourself without telling anybody. Just go back after you finish a data model: how many days did I spend, how many attributes, how many relationships, and so on. Divide it up at the end of the year and say, well, it looks like I spend about this much time to do these, and that's how you come up with these estimates. Everything else works well in theory but not in practice. It's a great question, thank you for asking.
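As a back-of-the-envelope sketch of the approach Peter describes, one hour per attribute as a first cut, replaced by your own measured rate once you have history, here is a small example. All the numbers in it are hypothetical.

```python
# Back-of-the-envelope sketch of the estimating approach described above:
# start from the "DAMA statistic" of one hour per attribute, then replace it
# with your own measured rate once you have history. All numbers are hypothetical.

def first_cut_estimate(num_attributes, hours_per_attribute=1.0):
    """The rule-of-thumb first estimate: attributes x one hour apiece."""
    return num_attributes * hours_per_attribute

def measured_rate(completed_models):
    """Derive your own hours-per-attribute from models you've already finished."""
    total_hours = sum(m["hours"] for m in completed_models)
    total_attrs = sum(m["attributes"] for m in completed_models)
    return total_hours / total_attrs

history = [                       # hypothetical year-end tally for one modeler
    {"attributes": 120, "hours": 150},
    {"attributes": 45,  "hours": 40},
    {"attributes": 200, "hours": 260},
]

new_model_attributes = 80
print("DAMA first cut :", first_cut_estimate(new_model_attributes), "hours")
print("Your own rate  :", round(measured_rate(history), 2), "hours/attribute")
print("Refined budget :", round(new_model_attributes * measured_rate(history)), "hours")
```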
Indeed, and there are a ton of great questions coming in; we'll try and get to as many as possible. So, Peter, there have been efforts around scoring a model. What are your thoughts about putting a score on how well a model is done? Well, the scoring that I have seen has been when organizations are attempting to determine the accuracy of their requirements. If we understand anything about IT, it is that incorrect requirements cannot result in the correct solution. Consequently, putting more time into requirements is generally seen as a good idea, but it generally doesn't happen; we jump in way too fast on the coding. I'm going to go to the next-to-last slide here, where we were talking about bringing in technology too early. We jump in way too fast with the technology, and that results in very, very poorly developed systems. It's just about a universal rule: whenever people are thinking about adding technology, it's too soon. Every minute you spend not doing requirements is time you're taking away from the thing that most influences the rest of your project, and that's just not a good idea. Shannon, you said you needed to keep us moving, so I'll stop there. No, it's good, I love the answers, and we've still got 12 minutes. Although, again, we have lots of great questions coming in. How can data architecture be implemented in an agile methodology? Any suggestions on how to capture and present it? Agile is the best method we have come up with for developing higher quality software faster. It works, no question about it. However, since it has worked, others are trying to apply the word agile to other activities. Sometimes it's applicable, sometimes it's not. Here's the key: your data must be a program. Agile is focused on projects. If you try to build a program by putting a bunch of disparate projects together, you will have an incoherent program that will be ineffective. If you put the program together first and then match it up with the agile projects, you will have much better integration. The best integration occurs when you have a gateway that says an agile sprint may only use data items that already exist in the organizational metadata repository. If you do that, your agile sprint is more successful. However, if you're in the middle of an agile sprint and all of a sudden you discover you have an error in your data, you need to pull the ripcord, just like the stop-request cord on a bus, right? Stop the bus right now and go on to another sprint. There are plenty of other sprints; it's never a problem to do that. But if you don't do that, you will simply pour money down the drain. So, agile working in a data program context, with a necessary but insufficient screen of only using pre-existing data models when you start an agile sprint, will tell you whether it's worthwhile to invest in that sprint or not. It's a very nice system that works out very, very well, but what I see is most organizations simply trying to do the data as part of the agile sprint, without the program aspect, and that is simply not a workable solution. Well, it produces the results that we have, and I don't think any of us are satisfied with the results that we have.
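Here is a minimal sketch of that gateway screen, checking a sprint's required data items against an organizational metadata repository. The repository contents, item names, and the function itself are invented for illustration.

```python
# A minimal sketch of the "gateway" screen described above: an agile sprint is
# allowed to proceed only if every data item it needs already exists in the
# organizational metadata repository. Repository contents and sprint needs
# below are invented for illustration.

metadata_repository = {
    "customer.customer_id",
    "customer.name",
    "order.order_id",
    "order.order_date",
}

def sprint_gateway(required_items, repository):
    """Return (go, missing): go is True only when every required data item
    is already defined in the repository; otherwise list what's missing."""
    missing = sorted(set(required_items) - repository)
    return len(missing) == 0, missing

go, missing = sprint_gateway(
    ["customer.customer_id", "order.order_id", "order.ship_status"],
    metadata_repository,
)
if go:
    print("Sprint may proceed: all data items are pre-vetted.")
else:
    print("Pull the ripcord; undefined data items:", missing)
```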
Peter, a lot of the content presented today seems geared toward people who are working within the parameters of traditional waterfall project management approaches. Any thoughts for those of us working on data architecture and modeling in agile shops? Well, that's really just another way of asking about agile, but if anybody has examples of how they've managed to do this, if you will, in spite of this, I think it would be very interesting for a session at maybe next year's EDW or something. I think we need some examples, because I am just not finding that organizations doing 100% agile sprints, with the data modeling occurring as part of the sprint, are having success. In terms of resources, Peter, what are your favorite books on data modeling and data architecture? First of all, if you're going to do data modeling, I would absolutely start out with David Hay's data model patterns book. That question may have been submitted by Mr. Hay himself, but he's well earned the right to do that. It's the first time, I think, that we codified the idea that these patterns can become useful in multiple contexts, so we owe David a debt of gratitude for that, and also Len Silverston for taking it the next step further. David Marco did some wonderful metadata modeling; I did a little bit of it as well. So again: David Hay, Len Silverston, David Marco, and probably myself in there at some point. And gosh, did we ever get your bookstore started, Shannon? Not yet, we're working on it. You know, one thing at a time. Yeah, one thing at a time. So sooner or later, folks, you will get an announcement. Indeed, we're actually working on it, but nothing to announce yet, though I want to. Next question: our shop breaks deliverables into two-week bites; how can we do due diligence within that timeframe? If you are doing an agile sprint, two weeks is a great sprint length, and the sprint should be gated by saying the data we're using has all been pre-vetted, and we understand it is coming from known sources and is of known quality. Notice I'm not saying it's the best data, but at least if it's of known quality, then that agile sprint can work just perfectly. Can good data architecture work without data governance? It has for years, but only because we called governance data administration. It's clear that if you're managing your data without guidance, which is the absence of data governance, then you will have less good results than if you have data managed with guidance. Imagine if your organization allowed anyone to write a check against its financial resources or use the company charge card for any purchase whatsoever. That would not be good fiscal guidance, and consequently the organization would not function nearly as well. Data is exactly the same. If you have good data guidance, data managed with guidance, you will have a much higher rate of success using data to support the organizational mission. If not, it's closer to sand getting in the works and slowing things down. And Peter, there's another question about books again. You had referenced one of your books, and can you recommend any resources to better learn how to reverse engineer? Yeah, I'll tell you what, Shannon, we've got the ability to add a PDF to the follow-up, don't we? Yes, we do. I'll send you a copy of the IBM Systems Journal article, which is a good synopsis of the whole thing, so you all can take a look at it. I'm pretty sure that's out of copyright, and I don't think IBM will sue us at this point. Perfect, we'll do that. And then, could you talk about recursive relationships? In the context of architecture and modeling, I suppose: recursion is a technique we've used for years. It's a well-defined technique, which means there's a good set of algorithms around it. As I was describing with patterns a few minutes ago, recursion is one of the ways in which you can leverage patterns. A repeating set of events, or something along those lines, tends to occur in a healthcare record, for example: each kidney transplant would have the same basic data structure, and so would each similar procedure. The recursion part of it is a particularly powerful lever, in the same way a macro is a particularly powerful part of Excel that most people don't understand how to use either. I'll leave it at that for this particular one; we'll probably lose everybody if I go too much further.
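As a tiny illustrative aside, here is what a recursive, self-referencing structure can look like in code, loosely following the healthcare example above. The class, field names, and procedures are invented for illustration.

```python
# An illustrative aside: a recursive (self-referencing) relationship, where an
# entity relates to other instances of itself. Entity and field names are invented.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Procedure:
    name: str
    # The recursive part: a procedure can contain follow-up procedures
    # that share the same basic data structure.
    follow_ups: List["Procedure"] = field(default_factory=list)

def count_procedures(p: Procedure) -> int:
    """Walk the recursive structure, counting the procedure and all follow-ups."""
    return 1 + sum(count_procedures(f) for f in p.follow_ups)

transplant = Procedure("kidney transplant", follow_ups=[
    Procedure("post-op check", follow_ups=[Procedure("lab work")]),
    Procedure("immunosuppressant review"),
])

print(count_procedures(transplant))   # 4: the transplant plus three nested follow-ups
```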
And Peter, what about semantic models, and how do they apply? Well, that's one area where we're seeing some very nice advances. I don't mean to pooh-pooh big data and data science and all the rest of these things, but they have achieved the same levels of success as IT projects in general, which is to say about 30%. They're being treated as IT projects, so it's not surprising that they would only achieve the same success. Semantics, however, is showing promise. I think Dave McComb just finished up an event on that, and he's done a lot of work in those areas. There are a couple of us working in that space. There needs to be more done, and we need more volunteers, quite frankly. If you have data sets that you think might benefit from semantics, reach out to us and we can tell you what's going on in those areas. The hard part about all this, again, is that it's stuff that works well in theory; we need to actually try it in the real world to turn it into practice. Semantics is a very exciting area and certainly one that I'm putting effort into. Again, so many great questions coming in. I'll try and get the ones we don't have a chance to get to over to you, Peter, and see if we can take a look at those. It's just been such a great topic. How can we stress the importance of data modeling in an independent, product-oriented culture and environment? As in, people won't think the data model is independently valid? I'm not quite sure where the question's coming from, but we've heard a lot of similar questions before in lots of our webinars: how do you stress the importance of data modeling, how do you convince people that we need a data model? Well, certainly in that context, first people have to understand that almost nothing we do, something like 20% at most of what we do in IT, is building new stuff. So the opportunity to use the things they learned in school is rare, and consequently we need to get good at reverse engineering and re-engineering the various data models and data sets that already exist. Anything else is absolutely akin to everybody showing up at your house with a bunch of nails and hammers and no plan. The data model is the plan; the data architecture is the plan for how you're going to transform what the organization currently has into something that better supports its mission. And if your organization isn't evolving in today's environment, you'd better polish off your resume and go find another one, because it won't last. Hey, maybe that would be a great note to finish on, right? Yeah. So, does it make sense to modularize models and deliver components over time in agile environments? Of course it does, but that assumes that there's a greater plan.
And that's really the problem with all of this: I've seen lots of groups that can come up with greater plans, and then somebody else says, well, no, the schedule's going a different direction, so we won't be able to use your stuff now. Yeah, absolutely. Okay, I think we have time for one more question here. So, what is the most common metadata to be captured for a data model? When do we get to the metadata webinar, Shannon? Okay, so metadata is data about the data. I'm sorry, when is it? Yes, it's soon, probably sometime in the summer, since we already know what April's going to be. Okay, so the entity name is probably the most consistent thing that people capture, but I think the question is asking from a different perspective. Organizations not only have to have the models, and the models can exist in multiple places because they exist in electronic form, which can be very, very useful, but think about what happens if we go from a merely electronic form of the documentation to an automated form of the documentation, one that allows the organization to actually use metadata in a way that automates its existing processes. Any type of SOA or services-oriented environment can use all of this, but it requires a degree of sophistication that 90% of the organizations on the planet don't have. So if you have these problems, these are great problems to have, and again, we encourage you to come forward with your stories and talk about them so that we can inspire other people. It's fun to talk about the fact that this stuff exists, but it's so much more powerful to actually show it. Maybe, again, Shannon, an examples webinar or something like that might be fun at some point. But we are at 3:30, and I know you have another session coming up to get ready for. That's true. Peter, thank you so much again for another great presentation, and I love all these questions that have come in and continue to come in; I'll pass those over to Peter. And thanks to everybody for being so engaged in everything we do. As Peter's showing here, we've got the March webinar coming up on reference and master data management, then Enterprise Data World, we'll see you there, and then the April webinar as well. Peter, again, thank you so much. Thank you, Shannon, and thank you everybody for participating. This is how we make the community stronger. Indeed. Thanks, all. Have a good day.