Hello and welcome. My name is Shannon Kemp and I'm the executive editor for Dataversity. We'd like to thank you for joining today's Dataversity webinar, Data Modeling Fundamentals, the latest installment in the monthly series called Data Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right corner for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. To answer the most commonly asked question, as always, we will send a follow-up email to all registrants within two business days containing links to the slides. Yes, we are recording, and we will likewise send a link to the recording after this session, as well as any additional information requested throughout the webinar. Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him and have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and eight books, most of which you can find in the Dataversity bookstore; the most recent is Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named among the top data management experts. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling. Peter, where are you in the world today?

Well, I've actually been invited to the White House to address the President's Council on Science and Technology. Not a terribly long trip, but kind of an important and fun one. That's exciting. It is. Today, Shannon, we also have Data Blueprint's longest-serving employee, John Sells. John's been with us well over 10 years. He came to us with a Master's in Computer Science from Virginia Commonwealth University. He never says this, but he won the Microsoft programming contest one year too. I did; it was about 10 years ago. He did. Still, it's a very high achievement. That was the international one? No, it was the national one you won, and then you went to the finals in Japan. I did, yeah, exactly. John, like all of our analysts, is a Certified Data Management Professional, and he's got a lot of experience working with large global clients around the world. He is able to talk about the process of using modeling in particular to help clients understand, first of all, why they need to pay attention to data, and then why they need to put that understanding into a formalized construct, which we call data modeling here. Again, his expertise focuses on pulling out technical requirements and business requirements and helping to achieve a proper balance between them.
He doesn't mention it himself, but architecture is another thing that you're really skilled at as well. He's helped companies improve their logistics flow, develop quality programs, implement meaningful governance programs, and design and implement warehouses, among other specializations. And another certification that you have, John, is in the Data Vault method as well. That's right. So I was up at the Data Vault conference just a couple of weeks back; a lot of things going on in that environment. We will hit them just briefly at the end of the presentation today. Again, our topic today is modeling fundamentals. Those of you that have joined us before know we want to start out with an overview, which always talks about how this fits into the data management context. Then we'll dive into the real material, which is data modeling and what it is. I'm going to call out a particular, very short section that highlights a key piece of understanding that I got from one of my mentors, Clive Finkelstein: the power of the purpose statement. Then we'll show you how to use modeling to address, really to contribute to, organizational challenges beyond traditional data modeling, guiding, in this case, general problem analysis using data analysis. Then we'll cover using data modeling in conjunction with architecture and engineering techniques, and finally we'll talk about how it supports business strategy. As always, we look forward to the last half hour of this, where we address your questions. And you guys oftentimes will point out things. I know Dave Eddy is out there with a couple of real tough questions that he's working on. So Dave, we're looking forward to everything you throw at us, as well as anything anybody else does.

So let's dive in: data modeling, the overview, and data management as the context. What is data management? We've been struggling with this concept for years. We have definitions like understanding the current and future data needs, or everything that happens between the sources and uses of data in its lifecycle. But we really do need to formalize it a little better than that and say that it focuses on engineering, storage, and delivery activities. Those need to be governed, because this requires governance and specialized skills. We can't just hire anybody to do this; we actually need people that have a good background here in order to understand it. Even this model, however, does not well depict the really important characteristic of data, which is that it is a reusable asset. And so we're still working on this, and the reason I keep putting these charts up, and revising them at some point, which happens regularly, is that I've got a better idea for this. So this was one improvement we made, which was to show the resources for reuse really as the center of what data management is. It's all about not just the sources and uses, but really the delivery of this, incorporating as well a feedback loop that we call analytical insight; it's how organizations gain insight from that overall process. Then, of course, we want to move on and talk about the ability to manage our data coherently, to govern the assets professionally, and to maintain fitness for purpose, with the appropriate architecture and lifecycle considerations supported by organizational practices around that area. This is a relatively new development; we have a whole webinar devoted to it coming up later in the year.
And then we sort of finish off the general piece of it by saying that this relates a lot to Maslow. Most people sort of take a pause here if they're hearing it for the first time. But much of what passes for data management discussion today is focused on the technologies in this golden triangle: master data management, data mining, big data analytics, warehousing, service-oriented architectures, all kinds of things. I've been using these diagrams for 30 years; the only thing that changes are the buzzwords that go into that triangle. But really, collectively, we all know on the call here that these are just the tip of the iceberg, and that we really need to address them at a foundational level. Not understanding the way icebergs are structured is a danger to ships, and not understanding the way data is structured is a danger to organizations. You really do need to have good, solid foundational practices in place in order to get the rest of these things to work better. So if you're going after any of the things up there from a technology approach, you need to develop organizational capabilities in order to come up with good, solid practice areas. The illustration that I'm showing here also shows one other aspect of the foundation: the foundation is only as strong as its weakest link. In this instance, I'm showing the weak link at platform and architecture, indicating to this particular organization that they could put more money into governance and quality and it wouldn't do them any good until they correct the deficiencies in their data platform and architecture. One question that comes to us all the time is: great, I hear you guys tell us this all the time, but I still need it done by Friday. Can we do it? And the answer is yes, you can do the top part without doing the bottom part, but it will take you longer, cost more, deliver less, and present greater risk to the organization than if you instead crawl, walk, and run your way to the top.

All right, Peter. Well, that was an excellent overview of the data management landscape. And while we're here to talk about data modeling and data modeling fundamentals, it's important to understand where data modeling fits in the greater context of data management practices. So what we have here is the Data Management Body of Knowledge in a visual format, which shows the nine key areas, or I should say ten, of data management functions, with data governance as the central component. We're going to be talking here, in this highlighted slice, about data development, and data modeling falls inside of that particular piece. But I just want to point out that there are a lot of interconnections on this particular model. While we may stay in this slice, data governance provides a lot of, I guess, direction and compliance and oversight and process discipline, which is very important. There are also lots of other components, such as data warehousing and data quality and data security, all of which see huge impacts as a result of activities that occur in data modeling, or decisions that are made early in the process of doing data modeling as it feeds into data architecture. I'd also say that within data architecture and metadata management, there are a lot of artifacts that come out of this data modeling exercise, which we'll be talking about, that feed directly into those data management functions.
So hopefully that gives a little bit of context about what we're talking about and how it relates to a larger data management program. If we dive into the details of the data development cycle and look at data modeling, we have to think about, well, what are the inputs, the processes, and the outputs of this particular activity? This slide gets very busy, and it can be overwhelming, even intimidating, to some who are new to this process to think about all the activities that need to be accomplished, but I just want to point out some key highlights. The first one is the first goal: to identify and define data requirements. That should be paramount, I'd say, in this activity: to be able to collect those requirements from the business and understand what is really trying to be achieved and how those particular requirements can be modeled and structured going forward. On the left side, when we look at inputs, how do you collect these requirements and start to define them? Starting with business goals and strategies is critically important, and I think as well, looking at data standards or existing architecture within the organization can provide a framework or a jumping-off point for the process. So those two are huge inputs. The other key is who needs to be involved in the conversation to have a successful data modeling endeavor, and I would say it's all of these groups on the left under suppliers and participants, such as the data stewards and the subject matter experts, as well as representatives from IT. I think the real art form in being a successful data modeler is being able to translate the requirements from the business into technical requirements for the IT and development staff. That's where I've found some of my greatest challenges, but also my biggest opportunities and biggest wins in my career: being able to perform that translation, to synthesize various conversations and interviews from the business into the technical requirements. As we move on to deliverables, we'll talk later in the presentation about documenting conceptual, logical, and physical data models, as well as the metadata and information products that come out of this particular process. So anyway, we're trying to boil this down and make it as un-intimidating as possible, so while there's a lot on the screen here, I think what you'll see is that it synthesizes into a much more achievable and approachable activity, and it will provide a lot of value for your business.

So we've given an overview of data management; I think we're going to transition into the section on why we do data modeling and what exactly it is. So this is a very interesting depiction of an entity relationship diagram, or an ERD, as some of you may have heard. When you look at one of these, especially one pulled from an existing system, maybe one that's been in operation for 10, 20, or 30 years, it can be massive and overwhelming, and you might need a very large printer or lots of pieces of tape to put it together in physical form. But it doesn't have to start at this level; you can pull this up to a much higher level, and we'll talk about how that works. So why modeling?
Well, I think it's important to translate this to another domain: would you build a house without an architectural sketch? Meaning, would you start construction on a project without knowing where you're going? Would you start a project without knowing how much it might cost? Would you attempt to communicate with the contractors on a particular project without having a common language? Would you want a set of requirements that could be checked off, to know whether you actually constructed what you intended to build in the first place? If you built something successful, would you like to be able to reuse that particular approach to save time and engineering costs in the future? And would you ever want to dig deep into pre-existing walls without understanding the underlying features that would have been documented before? I think the answer to those is no, and so there's a lot of reason to do planning and modeling before construction and engineering actually happen, both in the physical world and especially in the data world. As you can see, there are lots of reasons why modeling is important. So we use models to do various things. We use them to store and formalize information into an expressible format. We also use them to filter out extraneous details so we get down to the actual pieces of granularity that really matter when it comes to expressing business problems. We can also use constructs within the models to define an essential set of information and understand complex system behavior. I think the concept of functional decomposition, being able to logically separate out components of something that may be a very complex process, is helpful in data modeling, where you can drill down to a very small set of things that are involved with a business, or even down to a very specific description of a thing that happens within a business. There's also the opportunity to gain information just through the process of developing and interacting with the model, and I think this is huge. It's not necessarily the end state of developing the model and what you get at the end; the actual process that gets you to that model is vastly helpful to organizations that are either in a state of flux or have maybe just been doing the same thing for 20 years without any particular emphasis on change, and those can be very powerful conversations. Finally, models can be used to evaluate various scenarios or other outcomes indicated by the model, and to monitor and predict system responses to changing environmental conditions; really, to have some insight into how a change to the business process, or maybe a change in technology, might impact data flows and the processes that are linked to that data throughout the organization. So when it comes to data modeling for business value, what we need to understand is why we do data modeling and how it can generate business value over the course of the process. And it really, I just want to say, I'm sorry, back to you. Yeah, so many animations.
The first goal is to have shared IT and business understanding, and data modeling is very useful for providing that common platform for communication. Because so many pieces of modern technology are dependent upon the functionality of data sharing and the automation of it, it's really important to focus on the foundation of these systems, looking at this from a data-centric point of view, and modeling provides an absolutely fantastic opportunity to do that. We know that modeling characteristics will change over the course of the analysis, and maintaining these models over time is really important to providing context to prior business problems or to some future state to be defined. It also pairs motivation and purpose statements with modeling: in addition to developing the model, understanding how the data being incorporated into that model is planned to be used, or could be used in the future, can help drive some modeling decisions, and maybe some necessitated process changes too. It's also really important not to focus too much on the details of how to do the modeling; the fact that you're actually doing the model, and having that communication and that dialogue, is extremely important. Models are living documents, and they adapt to change, hopefully, if you have a team committed to making those changes over time and ensuring the models stay actually useful. Communicating the value of the modeling exercise matters too, so don't be afraid to add color and diagramming components to segment content, because that communication will be helpful both to current teams and to those in the future. From a data modeling perspective, it also ensures interoperability. We know that we're defining, let's say, applications A, B, and C today, but we don't know about D, E, and F, which might come from, let's say, a corporate acquisition or a technology purchase, or who knows what else, maybe even a corporate split. If we do proper modeling today, we can guarantee interoperability in the future because we'll basically have extensibility baked into the model: we'll be able to extend what we've developed so far, communicate the existing current state, and evaluate that state against potential future technical requirements. The big business value of doing data modeling today is that as these applications get integrated and these architectures change, the point of integration really comes down to the lowest level of granularity, which is the data level. That means you cannot be successful in future technology projects that involve data integration or extension or analytics, or any of the higher-order data management practices, without a strong foundation in your data modeling. And from data modeling exercises, there's also reference material that gets generated. So here's a diagram that describes, at a business level, what accounts and subscribers and bills and charges might be within the context of a business function, and there are descriptions that provide some business metadata that should be maintained over time and can be referenced. So if someone ever asks, what is a subscriber, how does the business define a subscriber, that can be answered. But this material also implies business rules about a subscriber's relationship to a charge and/or to a bill, and if those things need to change, you can use this reference material as a starting point going forward.
Your point there about reference material is really good. We've been on a number of engagements where people will take the models that you produce for them, and we'll see them later in a different part of the organization, up on the wall. The bosses won't necessarily understand them in detail, but people are really using these as the definitive vocabulary that they're going to work from. It's a bit of a breath there, isn't it, pushing all the way through all that stuff. So let's go to a definition here then. Data modeling is an analysis and design method used to define and analyze data requirements and to design the data structures required to support those requirements. These are critically important because anything else that happens in a system uses those data structures, so the interdependencies that John was describing earlier are absolutely instantiated all throughout the rest of the organization. The model, then, is a set of data specifications that represents something in the environment, employing some combination of standardized symbols and text in order to do this. You see this kind of specification represented on the right-hand side of the Data Blueprint logo, because of course that's what we make our living on. So again, data modeling is used to articulate data architecture components; the architectures are described as models. There are a number of different modeling styles that we use. It is not important which one you pick, unless it has an impact on your project. However, it is important that you are both speaking the same language, so just make sure that you have agreement in that context overall. The models by themselves are useful in a stand-alone mode, but they're also useful as components of a larger information integration. The reason I show the Eiffel Tower bit here is because they did the same thing when they were building the world's largest radio tower in Paris. They built these components, they understood what the overall architecture was like, and then they hauled them up the floors, and when they put them up on the top part, it worked. So it's a very, very key piece of all of this. Now, I've been very fortunate in my career to have a number of tremendously gifted mentors, and one of them was Clive Finkelstein, who occasionally listens in on these things. Hi, Clive, if you're listening. One of the things that he taught me was that most organizations, when they teach data modeling, teach you to define things. A CASE tool will say bed, and you look at it and get a definition: something that you sleep in. Well, that is good, but we can do better, and this is one of the things that Clive taught me: the power of the purpose statement. Instead of writing a definition, you write a purpose statement, and it gets you to something a little bit more contextual. So what you see here is the entity bed, and this is actually from a model John is going to show you in just a minute, from the Veterans Administration system. These are systems in existence today in the federal VA system, which we worked on many, many years ago. The entity is a bed, but notice the purpose statement, different from a definition: a bed is a substructure within a room, which is a substructure within the facility location. It contains information about beds within rooms.
I'll just tell you a quick little story on this. In those days they were occasionally losing patients, and they said, we will never lose a patient again, because we're going to put an RFID tag on all the beds, and the bed will tell you which room it's in. Well, one of the questions you then have to ask is: what is your definition of room? Now, if I had an interactive CASE tool here, I would click on room, which you can see is underlined, and we would go to that definition. But some of you are seeing this already: if we're going to use the word room for anything that has a bed in it, then a hallway also has to be a room, and an elevator also has to be a room. When we pointed that out to them with this purpose statement, they rethought their plans a little bit for how that was going to work. We also want to make sure the purpose statement includes sources of information: where did this information come from? Notice we have a partial list of attributes, in this case the gender, the bed reservation reason, et cetera, et cetera, and associations: a room can contain zero, one, or many beds. So that's an interesting way to describe it. That would mean the stairway could also be a room, although we probably don't want the stairway to contain beds. So again, it helps us to refine these requirements as we go through the process. Finally, one more piece on all this: whether or not the model component has been validated. It's important to understand that an unvalidated model is more akin to a hypothesis than to an actual fact, so you'd like all of your models to be validated at some point.

So we'll move on to a data map, and really this could be considered somewhat of a conceptual data model at a very high level. As Peter was mentioning, this is the Veterans Administration model, and this is just a small subset of it, but this piece of information, just these seven blocks and the lines that interconnect them, can be very powerful. We're going to look specifically at this admission and discharge relationship here. Some of you may be familiar with the notation on the screen, some of you may not, but it essentially translates into a few statements. One is that an admission is associated with one and only one discharge. Notice how it doesn't say zero or one discharges, which means we're basically enforcing a requirement that every admission has a discharge associated with it. And an interesting, if unfortunate, point about that: sometimes there are admissions to the hospital that don't result in the discharge of a live patient. So a business rule was brought up saying that death must be considered a discharge code. That seemed somewhat controversial at first during the discussion of the model, but it was determined that, because of the way the data was structured here, it made sense to incorporate that into the approach. So just speaking in terms of relationships between things like admissions and discharges, and how they should be interrelated, brought about a need to consider business rules within the organization. And notice that nothing we've talked about in that example actually had anything to do with technology; it was really all about the business context, even if it was a little difficult to have some of those conversations.
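Below is a minimal sketch, with hypothetical table and column names, of how the cardinality rule just described, an admission associated with one and only one discharge, plus the death-as-discharge-code rule, might land in a physical schema. It illustrates the idea only; it is not the VA's actual design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE admission (
    admission_id INTEGER PRIMARY KEY,
    admitted_on  TEXT NOT NULL
);
CREATE TABLE discharge (
    discharge_id   INTEGER PRIMARY KEY,
    admission_id   INTEGER NOT NULL UNIQUE       -- at most one discharge per admission
                   REFERENCES admission(admission_id),
    discharge_code TEXT NOT NULL                 -- per the business rule, death is a valid code
                   CHECK (discharge_code IN ('ROUTINE', 'TRANSFER', 'AMA', 'DEATH')),
    discharged_on  TEXT NOT NULL
);
""")

con.execute("INSERT INTO admission VALUES (1, '2015-06-01')")
con.execute("INSERT INTO discharge VALUES (1, 1, 'ROUTINE', '2015-06-05')")

# A second discharge for the same admission violates the UNIQUE constraint,
# so the database itself rejects it; no application code is needed.
try:
    con.execute("INSERT INTO discharge VALUES (2, 1, 'DEATH', '2015-06-06')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note that the schema can enforce the "at most one" half of the rule declaratively; the "at least one" half (every admission eventually gets a discharge) is typically checked by application logic or a data quality process instead.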
So let's talk about models and how they work in general. Again, what we're looking for is a better sense of understanding than just a vague concept. Models can be equations, they can be simulations, they can be physical models, they can even be mental models in your head. In fact, our colleague Elaine Gottseider did a great job of talking about what a model is, and you can see there are lots and lots of different things you can use to do this. When we talk about data modeling, we're talking about a very specific set, but it's still visible to everybody. It's a structure for organizing things. It gives us a framework for making decisions and understanding the types of problem solving that need to occur. It's fairly easy to review and validate. It typically has some combination of text and graphics, maybe even a prototype flavor to it: this is the first version of it that we're doing in order to try to come up with something. Now, one of the things that happens, though, is that if you tell people you're inviting them to a data modeling session, oftentimes they will run for the door, even if you have cookies and things like that, and John, I know you've encountered a number of situations like that as well. So don't tell them that you're modeling. Just start by writing some stuff down, then arrange it, and then make some appropriate connections between your objects. If they understand that those are the three basic steps, and that you're just trying to make sense of some things they already know, it's extremely helpful. Now, something else we've discovered over the years, and it varies from organization to organization, is that the people you're talking with are usually expert in what they do, so it's absolutely appropriate to call them subject matter experts. When they realize that you're trying to get information from them, sometimes the reaction is, oh, I don't want to tell you that stuff, because that's kind of the intellectual capital I contribute to the organization. Other times they're very flattered that somebody's interested enough to write some of their stuff down, arrange it a little bit, make some connections between the objects, and feed it back to them and say, hey, here is what we are attempting to do. So I tell people you've essentially got to keep them in a room, because we're trying to figure something out, and it helps to have a mission statement. Every modeling exercise should be guided by some sort of purpose for the model; you don't model just for the sake of modeling. One time the purpose might be to come up with a general idea of what it is you guys do. Another might be to get a little more specific. And a third might be that this is the last opportunity we have to make changes before we put this thing into production. So we really do want to keep people focused on that purpose statement, and if the purpose is just to come up with a high-level model, there's a type of modeling suited to exactly that. So let's look at problem analysis using data analysis, and the question here might be: are we going to build an interface, a piece of software, for a soda machine? Now, this is an exercise that a computer science student or an information systems student might typically be given.
And one of the things we want to look at is: what are the nouns involved in this analysis? Our requirements sheet here might show us that there are customers, there are sodas, there are coins, and there are machines. So we've written them down. Now we want to group them and make connections: a customer selects a soda, a soda is given to a customer, a customer deposits coins, coins can be returned to the customer (of course, I guess with Apple Pay now we don't have coins anymore, right?), and the soda machine dispenses the soda. We can get into all sorts of aspects of this. The four major things we have here on the screen are the entities. An entity, by definition, is a thing about which we create or maintain information. It's very similar to an object, and in many cases they are interchangeable; in some cases they're not, and that's where the expertise of someone like John comes in and can help you out. So that might give you a high-level idea of what the entity relationship view for the soda machine might look like. However, it can get more detailed, right, John? Absolutely. And here's another example of modeling in support of requirements, and I think that's really what I spoke about earlier: you're seeking an alignment between what the business is expressing as a set of requirements and what you can define in a model and ultimately translate into a technical implementation. So here we have identified various entities, such as a person, a job class, a position, and an employee, and some relationships between them. What we see here is that a person may be related to more than one employee, so what does that really mean to the business? Well, there's the concept of moonlighting: a person can hold multiple roles as an employee; basically, they can hold multiple positions. One way to look at this is that they could hold multiple positions over time; the other is that they could hold multiple positions concurrently and have different jobs. Now, Peter, I think you said there was a case where, culturally or regionally, people weren't necessarily accustomed to this. Yeah, I was invited to Norway to speak to the Norwegian Computing Society a couple of weeks back, and we talked about this concept of moonlighting, and it just didn't work in a socialized society. Everybody had exactly one job, which I think would be really nice. It just points out the differences between societies. Terrific session, by the way, that we had around this, but they really did not get that concept; it was very foreign to them. Right. And somewhat similar to that, there's the concept of job sharing: a single position within an organization being fulfilled by multiple people. And so we have those conversations about how the business truly represents the relationships between these things and how they should be constructed in the data model. Some of the decisions made in this modeling exercise serve partly as documentation of a business process, but they can also be setting in stone, or at least establishing, an initial architecture for business rules and for how data should be captured and processed in an organization. And having that documented in the data model can be very powerful.
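Here is a minimal sketch, with hypothetical names, of the modeling decision being discussed: representing employment as its own junction table turns person-to-position into a many-to-many relationship, which supports both moonlighting and job sharing without restructuring the model later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE position (
    position_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
-- The junction table is the many-to-many relationship: one person may hold
-- several positions (moonlighting), and one position may be held by several
-- people (job sharing), over time or concurrently.
CREATE TABLE employment (
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    position_id INTEGER NOT NULL REFERENCES position(position_id),
    start_date  TEXT NOT NULL,
    end_date    TEXT,                          -- NULL while the assignment is current
    PRIMARY KEY (person_id, position_id, start_date)
);
""")

# Moonlighting: one person holds two concurrent positions.
con.execute("INSERT INTO person VALUES (1, 'Pat')")
con.executemany("INSERT INTO position VALUES (?, ?)",
                [(10, "Sales Manager"), (11, "Sales Rep")])
con.executemany("INSERT INTO employment VALUES (?, ?, ?, ?)",
                [(1, 10, "2015-01-01", None), (1, 11, "2015-03-01", None)])
```

Had the business ruled moonlighting out, the opposite decision could be expressed just as declaratively, for example with a unique index on a person's current assignment; the modeling conversation is what decides which rule gets built.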
Now, I will say, as someone with a background in computer science and software development, you can't necessarily put 100% of the business rules into the data model, and it really is somewhat unreasonable to expect that. But there are some core pieces of business rules that can and should be implemented in the model, to help ensure data integrity and data quality throughout the life cycle of the data. Even more than that, just having the data model that you can use to communicate with other developers and other data stakeholders, to describe the expectations of how that data exists and how it's expected to behave, is very, very important and very useful. And you said setting in stone, John; that actually becomes problematic, because once you write code around these data structures, if you change the data structure, the code has to be modified. And if the documentation hasn't been done well, you end up discovering those dependencies rather than knowing where they are. But take the business rule that John's showing on the screen here. If you have a person that can show up as multiple employees simultaneously, and that's the rule you're implementing there, but your software doesn't support it, then you're going to have to do something extra. That functionality, instead of being implemented correctly in the database, gets implemented more complexly in code. So this is a way of helping systems achieve the appropriate balance: you want databases to do their appropriate workload and have the code do its appropriate workload. If the code does too much of the workload and the database is simply handling data in and out on a table-handling structure, you're over-relying on the code, and that makes your code much more difficult to maintain.

So let's take a look now at data modeling in context. Again, it's a complex process where you need to speak with people. We like to set it up so that our analysts go in and do some homework, then come in and ask people to show them where they are or are not correct. The best thing somebody can tell us when we show them a data model is: that's not right. Because then we know we've actually hit something they care about. And with these models, again, there is an art to it, as John said; it's an absolutely non-trivial thing to simply throw some lines and diagrams on a page and get it to work correctly. So part of the formula we use in this context is to ask: who is the audience for this? We very rarely show a detailed data model, like the one we started the presentation with, to a set of managers; that's not something they would digest. However, the four entities that we showed describing the structure around the four insurance concepts a few slides back are something that many insurance companies actually pick up. So those are the definitions of account and subscriber and bill and charge that will show up on that; this is a very, very common way of adopting these pieces. So, what is the audience? Many of the CASE tools you use will have the ability to show or hide complexity, but believe me, the complexity is always there underneath. The other part of it is how much time you are given. And John, of course, you're never given enough time on your projects to do all the work that you'd like to do, so you do the best you can given the amount of time that you have. And that does constrain your approach in some contexts.
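Circling back to Peter's workload-balance point above, here is a small, hypothetical illustration: the same business rule, a person may hold only one current assignment, enforced once in the schema rather than re-implemented in every program that touches the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE assignment (
    person_id INTEGER NOT NULL,
    role      TEXT    NOT NULL,
    end_date  TEXT                             -- NULL means the assignment is current
);
-- The rule lives in the database: at most one open assignment per person.
-- (A partial unique index; SQLite has supported these since version 3.8.0.)
CREATE UNIQUE INDEX one_current_assignment
    ON assignment(person_id) WHERE end_date IS NULL;
""")

con.execute("INSERT INTO assignment VALUES (1, 'Sales Rep', NULL)")
try:
    # Every application that writes to this table gets the rule for free;
    # without the index, each application would need its own check, and
    # those checks inevitably drift apart.
    con.execute("INSERT INTO assignment VALUES (1, 'Manager', NULL)")
except sqlite3.IntegrityError as e:
    print("rejected by the schema, not by application code:", e)
```

That is the balance being described: the database carries the rule, and the code around it stays simpler and easier to maintain.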
So we've talked about formalizing these pieces. The idea, again, is a single precise definition. The overall uber-model, if you will, pulls it all together and bridges understanding between people, so that they understand a business area or a concept or an application or a capability of the organization. These are also really useful in training new staff members. We've found that people will actually come back to us after we're done and ask for these models so they can use them in training materials and things like that. Finally, the model also tells us what's in scope and what's out of scope: if it's not on the model, then we have no real way of understanding it. As I said before, we have some concepts around this, and we'd like to talk about different types of models. It's guided by something we technically call the ANSI/SPARC three-schema architecture. It says that there are three layers of abstraction we'd like to have within a data modeling context. We have the conceptual, which allows users to look at the data from a business perspective; the logical, which hides the details of physical storage but actually gets us down to how we're going to build it; and finally the physical, which usually is actual code generated in a database somewhere. This is about as close as we've gotten, with a few exceptions, to code generation in this process. Now, many people look at this ANSI/SPARC model and kind of go, all right, I don't quite get this, so let me give a couple of examples. First of all, if I'm going to change the database, if I switch from Oracle to IBM, for example, the user really shouldn't be exposed to that change; they shouldn't have to go through retraining. The architecture of the data world allows us to do that kind of thing if we follow this three-schema concept. On the other hand, it's a little bit hard to get this from that first bit, so I added some pieces in here about bridge building. The idea here is that this was a bridge they were putting over the Millau Valley in France, over the River Tarn, replacing the old route, which is shown in orange on the right-hand side; you can see it, the N9. They were going to make a new road that was going to save a lot of time. Now, the first thing you ask is: why is the government spending money on this? Well, they presented this case at the same level we would talk about at an entity level, which was to provide focus and get people to buy in and say, yes, I'd much rather drive on a fast multi-lane highway than over a two-lane road that takes me forever to get up and over the mountains. You're seeing a picture of the bridge there in the lower left-hand side. We sometimes throw these away; we rarely maintain these conceptual models. They're mainly, as I said before, for scoping and balancing out the equation. So we move to logical models then, and the logical models take us from this conceptual piece down into a how. The conceptual is the what; these are the how: how are we going to build this particular bridge? First of all, the tallest pier in the bridge is taller than the Eiffel Tower that I showed you before, so people immediately ask questions and say, oh my goodness, how is that going to occur?
Plus, you'll notice on the right-hand side the carriageway cross section there. They're going to put a dual carriageway across, actually a four-lane bridge, but they're only going to have one set of supports: they're going to support it from the center. Now, from a logical perspective, we usually develop these down to something we call third normal form. We're not going to get a lot into modeling terminology here, but it is important to understand this so that everybody gets it, because eventually somebody says, okay, I get that you've got a good plan here; now, do you have a construction firm that can actually build this type of thing and make it come in on time? And that's where we move to physical models. The physical models simply become how we are going to do this: the blueprint for the physical construction. And if you Google this bridge over the Tarn, the Millau Viaduct, there are some tremendous videos out there that show some of the things depicted in the drawings, how they actually built a bridge where nothing else existed, because, of course, they couldn't put up a framework in order to do it and take the framework away at the end. Our models evolve in exactly the same way, and that's what John's going to talk to you about now.

Right, thanks, Peter. So there are really, I'd say, four quadrants to think about, if we ignore conceptual right now: the logical and physical data models, and the as-is and to-be, that is, the current state and the future state of those models. Now, on a lot of projects I work on, I'm not starting with blue sky. I'm not starting a greenfield implementation; I don't get to start from scratch. There's some sort of existing data model, either in a commercial off-the-shelf system or in something custom developed. So one of the first steps we have is to reverse engineer that model, that particular database or set of databases, into a physical model, and that's what's in the bottom left corner. Our next layer is to say: regardless of how this was physically constructed, let's get rid of some of the technical components and technical implementation and move up into the logical world, and really start talking about entities and relationships. Another big key thing is unique identifiers, or business keys; those are very important to understanding how a business uniquely identifies an entity within the organization. And then you would think we would just move over to the new future state of that logical model and push it down into a new physical model. But in reality we sometimes forget that we need to pause in the logical step of the current state, because some other requirements come in; there's been some business process change, or maybe a new technical acquisition, that necessitates a change to the model. So really what we're advocating here is reverse engineering the physical model up into a logical model, then making your modifications at that layer: add new entities, create new relationships, possibly redefine the granularity of the business keys. Push those forward into the future state model and have that represented and captured, to understand the gap analysis and the architectural requirements, which will help you with estimation. And then that logical model can be forward engineered into a physical model and deployed into your development, test, or production environments.
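Here is a minimal sketch of the reverse-engineering step John describes, using Python's built-in sqlite3 module against a made-up legacy schema: tables become candidate entities, columns become candidate attributes, and foreign keys become candidate relationships for the as-is logical model.

```python
import sqlite3

# Hypothetical stand-in for a legacy database you've been handed
# without documentation.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cust_order (
    order_id  INTEGER PRIMARY KEY,
    cust_id   INTEGER NOT NULL REFERENCES customer(cust_id),
    placed_on TEXT NOT NULL
);
""")

# Every user table becomes a candidate entity...
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    print(f"Entity: {table}")
    # ...its columns become candidate attributes (primary keys hint at keys)...
    for cid, name, ctype, notnull, dflt, pk in con.execute(
            f"PRAGMA table_info({table})"):
        print(f"  attribute: {name} ({ctype}){' [key]' if pk else ''}")
    # ...and its declared foreign keys become candidate relationships.
    for row in con.execute(f"PRAGMA foreign_key_list({table})"):
        print(f"  relationship: {table}.{row[3]} -> {row[2]}.{row[4]}")
```

The output of something like this is only a draft. It's the validation sessions with subject matter experts, where the best response is still "that's not right," that turn it into a trustworthy logical model.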
And, John, I'm going to restate what you said slightly differently; everything you said is perfect. The only way it works the first way you showed it, straight across, is if nothing is changing, if we're only changing out a physical database, and we encounter that very rarely in our practice. Yeah, lift and shift is pretty rare. Now, sometimes that does happen, maybe as a step one when someone is really feeling the pain of a mainframe architecture and needs to move to a distributed system, but you're absolutely right: typically there's a necessity for a model change, or the business has required one. And again, without picking on too many of my colleagues in the academic world, we don't find that this level of explanation is presented to young students when they're going through school, so this is fairly new to them. Most people don't understand why you can't move strictly from a technology-dependent physical as-is model to a technology-dependent physical to-be model, and the answers to that are quite clear. In fact, if we look at it in a little more context here, we tend to teach students an awful lot about how to build new things, and that really falls into the buckets we talked about before: the conceptual, as in the requirements level; the logical, as in the design; and the physical, as in the implementation-specific components. But we rarely introduce them to the subject of reverse engineering, as John said, and yet you come on site and almost always something is there. You rarely, rarely build something from absolutely nothing. And that takes us into a number of different flavors of reverse engineering. There are a lot of different ones; I could spend an entire hour-long session with you all talking about these bits and pieces. I'm just going to build the rest of the diagram so that you can see it. But the idea is that unless you are moving your data from one design to another design without changing the requirements, you actually have to go through all of these steps in order to figure it out. And each of these involves a model of some sort, which ends up being expressed in the same data modeling terms, so we can at least maintain the consistency of the vocabulary throughout. Finally, we get into metadata modeling as well, which gives us another component to layer over top of these. Metadata is simply data about the data that's in the models, whether that metadata exists yet or not. One final way of describing all of these is with a matrix we put together a couple of years ago: we've got our as-is and to-be; our conceptual, logical, and physical; and we've already spoken a bit about validated and unvalidated. Our goal is to say that everything we do within a model evolution is going to be mapped into some conceptual space in here. One final point here, and then we're going to move on to a little bit of strategy around this. The relative characteristics of the modeling phases evolve over time. You can see the first one in here is where we talk about evidence collection and analysis: we collect much up front, particularly during our preliminary activities, but during the modeling cycles the emphasis changes from collection to analysis, and we should really be focusing on that in all of our sessions. We should also be looking at the coordination requirements.
It can be hard to get hold of people, hard to get hold of objects, hard to get access to certain things. Our target system analysis, the idea of what we're trying to build for the client, is going to increase over time as we go through this particular process. And finally, our focus on requirements modeling is going to change very much from refinement initially; remember, I said a few minutes ago, the best thing somebody can say to you is, that model's wrong. It's a terrific thing; it tells you that they're buying into your requirements. But we should then change to a largely validation-oriented posture. So let's move a little bit further now and talk about how models support requirements. The real first question you need to ask is: were your systems explicitly designed to be integrated or otherwise work together? The answer is no, and you heard John mention earlier that data is the key point at which integration breaks down. Data is the most granular thing in your organization, and if the data doesn't match, for example if you move an alphabetic character into a numeric field, it will not work. Of course, we also know that if you move a field that's five digits wide into a field that's three digits wide, that won't work either. We could go on and catalog all of the various requirements, but the problem stands. So if they weren't explicitly designed to work together, what is the likelihood that they will just happen to work together? Pretty much nil. So your organization is likely spending between 20 and 40% of its IT budget compensating for this poor data integration. And these structures can't be helpful to the organization as long as the models are unknown; that's what we're trying to do with the modeling: expose them. So we're achieving efficiency and effectiveness goals at a slower pace. Unfortunately, your organization is experiencing death by a thousand cuts, but they don't know it, so they just get used to it. And you're also providing them with a component of dexterity by doing this. John's going to take a few minutes now and talk to us about some design styles, largely warehousing-oriented type things, but these can apply not just to warehouses, right? They can apply to any sort of data store, and the strategies behind them. Exactly. So as Peter's talking about organizational strategy, I think it's important to understand there isn't just a single flavor of data modeling. There are lots of different flavors, and I'm going to talk about three of them at a high level. The first one, probably the bread and butter of the relational database world today, is third normal form. Peter mentioned that before: when you move into the logical model, that's where you're heading. This is a data design technique that's been around for many decades, and the concept is to remove data redundancies so that you're storing information efficiently, making sure that it can be uniquely identified, and making sure that those unique identifications can be related to one another in ways expressed as one-to-one, one-to-many, or even many-to-many relationships. And this diagram that you see in the top right shows lots of different things.
It can be very descriptive about how customers are identified with a customer ID, how a customer may have multiple orders within a particular system, and how each order may contain multiple order items, each of which links to a particular product. That product may be provided by a supplier, and it may be provided on a certain supply schedule. Those things can be associated with very intricate and complicated business processes, but this modeling form is showing how to store that data in a format best suited for, let's say, a transactional system. So an order management system or a point-of-sale system would structure data in this way: a separate entity for product information, a separate entity for customer information, and so on. This visual can be understood by, I'd say, less technical personnel sometimes. We don't have to get too far into the specifics about indexes and computed columns and materialized views per se, but when you can describe a product with things like the name and the description and the serial number, and say a product is identified by a product ID, that resonates with the business, and they can understand that a product belongs to only one supplier in this particular case. That makes sense, and that statement can be validated by the business. So this is a very, very popular design style for transactional systems, and it's usually, I'd say, a foundation for the next two styles we're going to talk about. And John, can I just back up? Yes. I'm going to interrupt your slide there because it was dead on. The other part of this, from a third normal form perspective, is that by definition these models are at their most flexible and adaptable, and therefore the least risky for the organization. So particularly when an organization is looking to avoid risk in its IT area, to avoid risk in its business functionality, that's the key importance of coming up with this third normal form: by putting data in its most flexible and adaptable format, you're better prepared for what's going to happen next. We had a little discussion about strategy a couple of months back; remember the bit about how strategy works really well until you get hit in the face, right? So that's a component of it. But one other point, and you've got it on the slide here: this is a scientific piece. Right. Absolutely. If you give the same inputs to different data modelers following the basic rules of third normal form, you should end up with very, very similar models, if not identical ones. Now, there's certainly an art form to it; there are ways to implement the same constructs a couple of different ways, but ultimately you will end up in the same spot and have a common language with both your current and future data modelers. Thanks. No problem.
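Before moving into the dimensional style, here is a minimal sketch, with hypothetical names, of the third-normal-form shape just described: each fact stored once, entities identified by keys, and every relationship on the diagram expressed as a declared foreign key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,
    supplier_id   INTEGER NOT NULL REFERENCES supplier(supplier_id),
                  -- "a product belongs to only one supplier"
    name          TEXT NOT NULL,
    serial_number TEXT NOT NULL UNIQUE
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
                -- "a customer may have many orders"
    placed_on   TEXT NOT NULL
);
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
               -- "an order may contain many items"
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
""")
```

Each declared relationship corresponds to a sentence the business can validate, which is exactly what makes this style a workable shared language between modelers and subject matter experts.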
Okay. So, aside from the transactional systems, you may be familiar with analytical systems, such as data warehousing or reporting stores or cubes, and there's a totally different design style for modeling those data structures: dimensional data modeling, or star schema modeling. This design style has been around since about the 80s or so; it was popularized by Ralph Kimball, who is still a strong proponent of it, and it has become a very analytically efficient way of storing data. While third normal form is about normalizing, dimensional modeling actually takes a different approach, and in some cases will do denormalizing, meaning there's some repetition of data within the central tables. The reason this is done is purely performance for analytical processing speeds. There's a sacrifice in storage efficiency, so you may have duplicative data within your system, but on the back end of that you're getting much faster processing, especially when you think about aggregates. How much revenue did I generate in the eastern district for this particular product over the past two quarters? Those kinds of questions are much more easily answered by a dimensional model. So what you'll see is a central fact table that links out to the dimensions, the things we talked about like customers and locations and products, things that would otherwise be normalized in a third normal form model. There are some variations on that, where a dimension can itself be partly normalized out, somewhat like a third normal form model; those are snowflake dimensions, although in general they're not recommended. And then there's also the difference, John, that you mentioned on the previous slide: that model was more oriented toward transactions. Right. Where transactions are there, they're more likely to be updated, so we want a more granular model, so that when we lock down a piece of the model to put an update transaction against it, we don't lock up the entire model; whereas this is a primarily read-only, push-to-analytics, business-intelligence type of model. So again: what is the purpose of your modeling? What is the purpose of your delivery system? Picking the appropriate tool is then the obvious solution to making sure it matches the requirements. Absolutely. Picking a data modeling approach that's fit for use is critical. So finally, I wanted to talk about the Data Vault model, one of the newer relational database modeling techniques, which is really well suited for central data warehousing. I wouldn't say directly for reporting per se, like the dimensional model is, but certainly for central storage where you want to capture time-variant data: data that's changing in your transactional systems on a consistent basis, but that you want to capture in a non-volatile format, meaning a non-changing format, where you could go back and query the model and ask, what did the data look like as of this particular day? Six months ago, what did my sales transactions look like? Or where did Peter live three months ago, before he changed his address? Data in the Data Vault model is stored at an atomic level; we won't be aggregating and summarizing information. It will be down at the raw level, very similar to what you've seen in third normal form. But this particular design style uses constructs called hubs, links, and satellites to store time-variant data over time, and it uses business keys and links to store relationships. The relationships are actually engineered to promote flexibility: as Peter was talking about before, in a case where there may be, say, a one-to-many relationship in a current system, the Data Vault approach will advocate turning that into a many-to-many relationship, just to promote flexibility without having to re-engineer the data model in the future. So there are technical components to this, and a business component, that make it a very resilient central structure, very powerful for data integration, especially across multiple disparate transactional systems, where you can find implicit links between business keys and the content linked to them. We'll dive into that a little more deeply in a future webinar.
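Two minimal sketches, again with hypothetical names, of the shapes just covered: a star schema, with a central fact table and deliberately denormalized dimensions, and a Data Vault set of hubs, links, and satellites. Both are illustrations of the general patterns, not a prescribed implementation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Star schema: one fact table of measurements, denormalized dimensions.
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, cal_date TEXT, quarter TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                           category TEXT, supplier_name TEXT);  -- supplier repeated, not joined
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, district TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    revenue      REAL
);

-- Data Vault: hubs hold business keys, links hold relationships
-- (always many-to-many), satellites hold time-variant detail.
CREATE TABLE hub_customer (
    customer_hkey INTEGER PRIMARY KEY,
    customer_bkey TEXT NOT NULL UNIQUE,   -- the business key, e.g. a customer number
    load_ts TEXT, record_source TEXT
);
CREATE TABLE hub_product (
    product_hkey INTEGER PRIMARY KEY,
    product_bkey TEXT NOT NULL UNIQUE,
    load_ts TEXT, record_source TEXT
);
CREATE TABLE link_purchase (
    purchase_lkey INTEGER PRIMARY KEY,
    customer_hkey INTEGER REFERENCES hub_customer(customer_hkey),
    product_hkey  INTEGER REFERENCES hub_product(product_hkey),
    load_ts TEXT, record_source TEXT
);
CREATE TABLE sat_customer_address (
    customer_hkey INTEGER REFERENCES hub_customer(customer_hkey),
    load_ts       TEXT NOT NULL,          -- each change is a new row; history is never overwritten
    address       TEXT,
    PRIMARY KEY (customer_hkey, load_ts)
);
""")

# The district-revenue-by-quarter question maps directly onto the star
# (empty here, but it shows the shape of the query).
for row in con.execute("""
    SELECT d.quarter, SUM(f.revenue)
      FROM fact_sales f
      JOIN dim_date d     ON d.date_key = f.date_key
      JOIN dim_customer c ON c.customer_key = f.customer_key
     WHERE c.district = 'Eastern'
     GROUP BY d.quarter"""):
    print(row)
```

The satellite's primary key of hub key plus load timestamp is what makes "where did Peter live three months ago?" a simple as-of query rather than a reconstruction of overwritten history.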
The relationships are actually engineered to promote flexibility. As Peter was talking about before, in a case where there may be a one-to-many relationship in a current system, the Data Vault will advocate turning that into a many-to-many relationship, just to promote flexibility without having to re-engineer the data model in the future. So there are some technical components to this and a business component that make it a very resilient central structure, and very powerful for data integration, especially across multiple disparate transactional systems, where you can find implicit links between business keys and the content that's linked to them. We'll dive into that more deeply in a future webinar. So the idea is that by understanding data in its most flexible and adaptable format, the code you build and the infrastructure you build around it will be cleaner and less complex, and it will be easier to implement and change strategy as it comes down the road. And I'll give you one brief example here. This is, again, from one of Clive's books. I was an employee in a sales organization at one point, and they only had salespeople and managers, but they didn't have the ability for a manager to get paid based on how much they sold, because managers didn't sell; that was the current rule. Well, we went through a recession, and all of a sudden they told managers to sell. The system couldn't represent that, so we became both managers and salespeople, and for a brief period of time we got paid twice. Not exactly the outcome the company wanted. So, again, a very small example, but just so you get an idea, let's look at how data modeling can fit in and help support missions and objectives. I'm going to take a great mission statement that doesn't mean anything: to develop and support products and services that satisfy customer needs in markets where we can achieve a return on investment of at least 20% within two years of market entry. Sounds like a great idea. We take the nouns out of that sentence, drop them into a data model, and say these things must be components that get into this particular piece. Well, that's good. What does the model actually look like at this point, though? It allows us to come along and say, hey, within that context, there are some specific things we'd like to measure as far as goals: the market analysis, the market share, how much innovation we're going to have around this, et cetera. And we can map those goals back into that data model. Most people say, oh, that's great, we're done, right? But it turns out we then take the next step. Let's say we have markets and needs, but we really need to understand not the markets and the needs separately, but their intersection. What is the market need? That is what really supports goals two, three, and four that we had up earlier. Markets and customers are important, but we really need to understand what a market customer is in order to do this. And this level of nuance escapes most people when they're doing superficial analysis. Again, full credit to Clive on this. I've seen him do it with a number of different organizations.
He's just brilliant in terms of articulating these things and saying, look, your problem is not identifying market, need, product, and customer. It's identifying the markets that have needs. It's identifying the customers that are associated with a market. It's identifying the products that you can sell into that particular market, because only with that level of understanding will I be able to build systems around it. And you can get further nuance by refining that model down to another level of understanding. But we're at the top of the hour, so it's time for us to stop and tell you that we've taken you through a data management overview, as we always do. Hopefully you now understand more about what data modeling is and why somebody does it. It's really key to incorporate these purpose statements into your models, not just definitions, and to get your people and your subject matter experts to use them that way; to understand how these models contribute to all kinds of organizational challenges, as John was describing at the beginning; and to help guide problem analysis from a data perspective. It doesn't mean the other perspectives are not useful, and that's one of the things that's so valuable about John: he's an excellent software engineer as well as a data engineer, so he can see both sides of this. We sometimes end up with people who become what we call data bigots, and that can be a problem, because there are other aspects to this. We want you to use the models in conjunction with your architecture and engineering techniques, and then really get to supporting the strategy of the organization as a whole. We're at the top of the hour, Shannon. Hopefully there are a couple of questions ready to go; let's see what people have for us in the Q&A. Thank you both for this great presentation. Of course, we have a lot of great questions coming in already. The most common question is about copies of the slides and the recording of this presentation. Just a reminder: I will send a follow-up email to all registrants by end of day Thursday with links to the slides, links to the recording, and anything else requested throughout the presentation. Now, a comment that came in, Peter, right off your bio: hardly a scientific survey, but I have some indications, no surprise, that perhaps 5% of an organization's systems are covered by useful, complete models. What's your field experience? John, have you ever walked into a customer that already had a data model? I think you have. I have. I've certainly walked into organizations that are well prepared, employ a team of data modelers, and have integrated data modeling into their full software development life cycle methodology. And I've also had people just give me access to the database and say, you figure it out, we don't have any documentation. It's across the whole gamut. I'm sure you've seen the range of experiences over your decades of doing this. What percentage do you run into that actually has stuff ready to go, or some artifacts that are even useful to you? Some artifacts, I'd say, probably in the 20 to 25% range. People that are actively maintaining them, I'd say in the 10%-ish range, at least in my experience.
And I will say, in my experience it's about the same: one in 10 organizations have that. I think the point, though, and I know who's making the comment out there, Dave, which is great, and we want you to keep doing that, is that what we have here is a group of people who are at least aware that this is something they should be doing. So our belief, at the core, is that many fewer than one in 10 organizations are actually doing this on their own. Again, who should be? I'll qualify that. If you've got a small heating and air conditioning company running a small package from one software vendor, you do not need a data model, unless something happens where you need to evolve that data model from state A to state B. But most organizations do not have even an awareness that this is important. If you've gone through college or university and they taught you about data modeling, you're the exception rather than the rule, and even more so if you understand how it's supposed to be used beyond the textbook exercises. So it's really a case of people out there not knowing what they don't know. And a lot of the work we have to do collectively is to help customers understand the value we bring with these formalization techniques, by approaching them and saying: if you don't have a model of this when we're done, you won't know what we did, and you won't know what you need to do next. Absolutely. Good enough? Yeah. Perfect. Yes. There's actually an article put out by MSN Money recently naming the top 21 job skills needed, and data modeling was number three. Really? Yeah. So it certainly, as you say, is coming to the forefront. So, Peter, where do we store all the details so others can easily find them later? That's not just a question for me. John, let me ask the question slightly differently. Of the organizations you walk into, how many have CASE tools and know what they're supposed to be doing with them? Because the obvious answer to the question is: oh, you'd put that in an integrated CASE tool, and the CASE tool would have all this wonderful documentation around it. I can tell you this: CASE tools are not being taught in college and university curricula anymore. Period. No, I can think of one client, fairly large, the one with the most mature data modeling process I can recall, that was really using technology somewhere in the CASE tool sphere to manage their data modeling and to integrate and communicate it with the software development team. And that's it. Have you been able to introduce CASE tools into an organization where they've adopted them, at least temporarily, or maybe to humor you? I don't mean that badly. Well, what's really interesting, I think, is that when you show someone the power of even just a modeling tool, even one strictly within the domain of data modeling, the power of visualization is huge. We see that in reporting, we see it in business intelligence, and those same concepts can be applied to the engineering side. There are a number of tools that I've used in the past: Oracle SQL Developer Data Modeler is one, and DbSchema is kind of the flavor of the day right now, the one we're using with a current client.
We use them just to express and capture some of the requirements we have, to show the power of modeling, and to show how the use of such a tool, both for visualization and documentation of requirements artifacts and for forward and reverse engineering automation, can save a ton of time, reduce errors and complexity, and improve the flow of the project. Thank you. Did you mention Erwin? Erwin, yes, and all the other ones too, sure. But if you trust the university system to take you through this and come out on the other side saying, I've been educated now, and you don't have exposure to these things, it's practically criminal in my mind, but it simply isn't in the textbook. So it's going to be up to us collectively, the community, to educate people around this. Now the question, though, Shannon, which was a good one, is where do you stick this stuff? Okay: better to stick it in an Excel spreadsheet than to not stick it anywhere, right? And that's probably the number one form of documentation we've seen. Some get into a little Access database; we've actually built a couple of them here, which we call, quote, metadata repositories. But the point is, if somebody goes to the trouble of documenting this information, keep it. Put it someplace where it can be accessed in the future, whether that's a shared resource, a shared server, or your SharePoint site, assuming that's maintained very, very well, et cetera. This is critical because it represents the expenditure of somebody's subject matter expertise to come up with a formalization, and we really do want to make sure these things are carried onwards. Because, you know what, there's another Y2K-style bug coming. There's the UNIX bug, which is what, 2038? That's going to hit? Are you even aware of this one? Yeah, right, exactly. So there's another Y2K-style bug in the UNIX world coming up that we've got to deal with. And of course UNIX is much better documented than most other systems, and it's fairly ubiquitous, on most of our phones and things like that these days. But it is a problem, and if we don't know where it is, it's a big problem. With Y2K, it wasn't hard to change all those fields from two to four digits; it was hard to find them. And that goes back to the documentation piece. Great question. Thanks, Shannon. Absolutely. We have a lot more questions coming in. I just love it. Keep them coming. From an organizational roles and responsibilities perspective, how do we define the roles of data modelers versus business analysts when it comes to defining business concepts and definitions for new data? Can I crack that one, or do you want to? I have a couple of thoughts on that. I'd say it depends on the size of the organization and their capacity, I guess, for staffing data management roles. What I see is a lot of people wearing multiple hats in the organization, so there's definitely a gray area between an analyst or a subject matter expert and a modeler. There are, I'd say, some technical skills in being a data modeler, but having a background in the business that is being modeled is, I think, even more important.
So ideally you can embed a data modeler who can absorb that business information; that's really important. And if that can't be done, at least have that subject matter expert from the business on a collaborative team participating in the modeling process. That, I think, is huge. As far as defining the responsibilities, I think it depends on the size of the team and the skills and backgrounds that are there, but that's been my experience. I feel like there's been much more convergence of skills, at least on the teams that I've worked with. Frankly, we even see that among our own team, where we've got people cross-skilling. There's a dirty secret out there, and that is that it's a lot harder to teach business to technical people than it is to teach technology to business people. So we do find that it's sort of pervasive, but a little bit of knowledge can be a dangerous thing. There's a threshold where we'd like somebody to be at least this qualified in data modeling in order to produce models we consider useful as foundational pieces for the rest of our systems. But as far as the labels go, I think the goal is, collectively, let's get it done. I don't worry about the labels nearly as much. I do see more and more business people appreciating this, because one of the things they learn is that when they drag and drop something in a GUI like Tableau, there's actually some stuff happening underneath. If they understand conceptually what's happening with the SQL, or whatever types of queries are going on if you're in a big data technology environment, it's still important to them. They start to get this, and they go: okay, if I put quotes around a phrase, I'll get better results than if I just type two words. So I think we'd collectively say it's not so much about the titles; it's really about the capabilities of the organization and making sure these things actually get done, rather than arguing about who gets labeled what and where the division of responsibilities sits. It's really collective, and it can't be done by any one group on its own. Now, you mentioned modeling many-to-many relationships. Just curious: how do you do that? Sure. In the relational world there's something called an association table, or many-to-many table; it goes by a couple of different names. The idea is that it's an associative table that basically stores combinations of keys from other tables, and I wish I had a visualization to show this, because it sounds somewhat esoteric. Yeah, let's see if we can find a slide that has that particular piece. In fact, I think order and order item, if you go back to that. There we go. So if we look at order item being a many-to-many table between order and product, what we're saying is that an order can consist of multiple products, and a product can exist on multiple orders. In the top right, if you see that order item entity, it has its own unique identifier, the order item ID. But really the key part that makes this an associative table, one that implements the many-to-many relationship, is that it contains foreign keys to both the product ID and the order ID, which allows both of those to be represented in multiple combinations.
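Here is that missing visualization in code form: a minimal sketch of the associative table just described, with invented sample rows so the many-to-many behavior is visible.

```python
# The associative (many-to-many) table: order_item stores combinations of
# foreign keys from order and product. Sample data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "order"  (order_id   INTEGER PRIMARY KEY);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_item (
    order_item_id INTEGER PRIMARY KEY,      -- its own unique identifier
    order_id      INTEGER NOT NULL REFERENCES "order"(order_id),
    product_id    INTEGER NOT NULL REFERENCES product(product_id)
);
INSERT INTO "order"  VALUES (1), (2);
INSERT INTO product  VALUES (10, 'widget'), (20, 'gadget');
-- order 1 holds two products; product 10 appears on two orders
INSERT INTO order_item VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10);
""")

# "Which products were actually ordered on order 1?" is a simple join
# through the intersecting entity.
for row in conn.execute("""
    SELECT p.name FROM order_item oi
    JOIN product p ON p.product_id = oi.product_id
    WHERE oi.order_id = 1
"""):
    print(row)  # ('widget',) then ('gadget',)
```

The intersecting entity is what makes both directions queryable: products per order and orders per product.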
And this is a bit toward what we're looking at in the final example of the strategy piece. Somebody might say order item and products, but I actually need to know what products were ordered. Until I have that little intersecting entity in there, I'm not going to be able to answer that quickly and efficiently in a reporting context. Right. I hope that helps. If it didn't, feel free to reach out; we've got more examples we can show. Love it. And speaking of examples, the next question is: what books or sites would you recommend for more information about Data Vault architecture? Ooh, there is one. Now, I know the site name has changed; it used to be LearnDataVault.com. There are some online courses there. There's also Super Charge Your Data Warehouse and Building a Scalable Data Warehouse with Data Vault 2.0. There's been a revision to the Data Vault architecture, called Data Vault 2.0, and a new book from Dan Linstedt out on that, which I think is an excellent resource for Data Vault modeling. The first two chapters cover a lot about Data Vault architecture and its role within central data warehousing, so I'd highly recommend it. I love it. And we'll get that information out in the follow-up email as well; that's a great example of things that will be included. Now, opinions on what tools best support Data Vault modeling? What tool set best supports Data Vault? Interesting. I think from a database technology perspective, it's basically technology agnostic. There's no real reason to move towards Oracle or SQL Server versus DB2. And there's even, I would say, NoSQL integration, which you'll read about if you look at the Data Vault 2.0 book. We've done some, I won't say experimentation, but some development using BIML, the Business Intelligence Markup Language, with SSIS, the SQL Server Integration Services package, to do automation. Because the Data Vault is very pattern-driven in its implementation, there are opportunities to do code generation for ETL and modeling on that side. So I'd say there are some technologies and possibilities there. And I'm trying to think; AnalytixDS may have another piece of tooling. Peter, I'm not sure if you saw that at the conference, but I believe they have some Data Vault integration as well. So that gets into code generation and things like that. That's probably a subject we could do an entire webinar on as well. That would be a lot of fun. But again, if you have questions about that, this represents another advance in this context. As John said, because vaulting tends to be pattern-driven, the ability to apply automation is greater. We don't want to mislead you and say that we can automate everything; it doesn't always work out that way. But if you can get a big lift by doing it, it is absolutely worth the investigation, and a couple of our clients are looking at it. I love it. There are a lot of really good questions coming in. Even though we have a half hour for Q&A in this particular webinar, I don't know if we're going to have a chance to get through them all. If we don't, please keep your questions coming in; we'll get written answers to you for those questions, specifically from John. We could try that, right, Shannon? So the next question in the queue is: what type of model do you recommend for a data store? Well, I think it goes back to what is the purpose of the data store.
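A quick aside on the code-generation point before the next answer continues: because the hub, link, and satellite load patterns repeat, they can be rendered from metadata. The presenters used BIML and SSIS for this; what follows is only a toy sketch of the idea in plain Python, and every table and column name in it is invented.

```python
# Pattern-driven code generation: render a standard hub-load statement
# from a small metadata description. Names are hypothetical.
HUB_LOAD_TEMPLATE = """
INSERT INTO {hub} ({hk}, {bk}, load_date, record_source)
SELECT DISTINCT {stage_key}, {stage_key}, CURRENT_TIMESTAMP, '{source}'
FROM {stage}
WHERE {stage_key} NOT IN (SELECT {bk} FROM {hub});
""".strip()

def generate_hub_load(hub, hk, bk, stage, stage_key, source):
    """Render the repeating hub-loading pattern for one hub."""
    return HUB_LOAD_TEMPLATE.format(hub=hub, hk=hk, bk=bk, stage=stage,
                                    stage_key=stage_key, source=source)

print(generate_hub_load("hub_customer", "customer_hk", "customer_bk",
                        "stage_orders", "customer_id", "order_system"))
```

A real implementation would hash the business key for the hub key and would have similar templates for links and satellites; the point is only that the repetition invites automation.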
So again, you can store your data in a comma-separated values file, and that can do it. Yeah. And to jump onto that, one thing we really just scratched the surface of is some of the non-relational data stores that are out there. Data modeling certainly has a role in that, and I'd say even a very large role, because when we talk about the NoSQL space, you commonly hear things about schema-less databases. That's actually not true. The schema is somewhat baked into the data; there's a small illustration of that below. Being aware of that and having the forethought to model the content that would go into that type of database is, I think, crucial to the success of those particular implementations. So the point I'm trying to make is that modeling matters even in the NoSQL world, where there are key-value stores and document databases, things like MongoDB or Azure's document database. And then there's the traditional relational world. Generally, in the relational world, for most operational and transactional systems, third normal form is typically going to be your best bet. When you start thinking about data warehousing and reporting, dimensional modeling is appropriate, as is the Data Vault for, potentially, an enterprise data warehouse or central data store. But there are lots of alternatives; I'd love to talk more about that. Right, but that's definitely an it-depends answer. Sure. And just a reminder, we have 14 minutes left. On to the next question, another one we could probably do a whole webinar on: how critical is business vocabulary for defining conceptual and logical data models? I'm so glad you asked that question. It turns out Karen and I are doing a presentation on that next month at the data governance conference. Looks like I've got the logo up there for the winter one, but we'll be there in San Diego in the next couple of weeks. Really, what these models do is standardize the vocabulary. What do you mean by a transaction? What do you mean by a customer? What do you mean by a sale? Something like the little model we put up at the very beginning is much more useful as a reference model than simply as a data component that popped out of the exercise. So we can look at something like this. Oops, missed it. Hang on, I'll get there. But really, getting to the vocabulary standardization point is critical, because if you're not literally on the same page, you run into all sorts of business issues. And I'll give one quick example here; I know we're running short on time. This is one of the health care systems we worked with, on a single field of data called AdmitDate. And AdmitDate was very, very standard: the date somebody was admitted. But we found 11 different ways it was being used, and they cost the organization tens of millions of dollars, because things were getting charged incorrectly. So yes, important, absolutely important. Do you have anything to add on that? Just to echo how important that is, especially when you get into maintaining the translation once you go from the conceptual to the physical model, because the names of those business entities may change in the data model. Something a business user may be familiar with, such as, I don't know, a union affiliate, may really be represented in a physical model by something generic like organization, or maybe even something like party. Some of you have heard of party models.
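Back to the "schema-less" point for a moment, here is the promised small illustration of schema baked into the data: two documents that record the same fact under different implicit schemas, checked against the model we meant to have. All field names are invented.

```python
# "Schema-less" stores still carry a schema; it just lives in each document.
import json

docs = [
    '{"customer_id": 42, "name": "Ada"}',
    '{"customerId": "42", "name": "Ada"}',  # same fact, different implicit schema
]

EXPECTED_FIELDS = {"customer_id": int, "name": str}  # the model we meant to have

for raw in docs:
    doc = json.loads(raw)
    problems = [f for f, t in EXPECTED_FIELDS.items()
                if not isinstance(doc.get(f), t)]
    print(doc, "->", "ok" if not problems else f"violates model: {problems}")
```

Two producers writing to the "same" collection can silently disagree like this, which is exactly why modeling the expected content up front matters.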
So, understanding what those linkages are, and putting context around a more generic representation in a model versus a very concrete construct in the business, is very important. I think maintaining that metadata is very important. Thank you both. Next question: what's the difference between design styles, for example third normal form, and styles of modeling, for example UML? So, we mentioned that there are different ways of representing the model; this slide here. When we talk about these, they get you to the same place in general. There are a few other, even more esoteric, notations out there, but all we're really talking about is: are we doing crow's feet or circles? Are we doing lines or dots? That sort of thing. The important thing is you don't want to mix them within the same project, and probably not even within the same organization. The organization should pick one of these as a whole and use it all the way through. Now let me answer the question a little differently. We were talking about design styles, and design is a specific activity here. When we talk about going from a conceptual model, through multiple iterations of a logical model, into a physical model at the end, design styles can become important. For example, a design style in a service-oriented architecture can be to put all the modules that hide complexity in the same component of the system, so that we know that component has a higher complexity than other components with much simpler complexity metrics. How to apply that sort of design style really gets into how to do the data modeling: somebody might come up with an initial version of a data model, what we'd call a preliminary design version, and then, through a series of additional checks in the process of translating that into a validated model, go through iterations that optimize it, for example for performance, or for maintainability, or for storage expense. So there are a number of different considerations you could put in there that would also apply to that design-patterns moniker. A more complicated question than we can fully get into, but hopefully you see it's a bigger issue than a simple answer covers. How does data modeling adapt to the proliferation of unstructured data sources? It seems like it would introduce previously unknown relationships and dependencies. Great question. First of all, one thing we all have to realize is that anybody who tells us they can turn unstructured data into structured data, you should hand them a glass of water and say, turn it into wine for me as well. The definition of unstructured is really that: it is something that cannot be structured. However, if somebody tells you, I can take data that is less structured and make it more structured, that's an absolutely reasonable contribution, and what we're looking for there is really hooks. If you think about it, the opposite of this was the comment field, where you had a nice set of structured data and then this 256-character blob you could put anything into. And John and I have both been on projects that involved parsing those sorts of things. If you sort of reverse that and say, I have more blobs and less structure, that's really, I think, what the questioner is asking.
And what they're asking is: if I've got data that is less structured than I'd like it to be, and I'd like to add more structure so I can search it better, that's where the data modeler can come in, look at those data sources, and identify the hooks, if you will, so that we can velcro some structure around them. That gives people the ability to query and say, I want to go through all of the email, but I want to find only the emails that have to do with "where do I find the login for my webinar session," which is one of the things we do every week, right, Shannon? So it's adding some structure to data that is less structured; it's by no means turning unstructured data into structured data. Yeah, I mean, I think when it comes to data modeling, what Peter's referencing as hooks really comes down to where you would find keys, like a business key or a link. So if you have, let's say, an image, or a document of text, or an audio file, something that's truly unstructured data, hopefully there will be an identifier that can link into your structured data system, and I think that's one key place where data modeling between those two sets can occur. The processing of unstructured data into semi-structured data is another opportunity: the output of that process can itself be modeled. So if you're going to do text parsing or natural language processing on some unstructured data, and that produces some structured fields, that content should be modeled and considered in your overall data architecture. We've seen systems that didn't actually have keys, but that had tags, and the tags were useful indicators that something probably fell into this bucket instead of that bucket. Yeah, absolutely. So maybe there isn't a primary key, but the tags imply somewhat of an associative relationship, kind of like a many-to-many thing, between some existing entity in a structured system and that unstructured document. And this is where your handy data modeler, the one in your organization who understands your business very, very well, can be of tremendous value. We've seen a lot of organizations struggle with this and simply forget to ask the data modeler. Those who do ask find the data modeler comes in and goes, oh yeah, I've been thinking about this a lot, in the shower and on vacation, and comes up with structures people had never thought about before. Very interesting sets of problems; that's the kind of thing we like to work on. You know, I think we have time for one, maybe two more questions. Again, if you have additional questions, keep them coming in, and we'll get written answers out to you in the follow-up email. How is the data modeling practice done in an agile way of building applications? So, agile is a very good method for developing software. But we would both say that if you're starting agile and you have incomplete data requirements, it's very difficult to keep the agile pace running. I would go so far as to say that the minute you discover you have a fundamental data requirement that's not understood in your coding, you really need to put the brakes on that particular sprint, stop, and get those requirements pinned down before you try to build more data structures around it.
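One quick aside before the rest of the agile answer: a toy version of the "hooks" idea from the unstructured-data discussion, scanning less-structured text for a recognizable business key and landing the result in a modelable structure. The key pattern and field names are invented for illustration.

```python
# Scan less-structured text (email bodies) for business keys ("hooks"),
# producing structured output that can be modeled and queried.
import re

emails = [
    "Hi, I can't log in for my webinar session, order ORD-1001.",
    "Thanks for the quick shipment on ORD-1002!",
]

ORDER_KEY = re.compile(r"\bORD-\d+\b")  # the hook: a recognizable business key

# Parsed output lands in a structured, modelable form (message -> order keys).
extracted = [{"message_id": i, "order_keys": ORDER_KEY.findall(text)}
             for i, text in enumerate(emails)]
print(extracted)
# [{'message_id': 0, 'order_keys': ['ORD-1001']},
#  {'message_id': 1, 'order_keys': ['ORD-1002']}]
```

The extracted keys are exactly the kind of structured field that, per the discussion above, should be modeled and folded into the overall data architecture rather than left as a parsing side effect.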
Remember, these data pieces, like the databases we talked about early in the presentation, are absolutely key to this. Hang on, there we go. Sorry, too far. Anyway, what they do is allow you to put software, programs A, B, and C, in place around that database. If you change that database to anything other than its original state, all of programs A, B, and C have to be changed. And usually there aren't just three of them; there's a ton of different things out there. So agile is about how to develop the programs once you have the database correct. If the database ain't correct, you shouldn't be spending any resources on those programs. Now, again, just a couple of things. Being a software developer in the agile environment can be a challenge. Largely, agile backlogs are managed through product functionality, so you might get a particular user story saying that someone needs to perform this particular action within a system. I would argue that if you really are going to become data-centric and successful in an agile development environment, think about all of the data artifacts potentially affected by that story, and see whether backlog items can be grouped into sprints in a more data-centric way. That will cause a lot less re-engineering going forward. So if you can orient your development process by grouping things that affect common data components, that's very helpful. Also, following conventions like third normal form and doing data modeling the right way helps the extensibility of the model, because you're able to draw relationships and linkages to previous data constructs within your model as you move forward. So you heard the answer: I'm a purist and John's willing to compromise, right? Any recommendations on tools: what tools to use for data modeling versus data class modeling? Any opinions on that? So data class modeling, I'm guessing, means an object paradigm? Yeah, I'm thinking UML or documentation along those lines. I heard something interesting once: that UML was never designed to do data modeling; it's just the best thing we had to do data modeling, at least logical modeling. And I can somewhat agree with that if you exclude the concepts of methods or accessors. I like to use free and accessible, or at least low-cost, tools, because I know a lot of organizations are constrained by that. So, Oracle SQL Developer Data Modeler: I think we've had a lot of success with that. It's not restricted just to the Oracle environment, and it's a free tool that helps with data modeling. And then something I've used recently is DbSchema, which has been very helpful for forward engineering into multiple environments; I think it has a very nice user interface, and its logical grouping of layouts helps you segment the data model into more consumable chunks. And I will say this: the Object Management Group has done a great job of trying to cope with the retrofit John was describing. UML was the best tool that was out there, and they've gone to considerable effort to make it more supportive of actual data modeling. There are new releases coming out that are evolving us toward something a little more natural within that object context.
But you've got to get people interested in this in the first place, and that's really the hard part. So hopefully that's what we've done here today: given you all a bit of an idea of how data modeling needs to be used in organizations to build these foundational pieces, and to come up with a good set of constructs that the rest of your technology environment can be built around. And you do need good knowledge, skills, and abilities in order to do this. We hope that Dataversity webinars like this one help you learn about it. We've got a couple of upcoming events we want to draw your attention to. Next month we're going to do a conversation with another one of our colleagues here, Karen, who's done a terrific job of articulating a data quality success story at a customer. That's going to be, I think, a lot of fun; we'll do that on July 12th. And then between now and then, Shannon, will we see you in San Diego, or are you going to sit this one out? I'm afraid I am focusing on getting ready for our next online conference, so I will not be in San Diego, but I know it's going to be a great event. We're looking forward to it. Yeah. Well, John and Peter, thank you so much for this presentation. It really has been very informative, very good. There are a lot of comments coming in about how valuable it's been; I'll share those with you guys afterwards. And thanks to our attendees, as always, for being so engaged and interactive in everything we do. We just love the community; we can never move an industry forward without lots of discussion, so it's always appreciated. And just a reminder again: I will send a follow-up email within two business days to all registrants, so by end of day Thursday, with links to the slides, links to the recording, and anything else requested throughout the webinar, including the book references and the answers to the questions we didn't have time to get to today. I hope everyone has a great day, and thank you so much. Thank you. Thank you.