Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, "Why Data Modeling Is Fundamental," sponsored today by erwin. It is the latest installment in the monthly series called DataEd Online with Dr. Peter Aiken.

Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel, or if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag #dataed. And if you'd like to chat with us or with each other, we certainly encourage you to do so; you'll find the icons for the Q&A and chat panels in the bottom middle of your screen. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we'll likewise send a link to the recording of this session, as well as any additional information requested throughout the webinar. Now, let me turn it over to Danny for a brief word from our sponsor, erwin. Danny, hello and welcome.

Hey Shannon, thank you. I'm so excited to be part of this. Data modeling is in my blood, and having such an esteemed speaker as Peter talking about its importance is very much appreciated. Quickly, if you're not familiar with erwin: we are the data governance company, but for many, many years we have been the leader in the data modeling market, so we like to think we know a few things about it. I wanted to set up some thoughts on how you might operationalize what Peter talks about for the bulk of this WebEx, once you decide to take it on.

When we look at challenges out there, we hear about transformation, compliance, innovation, and a lot more, and these pose challenges and opportunities throughout the technology stack of the business. Data modeling has been well entrenched for a long, solid period, but we're actually seeing a significant uptick as people try to get more capable with their data: bringing on data modeling practices and integrating data modeling into their larger approach to securing, managing, and delivering data to their organizations. People are asking for support for more database technologies, because data modeling is a great practice whose real benefits come when you can actually connect to the technology and drive out the infrastructure that manages and delivers information. It requires new modeling techniques as these new types of databases come out, whether new data warehousing approaches or NoSQL databases with different structures than what we're used to, and there is a big demand for agile data modeling, so that we can become part of that agile process and deliver the same value in a slightly different way that really supports the way the organization works.
And with governance being top of mind for everybody, as well as getting more out of your analytics, there's a real opportunity to use data modeling to drive semantic integration across your enterprise, which is another net benefit of taking on a data modeling approach.

When you look at the types of databases you want to support, it's not just the classics: all of the new cloud-based databases, NoSQL databases, and new formats are important, and you'll want the ability to both forward and reverse engineer them — forward engineer your designs to deploy them, and reverse engineer to document what you have — and then take a model-driven approach. But it goes well beyond the databases; you want your data models to have integration into all of the other technologies out there. IDC reports that most organizations have six different types of data deployed across ten types of data management technology — and that's an average; think of what larger organizations are looking at — and of course they're doing that across on-premises and cloud, some sort of hybrid. So making sure that all the work and knowledge you capture in the data modeling process can make it out to all of these places is very, very important.

You've also got data modeling paradigms. Logical and physical modeling have been around for a long time, and entity relationship modeling is still the way people look at, understand, and make sense of data requirements and turn them into something usable and understandable. But when you get down to physical models, you want to be able to design and deploy in the way the target technologies require. So we're seeing a big change: in addition to your entity relationship needs, you now want to look at NoSQL document designs with embedded hierarchies, collections, things like that. Having the ability to move quickly and easily between these data modeling approaches is very important, because when you're talking to a business user and gathering requirements, a logical entity relationship diagram is generally the way to go, but when you want to deploy that out to something like MongoDB or Couchbase, you actually have to transform it. So make sure you can work in all of these approaches, have them integrated, and get support from your technologies so you can move between these structures freely and with integrity.

On the data warehousing side, we're seeing a big uptick in demand for Data Vault, an approach to a more agile data warehouse that's more effective and designed to stay aligned with the business as it changes. Make sure you can take advantage of these approaches, so that as you move from today's aggregation platform, perhaps out to something like this in the cloud, you can support the architectures that take best advantage of those technologies and deliver the things you want.

Collaboration and governance of the data modeling process are also very important. As you take on a data modeling practice, keep it under control: set standards, and propagate those standards.
Bring everyone together, allow them to collaborate and work concurrently so they're faster, and have that stored in a repository that becomes the central point of contact for your DevOps processes and for your data governance and data intelligence processes as well. And then take advantage of the wealth of information that's been built up out there. Here I'm looking at the Common Data Model from Microsoft. Data modeling technologies and data modeling practice will be a big part of integrating that throughout your entire estate, into the metadata that sits in all of those different technologies. And it's not just the Common Data Model; there's a lot of great information captured from a modeling perspective by a number of different vendors that can help you accelerate your path, both to building a more useful, usable, and understandable data landscape, and to governing those things and keeping them synchronized with each other.

So, very important things to think about as you look at data modeling as a practice in your organization and how you're going to operationalize it. There's a lot of value coming from the process and the things Peter's going to talk about here: providing better business alignment for your data, making sure it's right the first time, making it easy to integrate and easy to understand. That increases literacy, because most organizations don't have the literacy skills required to take full advantage of their data, and anything we can do to provide a clear understanding of that data is going to be a significant benefit — not just in how you design databases, but in how you communicate that out to the larger organization and its stakeholders. With that, I think it's time for the meat of the presentation.

Danny, thank you so much, and thanks to erwin for sponsoring today's webinar and helping to make these happen. If you have any questions about erwin, Danny will likewise be joining us for the Q&A portion of the webinar at the end. Now let me introduce our speaker for today, Dr. Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. He has written dozens of articles and 11 books, the most recent on data strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his expertise; Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to get his presentation started.

Hello and welcome, Shannon, and hi, Danny. It's such a pleasure to have you joining us for the back half of this hour. For those of you who don't know Danny, the music playing on our way in was actually from a project he plays on. Many of us in this business have other talents; Danny is a musician, as is Shannon. So great to have both of you here, and great to have all of you joining us today.
Today's talk is about data modeling, and we'll get into the specifics with Danny at the end. I'm going to give you the basics, the usual stuff that we do, so let's run through it real quick to see what we're going to cover. First, we're going to talk about how data modeling is critical to getting data management to work under any circumstances. We'll look at some specific motivations: data is a component of systems, and data is generally not well understood, and the only way you can truly understand data is by modeling it. So we'll look at what modeling is and why we do it; it really represents our understanding of the fundamental, foundational system characteristics. The reason it's important from a foundational perspective — and you'll hear this from both Danny and me in a little bit — is that when you build the data model, you tend not to get another shot at it; it is very rare that a data model actually evolves, at least in a production sense. And the key is that this is about sharing information, not just from one system to another, but from all of these systems to all of the humans involved. We have to share not just between systems, but between systems and human beings.

We'll look at a couple of specific fundamentals — I like to talk about the purpose statement in particular — and we'll talk about what it means to be data centric in our thinking. The model components will then be used to complement other architecture and engineering techniques. Almost anything you're doing in technology is going to have a data model at the center of it, and if you do not understand that data model at the center, it's going to be very difficult for you to diagnose any challenges or make any enhancements. So data modeling is not just fundamental, it is critical — and it's also probably the part that's done the least well. In about 45 minutes we'll bring Danny back on and start talking more specifically about some of these things.

But let's start with a basic: data as an asset is the only asset we have in our portfolio that is not depletable. It does not degrade over time. It is in fact a durable strategic asset. What that means is that when we compare it to the other types of assets organizations hold, it tends not to be as well understood. People tend to say things like "data is the new oil." I actually hate that particular phrase, because it does not speak to reuse. A better way to think about it, if somebody starts talking about data as the new oil, is to say instead that data should really be thought of as the new soil. The difference has two parts. One: you don't just fling seeds about your yard and hope that good things will happen; you carefully prepare the soil. Two is a time dimension: you don't plant things on Monday and expect to eat them on Friday. And yet some of our projects are run exactly that way. We also, of course, have to sell some sizzle with the data, because quite frankly, except for those of us on this call, it's a pretty boring subject, right? Except when you can't get to your Amazon account, or anything else that depends on it. And it has already been a day when we've seen some interesting things out there in the data world.
As a unique organizational asset, data deserves its own strategy. It deserves attention on par with similar organizational assets, and it requires professional administration to make up for past neglect. I'll give you one example that's been popping up lately. Forbes ran an article last fall showing that American Airlines had a market value of $6 billion if you added up the stock — that's what the stock market valued it at. But a group of investors approached them and said that their frequent flyer data alone was worth somewhere between $20 and $30 billion. United also had a similar bump. Now, these two statements cannot both be true: it can't be that American Airlines is worth $6 billion by one measure while somebody else looks at it and says it's worth $30 billion. We've got to work on that, and there's quite a lot of work to do. But a lot of people are reading that article and saying: I wonder what sort of stuff is in my portfolio, my inventory.

Now, a quick bit — I forget where I got this particular example, but it's a wonderful one. Would you build a house without an architectural sketch? The sketch is the model of the system to be built. Would you like an estimate of how much your house is going to cost? The model gives you a very good idea of how demanding the implementation work is going to be. If you hired contractors from all over the world, would you like them to have a common language? The model provides that language. Would you like to verify the proposals and test the system? Yes — the models can be reviewed before thousands of hours of implementation work are committed. And if it turned out to be a great house, would you like to build it again? Yes — we can reuse the same basic model on other platforms. And if you drilled into a wall of your house — you're now into reverse engineering, if you will — would you do it without a map of where the plumbing and the electric lines run? Of course not; the blueprints literally tell us where all of that goes.

Now, it's important to understand that this is a relatively new profession for us. I trace our lineage in this area back to a woman named Augusta Ada King, or as she was known, the Countess of Lovelace — Lord Byron's daughter. She looked at the thing I'm showing on the screen here, a weaving loom (when she saw it, it looked a little more like this version of it). The cards hanging to the right with all the holes in them, where I'm waving my cursor, are what she looked at and said: those control the weaving loom. They are punched-card programs, and she said, aha — I can figure out how to make a machine do that. All we need is a machine that can read those cards, and I can make the machine do math. So she is the world's first programmer. When we compare our profession against the one I like to compare us with, the accounting world, they've had 8,000 years to get their act together and formalize their practices — what we now call generally accepted accounting principles. Many in data are working on the same kind of thing. But that gives you an idea of how immature we are compared to these other professions. There's no excuse to stay that way, but it is something we should pay attention to.
We define data management as anything that happens between the source and the use of the data. That's a fine way to describe it, but it doesn't give anybody any direction, and it also leaves out the reuse component. So I like to build it up this way: data comes in on one side; we have all of these engineering — or munging — activities that we do (we won't go through each of them here); and you have exploitation activities on the usage side. Then there is reuse, and we don't have those procedures as well developed as we need to. And of course, all of this should be covered by a governance and ethics program to make it worthwhile.

Now, data is a lot like Maslow's hierarchy of needs, if you remember that from high school. If your food, clothing, and shelter needs are unmet, you will never be safe. If you're never safe, you'll never be part of something bigger than yourself. If you're never part of something bigger than yourself, you will not know yourself. And if you don't know yourself, you don't know what you're good at. Quick aside: they now call this self-actualization piece "flow." Perfectly fine, all good. I take those practices — the teal boxes there — and put them up into that golden triangle of data practice. You'll notice that all the things in there are labeled as technologies. These technologies just represent the tip of the iceberg, and if you're not able to foundationally ground all of those things very well, you will not have much success with them. It turns out our success rates in these areas are the same as they have ever been: about one in three IT projects, particularly data projects, succeeds on time, with full functionality, for the price originally agreed upon. The reason that matters is that almost nobody is building the underlying capabilities — I shouldn't say nobody; all of Danny's customers are. But very few organizations are even interested in participating in an educational activity like this, much less have a desire to learn. The things at the bottom here, below the surface, are capabilities. And everybody's always asking me, "I understand that, Peter, but can you do it faster?" The answer is: absolutely we can do it faster — but if I go faster, it will take longer. If I go faster, it will cost more. If I go faster, it will deliver less. And if I go faster, it will present a greater risk. This is particularly important in light of the fact that everybody now wants to go digital.

So I want to pass on a little digital insight I got from my friend Mark Johnson just a couple of weeks ago. He was doodling in one of our sessions and wrote down "digital minus data." I'm not sure what you have left — but I do know that if you go the other way, data minus digital, you still have the data. And yet even this insight is missing from most of our young people. The reason is very simple: we've been doing a terrible job in colleges and universities for almost 30 years of educating people. Let me show you a very simple example that popped up on LinkedIn the other day, where somebody's looking around going, oh wow, I just had this recent technology realization.
They make a post, and the realization is: gosh, if I put chocolate ice cream into anything awesome, chocolate ice cream is going to come out the other side. That, of course, is true with blockchain or without blockchain. Well, I'm sorry — if that's a recent revelation, you still have a lot to learn, Mr. Individual who posted that on LinkedIn. It can be expanded a little. I use this one as my own personal piece: bad data plus anything awesome is still bad results. And if you're anything like me, you may recognize it instead as a variation on garbage in, garbage out.

The reason this is so important — again, remember, we're talking about fundamentals — is what happens when we have garbage data and garbage data models. Now, when I put a model here in the center, let me clarify for a second: I'm talking about an analytical model here, not a data model. Everything in the center of this diagram — data warehousing, machine learning, all of these wonderful things we're doing — depends on having good data. If garbage enters any of these pieces, regardless of what they are, we are going to get garbage results, and this accounts for most of the failures out there in the data world at this point in time. Replacing the garbage data with quality data, improving our flows, harmonizing them, doing a bit of normalization in the process, and so on, will get us to the place where we have quality going in and quality coming out. Unfortunately, most people are trying to go digital without doing the data piece, and data models are the key piece of it. Simply invoking the word "data" does not make it work; it takes a fair amount of real work, and that is what we're going to talk about for the rest of this session.

If this is the first time you are seeing this wheel, shame on you — but actually, shame on us for not making it more accessible to you. This is the DAMA DMBOK wheel, and you can see that data modeling is one of the 11 pie wedges. I'm not going to walk through each of these; they're mainly here for your reference. This is the context model from the very first version of it.

Let's move on and talk a little more about architecture. One of the main questions I get as a consultant is people coming to me and saying, can you create a data architecture for us? And I say: if you've got a company, you've already got a data architecture. All organizations have data architectures. The question is whether the ones you have are understood — and if they're understood, whether they're documented, because they can't be useful if they can't be shared back and forth. So all organizations have architectures, and architectures are made up of a series of data models. The models sit one level down; the architecture is the higher level of abstraction. We could talk about a door or a clock, but we could also break those down into more detail and talk about a collection of wood and glass and nails and brass, for example. The components we're talking about in data models are linked together in a number of different ways, and the details are organized into larger components. So we may have these little bits and pieces up there, but this is intricate — and when I say intricate, I mean detailed.
It is, unfortunately, not the stuff most people want to talk about, but it is the stuff that can keep Danny and Shannon and me geeking out together for hours and hours. We organize those components into models, and that adds dependencies. The minute you say that something is related to something else, or must be related to something else, you are creating a series of dependencies, and those conditions must be met in order for the system to work. We then take those models and organize them into larger architectural components, and this is where we inject the component of purposefulness: there is no point at all in building something if it doesn't have a purpose.

That's architectures in general. We do the same thing in the data world. Attributes are organized into entities and objects; attributes are characteristics of business things about which we are going to create, read, update, and delete information, and there are lots of examples we can look at. Dependencies then get us into the models: the combination of the attributes, and the ways in which the entities are linked to each other. Poorly structured, poorly engineered structures are one of the main things that cause us problems in these areas. And finally, the models are organized into architectures, which lets us get past all of this.

So, for example, the entity I'm going to talk about here is something called a club. First of all, you need to know: am I talking about a club like a baseball bat, or a club that people can be part of? It's the second one. And what does the existence of the attribute club ID tell us? It tells us that clubs need to be identified one from another, that club-specific information is likely to be maintained, that some concept or organization likely exists above the club level, and perhaps some other things. That's one attribute on one entity in one data model. Here are some other attributes we could put out there: maximum period of obligation, number of cancellations year to date, number of members, total units sold. These all tell us about the business thing we're trying to manage. And the entities, as I said, are organized into models, which now tell us that clubs and products are related to each other by orders: an order is what gets a customer to buy a product in this particular scenario. The reason we don't dive into full architectures here is that they're just too complex — very much information that we simply don't want to get into here, particularly when we're talking to general audiences.

People don't generally like to do this, though, even when not doing it costs them money. I use the word "bleeding" here: we tell you that if you don't fix your data problems, you die a death by a thousand cuts. The problem is that nobody actually dies from this stuff on a regular basis — but they do lose money. This data incoherence costs an absolutely crazy amount. Were your systems explicitly designed to work together? Probably not. So they're only going to work together through your efforts, and the data model is one of the pieces you're going to use. Organizations are spending between 20 and 40 percent of their IT budgets migrating data, converting data, and improving data.
In that context, we absolutely have to have models, because the models are what tell us what's actually happening in the data. They are the truth statement, if you will. Now, as a topic, data models are complex and detailed; nobody really wants to talk about them, and most people are really unqualified to. The subject is taught very inconsistently across the world — it's terrible. We've got lousy books for the most part, and the subject material isn't even close to what needs to be taught to the young people who are trusting us to get this right. As a result, it's not well learned. The most important thing to realize is that in your group, almost everybody had to learn data in their own way. I like to illustrate that with a quick YouTube video — you can see the link there — which I'll play real quick. This is a gentleman named Wally Easton who taught himself to play the piano in his own unconventional way. No problem, great stuff, glad he did it. On the other hand, if he really wanted to be a concert pianist, he probably learned incorrectly.

What we're going to talk about now is how to get out of what I call a bad data decision spiral. Most business decision makers and most technology decision makers are not data knowledgeable. Therefore they make bad data decisions, which allow poor quality data models, which result in poor treatment of organizational data assets, which results in poor outcomes. The most recent example I'm seeing of this is a stereotypical one: Salesforce does a great job with its software. But if you put Salesforce in and decide you're not going to clean up the data first, you're not going to make Salesforce look very good.

So let's dive into the idea of data modeling. The definition from the DMBOK — the Data Management Body of Knowledge — is that data modeling comprises the analysis and design methods used to define and analyze data requirements and design data structures that support those requirements. A model, then, is the set of specifications that comes out; it represents something in our environment and employs standardized text and symbols, so that you can hand the data model to almost any other data modeler in the world and they'll be able to read it. Modeling is also a process, so we use both senses — the model and the modeling — the same way we do in architecture: when you pay an architect to create an architecture for you, the thing they produce is the architecture.

The modeling approach needs to be guided by two specific principles. One: what is the purpose of the model, and who is the audience? And two: what sort of deliverables can you put together with the resources and time you have — what constraints are you working under? These models facilitate formalization: they produce a single, precise statement of the system. They support communication, because everybody on the project needs to know the information in the data model. They help explain what's going on in the business area people are working with. You can also use data models to train new business or technical staff, and they tell you the scope: how much we know, and how much we don't know.

Now I'm going to slow down for a quick second on this chart. Most of this is not taught in schools.
There's usually a chapter that touches on it, but there are different types of models. We use the ANSI/SPARC reference here — and for some reason I don't have a citation on this slide, so I will add that before I send these out to you; let me just make a note. There's a conceptual model, a very high-level, more business-y type of model. There's a logical model, which sits between the business view and the technology view. And there's a physical view, which is how we're going to do this in Oracle, or in the cloud, or in Teradata, or whatever else we're looking at. It's actually quite important to follow this layering, because if we want to change from on-prem to cloud-based pieces, the user community really shouldn't have to deal with that change. If they have had to deal with it, it means somebody didn't do a good job with the data modeling.

Each of these depends on various families of notations. When we started out in this area there were Chen, Bachman, Martin, and Information Engineering, and if you get data people together, they'll want to argue about this. Don't let them — just pick one. The one that everybody has effectively picked is Information Engineering. As Danny will tell you, the tools will render various representations of these, so if another notation is really important for your organization, go ahead. But the other three at the top are mostly fading away. It's not that they weren't good; it's just that Information Engineering has become the way everything works.

So let's talk specifics now. In a data model you have relationships. A relationship is a natural association between two or more entities. The entity at the bottom of the screen here has relationships with the two other entities right there; I'll put some labels on these: room, patient, and bed. Now we're going to talk about optionality and cardinality. Optionality defines whether a relationship is mandatory or optional. Looking at this, a bed must be in a room, but a room does not necessarily need a bed: a room can contain zero or more beds. This is important because if we specify this as the foundation of a piece of software or system functionality, that rule will be in place until the system is no longer in service. When I talk about fundamental pieces, this is what I mean — if you make a poor decision here because you've been given bad requirements, that mistake will likely live on with the system. Here's the other side of the equation: a bed is occupied by zero or more patients, and a patient occupies at least one bed — hopefully the patient isn't in pieces across several. I'm sure there's a joke in there, but we'll take it from there.

So given that, what would be the proper relationship between members and clubs? Well, you can't really answer that question, because I haven't given you enough context. But it will be one of these: exactly one; one or many; optionally one (zero or one); zero, one, or many; or optionally one or many. These five breakdowns are critically important to understanding the basics of data modeling — it's going to be one of them, and the decisions you make matter. A minimal sketch below shows how these rules land in physical tables.
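[Editor's note: a minimal sketch, not from the talk itself, of how the bed-and-room optionality rules might land in physical DDL. The table and column names are hypothetical, and SQLite is used only because it ships with Python. The point is that "a bed must have a room" becomes a NOT NULL foreign key, while "a room contains zero or more beds" needs no constraint at all.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# "A room can contain zero or more beds": the optional side needs no
# constraint -- it is the absence of one, so ROOM never references BED.
conn.execute("CREATE TABLE room (room_id INTEGER PRIMARY KEY)")

# "A bed must have a room": the mandatory side becomes NOT NULL on the
# foreign key, so a bed cannot exist without pointing at a real room.
conn.execute("""
    CREATE TABLE bed (
        bed_id  INTEGER PRIMARY KEY,
        room_id INTEGER NOT NULL REFERENCES room(room_id)
    )""")

conn.execute("INSERT INTO room (room_id) VALUES (101)")
conn.execute("INSERT INTO bed (bed_id, room_id) VALUES (1, 101)")   # fine

try:
    # A bed with no room violates the mandatory relationship.
    conn.execute("INSERT INTO bed (bed_id, room_id) VALUES (2, NULL)")
except sqlite3.IntegrityError as err:
    print("rejected, as the model requires:", err)
```

The rejected insert is the rule doing its job — and, per the talk's warning, once such a constraint ships, it governs the system until the system is retired.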
I'm going to give you a very specific example now. I like to say that we should do informed information investing over technology acquisition activities. Here, for example, is a data model I was faced with when I was working for the Defense Department many years ago. It had a business rule — a relationship between two natural entities, a person and an employee — that said one employee can be associated with exactly one person. Which means that if you want to moonlight, to have a second job working for that organization, the system cannot handle that requirement without manual processing. That's a problem. Similarly, in this existing system, one employee position could be associated with only one person, which meant that if we were doing job sharing, we also had to handle it manually; there was nothing in the system that would allow us to do it. This was particularly important — and the reason I keep using this example — because in the years I was working with the Defense Department, almost 30 percent of the DoD workforce also had a second job with another part of the Defense Department. So this manual moonlighting was a serious business problem.

The way to fix this, if we were building the system from scratch, is to change the data model just slightly, and that's what I'm going to do now. Watch: nothing has really changed, except that I've altered the optionality. Now it says, for moonlighting, zero or more employee records can be associated with a person. And for job sharing, if we wanted to fill one employee spot with two people, we could do that with this data model, whereas we couldn't with the previous one — the previous model was brittle. Once again, somebody could work Monday, Wednesday, and Friday and share the job with a Tuesday-and-Thursday position, or one person could work until lunch and somebody else after lunch. Different ways of doing it, but hopefully you see the flexibility that has been built into the second data model.

I'll put them both on the screen now so you can see them: the more flexible one is on the left, the less flexible one on the right. Neither one of them is "correct," because we don't have all of the requirements here — and by the way, any data model is always incomplete without its definitions. The model on the right is going to require extra structural workarounds, and that's why these data structures must be specified prior to building or acquiring IT.

I've used the word understanding a couple of times, and I want to hit on it one more time. When we understand an architecture, it means it's understood by both the people and the systems working with it. If we don't have that ability to understand, it is a mystery, and mysteries are generally never good under any circumstances. So we've got to have that shared understanding. A quick sketch of the two alternatives is below.
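[Editor's note: a minimal sketch of the brittle and flexible alternatives just described, again with hypothetical names. The only difference is a single UNIQUE constraint — exactly the kind of small optionality decision that lives as long as the system does.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")

# Brittle rule: at most one employee record per person, enforced by UNIQUE.
conn.execute("""
    CREATE TABLE employee_brittle (
        employee_id INTEGER PRIMARY KEY,
        person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
    )""")

# Flexible rule: the same table without UNIQUE, so one person may hold
# several positions (moonlighting) with no manual workaround.
conn.execute("""
    CREATE TABLE employee_flexible (
        employee_id INTEGER PRIMARY KEY,
        person_id   INTEGER NOT NULL REFERENCES person(person_id)
    )""")

conn.execute("INSERT INTO person VALUES (7)")
conn.execute("INSERT INTO employee_flexible VALUES (1, 7)")
conn.execute("INSERT INTO employee_flexible VALUES (2, 7)")  # second job: fine

conn.execute("INSERT INTO employee_brittle VALUES (1, 7)")
try:
    conn.execute("INSERT INTO employee_brittle VALUES (2, 7)")  # moonlighting
except sqlite3.IntegrityError as err:
    print("the brittle model cannot handle a second job:", err)
```

Job sharing — several people filling one position — would push the design further still, toward an associative table between person and position; the point is that the flexible structure absorbs new requirements without manual workarounds.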
The process for doing data modeling, as it is typically taught, is as follows. You identify some entities — business things about which you're going to create, read, update, or delete information. You identify a key for each entity. You draw a rough map of the relationships. You distribute the attributes among the various entities where they belong, each attribute generally showing up in one and only one place, with the exception of the keys, because you need common keys in order to join. And then — this is the part I have so much trouble getting across to young people today — that is your first version, and you're likely going to evolve the model. If you put the model away for a bit and come back to it, you'll discover there are other things you haven't captured that you need to add.

The modeling tasks you take on fall into some categories, including evidence collection and analysis; collection should decrease as your modeling matures, and you should do more analysis and less collection. You should also see declining coordination requirements — who do I have to talk to? Who haven't I talked to yet? In target system analysis, we should see increasing understanding of what the system looks like, with the modeling focus evolving over time from largely refinement to largely validation. Most of the time, when people are doing data modeling, the models should be labeled as draft; until you have validated them, it's irresponsible to put them out there as complete and done. Many people then ask: how do you start doing data models when none of the people in the meeting know anything about data models? And the answer is: don't tell them you're doing data modeling. Just write some stuff down, arrange it a bit, and then make the appropriate connections with your participants.

On the other hand, if somebody still doesn't understand the importance of this, here's a quick five-minute exercise on table handling that's absolutely critical. A table is a collection of data items that we put into the model, and we use this representation to start. Here's an example I'm going to carry for a couple of slides, coming from the old iTunes — now just called the Music application. In this case, the record consists of album and song. It doesn't actually include length, because the length of a song is not encoded on a CD; when you stick the CD into your Windows or Apple computer, the system will only know these aspects of it. But from start time and stop time, we can figure out how long the song is — and you'll notice that's probably a good way of doing it, because it's more flexible and presents less risk to the organization. There are correct ways to do this; the optimization can be done for flexibility, retrievability, or risk reduction. Danny had a slide earlier where he talked about containers. Everybody thinks that when you go to the cloud, you don't need to organize your data. Oh yes, you do — and I'm sure he has some stories he can share when we get to the top of the hour.

Let's take a look at this music database. Here's a table somebody might put together — and by "record" here I mean a data record, not a vinyl recording; we're not talking about those. I purchased some things, and I've got the price for them, and that sounds really neat. But what happens if I lose a row in this table? I'm going to go down here and delete record number two, and the answer is: we've lost the fact that this song costs a certain amount of money. Same on the insert side: if I want to add a song called Scuba, I can't add it until I have actually purchased Scuba from somebody. That's going to be a problem too. The little sketch below demonstrates both anomalies.
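[Editor's note: a minimal sketch of those two anomalies, using a hypothetical single "purchased" table that carries the price on every row.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchased (buyer TEXT, song TEXT, price REAL)")
conn.executemany("INSERT INTO purchased VALUES (?, ?, ?)", [
    ("Peter",   "Scuba",  1.29),
    ("Shannon", "Autumn", 0.99),
])

# Delete anomaly: removing the only purchase of "Autumn" also destroys
# the unrelated fact that Autumn costs $0.99.
conn.execute("DELETE FROM purchased WHERE song = 'Autumn'")
print(conn.execute("SELECT * FROM purchased").fetchall())
# [('Peter', 'Scuba', 1.29)] -- Autumn's price is simply gone.

# Insert anomaly: there is nowhere to record a price for a song nobody
# has purchased yet, short of inserting a row with a NULL buyer.
```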
Again, this is bad data model design. You'll see the words "unintended" and "undesirable" — yes, those come up a lot. You can also have update anomalies: what if I want to change the price of Scuba from $1.29 to $0.99? I'm going to have to go find every instance of Scuba in my entire database, and if I spelled it wrong somewhere, I'm going to end up with problems. Each of these anomalies shows us a different failure mode, and this is just a single table. So how do we actually get it to work right? We take the tables that are there — Scuba has now been reduced to $0.99 — and we move the pricing information out of the purchased table into its own pricing table. This is a very small piece of data modeling, but you can see it's critically important. Joining tables is how it works: when we pull these tables together, we have the right information. Peter is still associated with his purchase, and the price of Scuba has been changed in exactly one place. The pricing table is a better engineered solution, and again, we can relate it using one of those five flavors of cardinality. A runnable sketch of this fix appears at the end of this passage.

One more thing that's critically important: be careful when you use codes — use only dumb codes, not smart codes. (Hang on a minute — I don't want that clip to play... there, I turned it off; I have to remember what I did yesterday.) That big commercial from AT&T is actually worth knowing about. AT&T did a study at one point and concluded that if they kept growing the way they were growing, every woman in the entire United States would have to become a telephone operator for the system to work. That's actually what drove the invention of automated switching equipment — they knew they wanted women using telephones as well as operating them. Wow, what a crazy time we used to live in. The point about codes, though: I live in the 804 area code, and it used to be that the switching equipment could read the middle zero of a code like that, recognize it as an area code, and know where to route the call for long distance. Great thing to do. Here's another instance, at a university, where a dean actually said: no, I'm sorry, we can't add any more courses, because you've used up all the numbers in that course sequence — you can't have any more business course numbers. That's crazy. And a third quick example: a very large organization you would all be familiar with needs to expand a primary master data item — their major customer number — by a number of digits, and this is going to require them to change over 100,000 systems. That is crazy.
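[Editor's note: circling back to the pricing example, a minimal sketch of the better-engineered version — the price moved into its own table so an update happens in exactly one place, and a join reassembles the original view. Names again hypothetical.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE price (song TEXT PRIMARY KEY, price REAL)")
conn.execute("""
    CREATE TABLE purchased (
        buyer TEXT,
        song  TEXT NOT NULL REFERENCES price(song)
    )""")

conn.execute("INSERT INTO price VALUES ('Scuba', 1.29)")
conn.execute("INSERT INTO purchased VALUES ('Peter', 'Scuba')")

# The update anomaly disappears: the price changes in exactly one row,
# and every purchase sees the new value.
conn.execute("UPDATE price SET price = 0.99 WHERE song = 'Scuba'")

# Joining the tables reassembles the original single-table view.
rows = conn.execute("""
    SELECT p.buyer, p.song, pr.price
    FROM purchased AS p JOIN price AS pr ON p.song = pr.song
""").fetchall()
print(rows)  # [('Peter', 'Scuba', 0.99)]
```

As a design note, the foreign key also removes the insert anomaly: a price row for a song can now exist before anyone buys it.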
All right, let's get into a bit more focus. The idea now is to look specifically at each model and ask: what are we trying to accomplish? The process, as I mentioned earlier — not what the young people want to hear — is an iterative process. Data models are developed in response to specific organizational needs. If I have not built the system yet, I build the data model to create the first foundational piece of the system. If I already have the system built, I need to know what's wrong with the existing data model so that I can meet the changed organizational requirements. And as I said before, everybody thinks this happens just once. We do need to provide a feedback loop, and we need to understand that this is an evolving process. We lock down these data models so that we can keep a shared understanding. For example, if I say a soda is selected by, or given to, an individual, now we understand the formal relationship between soda and customer. The goal, again, is to draw that out. Then somebody may say: oh, we forgot to have them pay for it. Gee, that would have been important, right?

Here's another one: bed. I mentioned purpose statements earlier, and here we want to make sure we understand that the bed is going to have some sort of gender code assigned to it. Even though beds are not inherently gender specific, the attribute "bed sex to be assigned" says they're actually going to make that a rule. And I already mentioned the third example, job sharing: do we want to do that? Yes, exactly, we'd like to — but we can't with the existing system, because it does not permit it.

Now, the standard definition is that a bed is something you sleep in. Let's look at the purpose statement instead. We were working on this — again a DoD example; I use these because they're in the public sector, much time has passed, and nobody will be embarrassed. Here's the definition of the entity type for a bed: it contains information about beds within rooms. Cool. We're talking to these individuals about it and it seems to make good sense — and then we discover they're going to put an RFID transmitter on each bed so we can find out where a bed is located, because, believe it or not, hospitals lose patients on a regular basis, and that's a problem. So we asked: great, what is the name of the room when you push the bed out the door? Oh, that's the hall. Which hall? The hall on the first floor, the second floor, the third floor? You can see that with the model we were able to create a block that kept them from doing a very dumb thing they were about to do. And that's one of the reasons data models carry statuses of either validated or unvalidated around all of these bits and pieces.

The reason we have so much trouble here — just to give you the background — is that for years we've been telling people: start with strategy, then put some IT things in place, and your data will be the tail wagging at the very end of the dog. A wonderful book by our colleague Dave McComb can give you lots of history on this; we definitely don't have time to go into it here. Notice the only thing I've done is shift the focus from IT first to data first. That's literally what we're talking about, and it gives us the ability to maximally reuse information across all of these bits and pieces. The problem is that when people say "data first," they need something to give them some ideas. So I've been working on a piece — I've got it trademarked with the Patent and Trademark Office — that is a variation on the Agile Manifesto, which Danny mentioned a bit earlier. What should be going on? Well, if you're going to do data first, the data programs ought to drive the IT programs.
Informed information investing should drive technology acquisition; stable, shared organizational data should precede component evaluation; and data reuse should come before acquisition of new sources. Those of you who remember the Agile Manifesto will recognize the construction: it's not that we don't value the things on the right, but we value the things on the left more. And that was a wonderful way to say it — because if we value those things more, they are the things we should do first. It worked well for Agile; let's try it ourselves.

I say many organizations manage different types of architectures, but really it's more like one in ten that tries any one of these. If your organization is doing it, by all means keep moving.

I'm going to finish up this section with a couple of examples of models. This first one is in the DMBOK, but more importantly it's a model a bank might use to describe the relationships between subscribers, accounts, charges, and bills, and this is a formal representation of it. The primary deliverables become reference material. I had a CIO I worked for for many, many years who said: I don't understand your data modeling stuff, but I know that when I go to my offices in Singapore and Hong Kong and Tokyo and I see these models up on the wall, even though the teams there don't all speak English, they understand precisely what it is we are attempting to do. The models allow us to look at various aspects.

Here's a second example, and the interpretation is that this is a car rental company. Even if we don't know much about the business, we can figure things out from the data model: the rental agreement is clearly central to what happens; customers and automobiles are connected only through the contract; a contract must have a customer; and nothing prevents an automobile from being rented by multiple customers, which was the problem we were trying to solve for them in the first place. It's an old example, but it still works out very well.

One quick third example of a model — this one gets a bit complicated, and I expect you to take it back and study it; we definitely don't have time to get through it here. You can see this one holds commission-based pricing information. It's going to be very difficult to change the customer address, because the customer address is linked to the salesperson and the catalog item, which is probably not great. The price isn't included in the catalog, which might mean that variable pricing models come into play — that could be interesting. The salesperson is not tied to the order. Nothing prohibits a sale from having multiple salespersons. Multiple invoices are allowed for one order. All of these are critical observations, because the models tell you what's actually going on in the system.

Now, each of these models fits into a framework, where what I've put in place is the idea of looking at the as-is and the to-be, which we haven't talked about yet. The as-is is what you've got on the floor.
The to-be is what you'd like to have on the floor: the desired versus the actual. We've already talked about the three layers — conceptual, logical, and physical — and again, validated versus unvalidated. All of your modeling is going to sit somewhere within this framework, so it's a great place to start by asking: where are we working, and what are we attempting to do with this modeling effort?

Model evolution goes through a very similar process. I've turned the modeling framework on its side here, so the technology-dependent, physical as-is is now at the bottom. This green blob represents a model, and we show students that you move up here and make it an as-is, technology-independent model; sometimes we need to change the requirements on it; and then we move back down. But we forget to tell them that the reason we do this is that at this stage we typically want to incorporate additional architectural components — notice I'm changing this from green to green and orange; yes, the trick of gradations and colors. That revised model is the one we in fact want to put out there and make happen.

All of these things are tradeoffs, and the key is a wonderful bit of wisdom passed down to me by a friend of mine. Whenever he made a deal with somebody, he would turn over his business card, and this is what he had printed on the back: pick any two of these. Why? Because you can't maximize all three — in fact, it's very difficult to maximize even two of them. There are still tradeoffs to be made between price and quality.

And these data models can be supportive of strategy, or not. For example, this model here — from an organization I worked for through one of the recessions — showed that a manager could not be a salesperson; they had to be one or the other. That was a problem, because we turned all our managers into salespeople when the economy went bad, so the system didn't support our strategy of understanding sales. Think about the same equation in a restaurant context: if our goal is to serve everybody on a dish-specific plate — the peach cobbler plate is different from the apple cobbler plate — that's not an environment where you can maximize speed. But if you give everybody the same plate, everybody can work with that plate.

Modeling ensures interoperability between your systems. If I'm looking at it from a software engineering perspective, I'm only looking at one program. If I'm looking at it from a database modeling perspective, I'm modeling a family, if you will, of programs. The question is: who keeps track of the things that are not in the gray database but are in the orange and the yellow databases? If we don't have the ability to keep track of those pieces, it's going to be a problem. An enterprise model helps us manage the entire process. By the way, I want to do one thing here real quick — let me put that back up.
I missed a point that I wanted to make for you all, which is that I could use the title "data modeling ensures interoperability" for this piece, but it's also true that data models ensure interoperability. A very key point to keep in mind. The key for all of this, of course, is who makes decisions about the common use of data across your entire system: how the data models interact with the various process models, and how they interact with the other inputs and outputs. One data model to do all of this would be ideal, if we could get it, to make it work all the way around.

We are just about at the top of the hour, and I want to finish with a couple of things before I invite Danny to come back. And remember, we've got some event pricing on the books if you go out and use the coupon there.

Let's talk about models again. Data models are used to store and formalize information. I'm putting a little image up in the upper right-hand corner: that's how they figured out how the Easter Island statues were moved. It was an experiment — a model — where they actually walked one of the statues through the process. So: you store and formalize information, you filter out extraneous detail, you define an essential set of information requirements, you understand complex behavior, and you gain additional information from developing and interacting with the model — this is the iteration piece I can't get the young people to understand, again. You can evaluate various scenarios, and you can monitor and predict various systems and controls. I want to give credit to my colleague Karen Lopez; I got these slides, or at least their genesis, from her a while back.

The goal must be developing a shared model of the IT and the business understanding. Data sharing is automated, and it is therefore highly dependent on successful engineering and architecture; if we don't have those pieces, it won't work. The modeling characteristics should evolve during the analysis phase, and they should include motivation — purpose statements — in the modeling. So not just "it's a bed," but "it's a bed that we want to use to track patients" — and it turns out beds are really bad things with which to track patients in hospital situations. It's not that you should use modeling to the exclusion of everything else, but if you don't start out with a data model, it's going to be very difficult to have any components that work in a coordinated way in the long run. The use of modeling matters much more than the argument over which modeling variation you're going to use — and as Danny will tell you, just pick whichever you want, because the tools will take care of that. The models have to be viewed as living documents, but they also need to be searchable, because they are reference documents. And utility is absolutely paramount. I've literally had a knife thrown at me over this — somebody wanted some clip art on the screen; I'll tell you that story when we can all get back together, because we are going to get back together sooner or later. The idea was that I was showing a data model to a group of general officers — as in generals, right? Well, I'm sorry: generals don't care about your stupid data models, Peter.
But I was encouraged, through that slight little hint, I mentioned somebody threw a knife at me, that maybe some clip art would be helpful, so that when we say these are the things you have in the Department of Defense that float, and these are the things that go under the water, and these are the things that fly, the generals would have an idea what we meant. Data modeling is about communication; that's what we're trying to do. I failed that test when I was younger, trying to communicate with the generals: they looked at it and all they heard was the Charlie Brown teacher bit. So, we're back at the top of the hour. Let's bring Danny back on and see what sort of questions you have, reminding you that we've got a couple of events coming up. Next month we'll be doing business value through reference and master data strategies, then data stewardship, and then we'll go on to data quality in September. Things are moving fast, and this is wonderful. I will now turn it back over to Shannon and invite Danny back on.

Peter, thank you so much for this great presentation. If you have questions for Peter or for Danny, or both, feel free to submit them in the Q&A section of your screen. And if you see somebody has already typed out the question you wanted to ask, you don't have to type it out all over again; just hit the little thumbs-up icon to upvote it. Fun little feature there. And to answer the most commonly asked question, a reminder that I will send a follow-up email for this webinar by end of day Thursday, with links to the slides and links to the recording of this session. So, diving in here, guys: what workaround for code-first design to data modeling? Sorry, is that a statement, Shannon, or was that...? I know, it poses a question: what is the workaround for code-first design to data modeling? Oh, the workaround. Okay. Yeah. Danny, jump in any time. I'm going to go to that slide that shows the two data models side by side. So if somebody has done, let me put this slide back up since we just talked about it, if somebody has done the model on the right, with the less flexible data structure, then whatever system is built off of that, and of course a system will be built off of more than one data model, will have extra processing to do, because the data model was done poorly. So when somebody says they've done code first, I hear that they haven't thought about the data. And if I were able to sell that particular project short, the way you sell a stock short, I would, because I just don't predict that's a very good approach. Danny, have you seen that? You've worked with agile a lot more than I have. Yeah, we do see this, especially as we've started supporting the NoSQL environments, where they tend to take the data, just deploy it, and then start writing code. So the first step is: reverse engineer. If they've gone code first, there's something out there; you can hook into it and get a start on the data model that they didn't create or think about, but that resulted from the code they wrote.
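Here is a minimal sketch of that first reverse-engineering step, assuming the code-first system left behind a pile of JSON documents; the field names are illustrative, and this is a crude stand-in for what a modeling tool's reverse-engineering pass produces, not any particular tool's output:

```python
import json
from collections import defaultdict

# Sample documents, as a code-first NoSQL store might have accumulated them.
docs = [
    {"id": 1, "name": "Acme", "phone": "555-0100"},
    {"id": 2, "name": "Globex", "phone": None, "region": "EMEA"},
]

def infer_schema(documents):
    """Collect every field seen across the documents and the set of
    Python types observed for it -- the implicit schema the code
    created without anyone ever designing it."""
    fields = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            fields[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in sorted(fields.items())}

print(json.dumps(infer_schema(docs), indent=2))
# {"id": ["int"], "name": ["str"], "phone": ["NoneType", "str"], "region": ["str"]}
```

The inconsistencies it surfaces, optional fields, mixed types, are exactly the prompts for the requirements questions Danny describes next.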
And then what happens is you'll get some data people in to do exactly what you talked about: start to modify those requirements and ask questions, because that recovered model really becomes your roadmap. Is this how you want it to work? And they'll say, oh, no, we didn't expect that. So you'll modify it, and then you'll iterate those changes back in over time. And that's not just in the NoSQL world, it's across the board, but I see a lot of this approach especially in NoSQL, because they were told out of the gate that they didn't need a data model, because it was NoSQL. And the fact of the matter is, there's a structure behind everything, and that structure is important, because it shows you exactly what you need to agree on with the business, and what they want to get at the end of that journey. And Danny, who is it that tells them that they need a data model? Sorry? Who is it that tells them that they don't need a data model? Oh, you know, I never like to name names, and nothing against anyone personally, but there is a certain group of scientists out there. The key here is that a lot of the benefit, especially with some of these modern databases, is that it doesn't take much to get the thing stood up, get data into it, and start getting data out of it. So they'll just ingest the data, and by ingesting it, it creates the, and I don't even want to use the word schema because they'll yell at me for that too, but it creates the schema that's required for that data, and then they start coding off it, right? Then they'll find they're unsuccessful, and they'll go back and start thinking about some of those basic things. Now, we've got buy-in and partnership from all of the big providers of those new databases, because they know that for the success of their customers you still need to go through that process: the business interaction, naming out your entities, your attributes, your relationships, and then turning that into an embedded structure, one that doesn't have all of those relationships and the referential integrity, which I saw mentioned in one of the questions, but which still needs to be handled in code. So at the end of the day you have both pictures, and you can deploy in the NoSQL world, but at least you have the full story of how you got there and what the requirements are behind it, for the folks that are going to go write the code. Danny, I think your microphone is very directional, so for the next question, if you can talk straight into the screen. Yeah, this is strange, I haven't had issues before with this, but I will try to move around less. Danny is a great speaker, for any of you that have the opportunity to go hear him when he does his talks at ADDW and at other events as well. Oh, Scott, you're here. I always feel smarter after I listen to you talk, so I like to go after you. I'm still learning from you too, so there's the mutual lovefest. But anyway, let me go back to one thing that you had on your slides, Danny, that I think illustrates this point real quick, and that is that you had three containers and you were trying to look at throughput.
So, you just mentioned that the new way people are told to do this is: you don't need to talk to the old guys, Peter and Danny, because we've got this new stuff that works magically, and you don't need their skills anymore, right? But Danny showed an example that had some containers on it. A container you can think of as a structure that contains the data, so something with a schema-like property. And the point you had there was that the throughput was very problematic. One of the reasons we do this stuff is because we're automating the function, which means it has to happen. I've got customers that will literally run a billion queries a day; that's a lot. And if your response on those queries is slow because your container design was just whatever design was brought in, never optimized, how are you going to get from three containers to five, or two, or whatever it is you're trying to get to, unless you have a model you can bring back and work with? In that case, if we're optimizing for speed, we absolutely need to make sure that we're not loading things into memory that don't need to be loaded, because that slows things down, and that we're not going out and grabbing things in a random fashion, because that also slows things down. It's a very, very big problem in those areas. I was just going to say, it gets even more heightened, and we're seeing this a lot now, as people move to the cloud. They just assume the cloud is efficient: it'll give me all the resources I need, when I need them. But a good design is even more important in the cloud, because of unintended consequences. They think they know what they're going to pay for the service, but things that perform poorly just keep upping the costs, and when the bill comes in, management is not happy. It pulls right back down to things like the data structures and the queries that result from them; those make a huge impact there. So it's kind of interesting to see all of that come full circle. You're describing a system degradation situation there. There's another situation that's even more critical, which is outright failure, and that's where we go back to the point of referential integrity. This is a true story from Pacific Bell many years ago: if you happen to have three records with telephone numbers for a customer, and the customer makes a charge, how do you know which of those three records the charge belongs to? That's part of next month's seminar. But this concept of referential integrity is also embodied in the data models, and I didn't talk about it a lot, but you mentioned it as well. The idea is that if we don't put it together in a way that works, it won't work. It sounds ABC-simple, right? But believe it or not, you and I have both seen lots and lots of examples where people try to do crazy things. You don't skip referential integrity, just as you don't take a chainsaw and stick it into the walls of your building without knowing which ones have structural support and which ones are just decorative.
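To make the referential integrity point concrete, here is a minimal sketch using SQLite with hypothetical customer and charge tables: with the constraint declared, the database itself refuses both the orphan charge and the delete that would create one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

con.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute("""
    CREATE TABLE charge (
        charge_id INTEGER PRIMARY KEY,
        cust_id   INTEGER NOT NULL REFERENCES customer(cust_id),
        amount    REAL NOT NULL
    )
""")

con.execute("INSERT INTO customer VALUES (1, 'Pat')")
con.execute("INSERT INTO charge VALUES (10, 1, 19.99)")   # fine: customer 1 exists

try:
    con.execute("INSERT INTO charge VALUES (11, 999, 5.00)")  # no such customer
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # rejected: FOREIGN KEY constraint failed

try:
    con.execute("DELETE FROM customer WHERE cust_id = 1")  # would orphan charge 10
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

This is the "structural wall" the chainsaw analogy is about: the constraint documents which walls are load-bearing, and the database stops you from cutting into them.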
That's a great question. That's great. And at one point, Danny, you mentioned: aren't the NoSQL and unstructured-database vendors advising people to move away from table relationship linking, saying they're faster than a structured database? Well, that's absolutely part of their message, right? And trust me, I'm not saying it's not true, or that it is true. What I do know is that you have to understand the integrity of your data and how it hangs together, no matter what type of technology. They are definitely focusing on the more real-time use cases: real-time analytics while you're moving around a website, popping up to tell you that if you liked listening to this, you should listen to that, and all of those things. But there's a price to be paid. And again, if it's not in the database, the database will be quicker, but you have to make sure that the data is still made whole at the end of the process. So they're moving away from relationships, they're talking about embedding things and so on; but what happens when you delete a customer? Is it going to work correctly? They do have alternative mechanisms in the database for some of that, but unfortunately it goes back to where we were with flat files, which is that a lot of that integrity has to be coded into the applications. So, pay me now or pay me later. It really depends on what's driving you; you can get there. At the end of the day, the data model is what's going to tell you what you need to do, whether you do that through a database management system or through some other mechanism that works in line with these more modern databases. Let me just emphasize what Danny said there. You can create a data model that will support a strategy. If your goal is speed, you're going to design it one way; if your desire is cheapness, low price, you're going to design it a different way. Either will be able to meet future business requirements at some level, but if you design it cheaply, it will probably not meet your speed goals. So again, pick two, and even then there are still going to be some tradeoffs to be made. Great question. So, is support for graph databases going to be in future releases of Erwin? Danny, do you know? Absolutely. We've moved our specialty NoSQL capability into the standard tools, so now you get all of your types of data modeling in one. Right now we're primarily supporting document databases, JSON, but in the next release we're going to have some columnar databases, which we're getting high demand for from our customer base. And graph databases we're absolutely working on; you should see some things in this next release as well. Again, the goal is the right type of modeling for the need you have, with the important piece that it's all integrated together, and that we provide the normalization, denormalization, and other mechanisms required to help you move from one type of approach to another. So absolutely on the roadmap, and as more exciting ways of managing data come out, you'll continue to see ways to get those into your modeling practices so you can apply all those great benefits of understanding and standardization, and... you really faded out right there, Danny. Yeah, I was just going to say: integrate it in with all of the rest of the good things that have been proven over the years around data modeling that make it such a valuable approach.
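Here is a minimal sketch, with hypothetical entities, of the kind of transformation Danny mentions: taking a normalized, ER-style structure (customer and orders as separate entities with a relationship) and denormalizing it into the embedded document shape a document database wants. The relationship doesn't disappear; it becomes nesting.

```python
# Normalized, ER-style structures: two entities and a relationship.
customers = [{"cust_id": "c1", "name": "Acme"}]
orders = [
    {"order_id": "o1", "cust_id": "c1", "total": 100.0},
    {"order_id": "o2", "cust_id": "c1", "total": 250.0},
]

def embed(customers, orders):
    """Denormalize: fold each customer's orders into the customer,
    producing the embedded-document shape a document store expects."""
    docs = []
    for cust in customers:
        docs.append({
            **cust,
            "orders": [
                {k: v for k, v in o.items() if k != "cust_id"}
                for o in orders if o["cust_id"] == cust["cust_id"]
            ],
        })
    return docs

print(embed(customers, orders))
# [{'cust_id': 'c1', 'name': 'Acme',
#   'orders': [{'order_id': 'o1', 'total': 100.0},
#              {'order_id': 'o2', 'total': 250.0}]}]
```

Note the tradeoff both speakers flag: once the orders live inside the customer document, deleting the customer takes the orders with it, but any integrity rule beyond that has to be coded into the application.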
I love it. It's awesome. All right: how are you seeing data modeling adapt to support a world of software-as-a-service applications, with predefined and often proprietary underlying data stores and models, where only API access is available? One of the things I urge all companies to do when they're considering software as a service, or any other type of IT purchase, is to request from the vendor the logical model of the service, so they can determine whether it will be a good or a poor fit for their organization. You'll learn two things very quickly. Half of the vendors will come back and say, what's a data model? Which tells you that you probably don't want to do any business with them. And the other half will say, hey, there's a smart customer; here's our data model, here are the options, here are the kinds of things you can do within it. The vendor community is starting to get better at this. I use it very much as a weed-out tool. What do you think, Danny? I agree. That's really a technology question and a connectivity question. One thing we're starting to see people ask for is to model microservices and things like that. We've always gone directly to the database catalog, really at a metadata level, but with APIs there's no reason we can't propagate and deploy the data model through APIs. We're starting to do it now; we have our own, and we're starting to make those connections with some of these more modern technologies. So again, there are two separate things. Do you data model? Yes, data model. Then, how do you connect that to the environments that are important to people? That's our job, and our competitors' job, and we're moving in that direction and seeing requirements for it as people start to deploy these models in multiple different ways. It does tell you, too, that there is very much a caveat emptor situation out there. Many people are being seduced, and have been for years, by various technologies that are supposed to be just wonderful and solve everything. The real key is that this gives us, as professionals, more options we can use, and it's important that we learn about these options, but we shouldn't pick them up just because they're new. There's a reason we teach data modeling with relational databases: it's easy to teach, easy to understand, and easy to utilize. These others go beyond that, but that doesn't mean data modeling isn't useful there; data modeling is still going to be incredibly important. Again, it is fundamental. Next question: data modeling may be about communication, but most people find data models difficult to read, so they lose the power to communicate. How do you reconcile that? Great question. I'll give you an example: a model that literally went into one of Clive's books, the one we were just talking about. Here I am working through this, and I was a manager, and also an employee, but I couldn't be a manager and a salesperson in that particular structure. That data structure did not work, and I used this model, which is a very poor data model, to explain to everybody why it wouldn't. Literally the question was, why aren't you turning in your sales, Peter? Danny, you got another one?
Well, it's not so much an example, but I have heard that sentiment, if I'm understanding the question correctly: that data models are too hard to understand, whether that's from their size and scale or from the amount of detail. A couple of things I tell my customers that really hold true: we're all in sales. No matter what role we play, we're selling something to somebody, whether or not it's a data structure that's going to meet their needs. So know your audience. There are a lot of things in a data model that are important to you but mean nothing to others. One of the things we've learned from the wealth of people using this type of technology every day is the value of having multiple views of the same thing, and getting very targeted with those views: colorization, amount of detail, the types of things shown. You can look at a model in many different ways and get the result you want if you know who you're talking to, what's important to them, and, equally important, what's not important to them. Because if you hit them with a bunch of technical gobbledygook when they just want the business answer, you're going to lose them. That's where I think a lot of that sentiment comes from. Absolutely. And again, even among the three types of models: the one I showed you just before Danny spoke was a conceptual model, and it didn't really show you much detail. But if you try to show somebody a physical as-is database, unless they are also a data modeler, they're probably not going to understand it or want to learn how to read it. When you're talking to management, you have a different way of interacting with them than when you're talking to other technical people, and as Danny said, it's very, very critical to have that ability. I know you all want to hear the knife story at some point; we'll save that for another event, or maybe for when we can get together in person. So, continuing on: how do you envision the data modeler's role in the world of microservices design? I was going to say, a reference librarian. Does that make sense, Danny? I think there's an element of that. It's many hats of a different color, but in a lot of ways it's still the same hat, right? Because it's still data, it still needs to meet a need, and it still has implicit requirements behind it for it to be not-garbage, as Peter says. It's similar to how, as data modeling moved from waterfall to agile, we had to change our practices, but the net of what we deliver is still the same. I think it's going to be the same with microservices. Now, will there be different integration requirements around the model? Will there be more? I'm no longer in the role of architecting the product, but I think the same benefit is there. So from that perspective, the role is the same; it's a matter of integrating into their work style and into their technology stack. And again, that's our job, and your job is picking the right technologies that enable you to deliver that value to the environment you need to support.
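Going back to Danny's "multiple views of the same thing" answer, here is a minimal sketch with a made-up model structure: one model, two renderings, where the business view hides the technical plumbing that would lose a non-technical audience.

```python
# One toy model entity carrying both business-facing and technical detail.
model = {
    "entity": "Patient",
    "attributes": [
        {"name": "Patient Name",   "technical": False},
        {"name": "Admission Date", "technical": False},
        {"name": "patient_sk",     "technical": True},   # surrogate key
        {"name": "etl_batch_id",   "technical": True},   # load plumbing
    ],
}

def render(model, audience):
    """Same model, different view: business users see the requirements,
    technical users see everything."""
    attrs = [a["name"] for a in model["attributes"]
             if audience == "technical" or not a["technical"]]
    return f"{model['entity']}: " + ", ".join(attrs)

print(render(model, "business"))   # Patient: Patient Name, Admission Date
print(render(model, "technical"))  # all four attributes
```

The point is not the code but the design choice: the view is a filter over one shared model, not a second model that can drift out of sync.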
I would extend Danny's remarks slightly and say that everything he said is correct, although I do think businesses are going to be less tolerant of geeks in the corner who know everything. I think it's going to be increasingly important, as all of us continue to work in the new world that's unfolding, to try for a little more articulateness, and to try illustrating the data model with clip art where appropriate. It's not part of the modeling language, but it helps users place what they're talking about in the larger context. I mentioned earlier the CIO who never really understood the data models, but understood that the people who worked for him used them. That was the smart part: he supported the project, and he said, you tell me what you need to make sure my people can do what they need to do. Absolutely critical. The idea is that we have to be able to stand up and give some sort of cogent explanation. It may be that you still have a person you hide in the back room who doesn't talk to people, but when you're presenting these models, the audience is going to want to see things in them that mean something to them. They're going to want to know: why is it that I can't do this? Well, here is the reason. We have a room, and it can have one or more beds in it, but a bed can't appear in multiple rooms. That's just reality; a bed can be in only one room at a time. Every system built on this data model from this point forward will have to follow that rule. Next question: it seems that the cloud and data lake experts and vendors are saying that you really don't need a data model. Some folks talk about streaming the data into the data lake, and ELT versus ETL: no need for those pesky data modelers, ETL folks, and DBAs to slow down your projects. I heard you giggle there, Peter. I was going to ask Danny: off the top of your head, what's the success rate of data lakes in your experience? On a first shot, and especially with that kind of approach, I would say maybe 5 percent, and those got really lucky. The fact of the matter is, and I don't want to bring up all the standard euphemisms about the swamp and so on, but the net-net is: it's data. If you don't understand the data, it doesn't matter where you put it. Just because you put it somewhere you can go get it later doesn't mean you remember what it is and how to use it. Again, I have a lot of forgiveness for technology marketing, but it's always about making things way easier, telling you that you don't need to do all the stuff you needed to do before, because now you have ELT, so your data is all in its native format, and you can keep going back to it and applying more use cases. Great concept, excellent, right? A place where we don't have to transform all the data to make it come together like we do in a data warehouse. But at the end of the day, people are just going to reach into the dark lake and see what they come up with in their hand. They still have to know what's in there, where it came from, what it was, what the business purpose was, all the rest of it. So you need to create a data model for the data lake: you need a logical access layer that people can understand, so they get the right data in the right form when they need it. And that, at the end of the day, comes from a data model, because if you don't have the data model, the data is not going to be right, and then it's just more muck at the bottom of the lake.
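Here is a minimal sketch of the "logical access layer" Danny describes, with entirely hypothetical dataset names and paths: a small catalog that records what each lake dataset is, where it physically lives, where it came from, and its business purpose, so consumers aren't reaching blind into the muck.

```python
from dataclasses import dataclass

@dataclass
class LakeDataset:
    business_name: str   # what the business calls it
    path: str            # where it physically lives in the lake
    source: str          # lineage: where it came from
    purpose: str         # why it exists

# The catalog IS the model of the lake; without it, the lake is just files.
catalog = {
    "customer_master": LakeDataset(
        business_name="Customer Master",
        path="s3://example-lake/raw/crm/customers/",   # hypothetical path
        source="CRM nightly export",
        purpose="Single agreed list of active customers",
    ),
}

def find(term):
    """Let consumers ask for data by business meaning, not by path."""
    term = term.lower()
    return [d for d in catalog.values()
            if term in d.business_name.lower() or term in d.purpose.lower()]

for ds in find("customer"):
    print(ds.business_name, "->", ds.path)
```

In a real shop this metadata would live in a governed catalog or modeling tool rather than a dict, but the shape of the answer is the same: business meaning mapped onto physical storage.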
And I'll offer a little bit of insight into that as well. The failure rate, as Danny said, is very high on your first experience with it, and that's one of the reasons you end up coming back and talking to us, because there is more to it. Data lakes work really well when you have a work group of somewhere between five and ten data scientists, supplemented with a properly trained data munger, a data engineer who can support them. And the reason it works is that five to ten people will develop a shared common data model in their heads; it's a natural byproduct of the way they work together. But when you take a data lake and try to make it the one single source of truth for your entire organization, that is about as big a failure, and as easy a failure prediction, as Danny and I can make, without putting words in his mouth. When somebody says, yeah, we're just going to put all the data in the data lake and everything will be fine after that, we kind of set our watches and say, well, this is going to be a customer in a couple of months; they're going to be looking for help. All right, I think we've got time for about one more question. In one situation, data modelers try to define surrogate keys. Is it really necessary in today's world, as organizations are data-driven and support different formats of data? Wow, a detailed technical question, so unfortunately the answer is going to be: it depends on exactly what you're attempting to do. In the case of trying to define your model, are you looking to enhance an existing system? Can your system handle that type of inquiry? Sorry, I don't know why my screen just flipped there; normally that would be a cat walking over the thing. Anyway, whatever you're trying to accomplish is really what's going to give you the right guidance. Danny? Absolutely, there are a lot of imperatives for surrogate keys if you don't have a natural key that's going to work, or one that covers all the scenarios you need. Those are questions you're going to have to decide for every use case and for every set of business requirements the data is going to answer. I don't think it's that surrogate keys are better or natural keys are better. I think natural keys are more understandable, especially to business people, but if there's a requirement for surrogate keys to make things operational, then you just need to make sure you have it well documented, the why and the what, and then make that part of people's vernacular. A language we use to talk about data; we could even go to controlled vocabulary, but that's another webinar, right, Shannon? Indeed. Real quick: what Danny was talking about there is something that should go in your business glossary, so that's back to your data governance component. Absolutely have that in there; it's a real help for everybody.
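Here is a minimal sketch of the surrogate-versus-natural-key tradeoff Danny and Peter are weighing, with hypothetical columns: the surrogate key gives you a stable, meaningless identifier, while a UNIQUE constraint on the natural key documents it and keeps it honest.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE product (
        product_sk  INTEGER PRIMARY KEY,      -- surrogate: stable, meaningless
        sku         TEXT NOT NULL UNIQUE,     -- natural key: business-meaningful
        description TEXT NOT NULL
    )
""")
con.execute("INSERT INTO product (sku, description) VALUES ('AB-100', 'Widget')")

# Business people look things up by the natural key...
print(con.execute("SELECT product_sk FROM product WHERE sku = 'AB-100'").fetchone())

# ...while the UNIQUE constraint enforces, and documents, that the
# natural key really is a key.
try:
    con.execute("INSERT INTO product (sku, description) VALUES ('AB-100', 'Copy')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # UNIQUE constraint failed: product.sku
```

This is one way to get both: joins ride on the surrogate, understanding rides on the natural key, and the "why and what" Danny mentions goes in the glossary.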
All right, I'm going to sub in one more question in the couple of minutes we've got left: how do you ensure that data models stay current, and which tools help in reverse engineering the physical models from the database back into the modeling tool? Danny, I'll let you take that one, because you know more about the tools. Sure, sure. If you've got a good data modeling tool, that should absolutely be a basic capability: right after providing well-proven notations and visualization capabilities, you need that connectivity. The other thing that's really important is compare and synchronization, because you can reverse engineer, but then you need to figure out how what you reverse engineered relates to the model you think represents things today. Having that compare-and-sync is the ability to go in and figure out what's changed. Maybe somebody deep in the dungeons has changed an aspect of how the database was implemented, and you need that in your data model; the reverse engineering will pick it up, and then you compare, figure out what makes sense, and put it in. And if you're doing model-driven development, you can do the same thing at the other end: iterate new changes and new requirements into the data model and then out into the databases that operationalize that model. Those types of capabilities are basic in data modeling tools today, definitely in ours, but it's the compare capability on top of being able to reverse engineer that matters. If you're looking at Erwin, it's called Complete Compare: very configurable, it lets you look at what you want to look at, ignore the things you don't, or deliberately keep things out of sync if that makes sense. It's a very important capability for making sure your model isn't a throwaway. In a lot of cases, historically, it has been, but there's a lot in these technologies that means you can now truly do model-driven development, always have those models up to date, and make them the place where you get to the new capabilities and innovation you want. And relating it to current events, Danny: the Amazon outage this morning. They came out and said it was related to their bad configuration management practices. So there you go: configuration management and version control can keep Amazon from running, and boy, if Amazon's not running, we get worried, don't we? Yeah, and data models, if you're using the right ones and the right technology, can be integrated across all of that, so that everybody gets the benefit of understanding what's coming down the pipe, what's connected, and what the dependencies are, so that the configuration goes out right the first time.

Well, thank you both for this great presentation, thanks to Erwin for sponsoring today's webinar and helping to make all these webinars happen, really appreciated, and thanks to all of our attendees for being so engaged in everything we do and for all the great questions. But I'm afraid that is all the time we have slotted for this webinar. Just a reminder, I will send a follow-up email by end of day Thursday with links to the slides and links to the recording. Thanks, everybody; I hope y'all have a great day. Thanks, guys. Thanks so much, Shannon. Danny, thank you. My pleasure; absolutely, good to see you.
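To close the loop on Danny's compare-and-synchronize answer, here is a minimal sketch, with made-up schemas, of what that comparison step does at its core: diff what is actually deployed against what the model says should be deployed, and report the drift.

```python
# What the model says (left) versus what reverse engineering found (right).
model_schema = {"customer": {"cust_id", "name", "phone"}}
database_schema = {"customer": {"cust_id", "name", "email"}}  # changed in the dungeon

def compare(model, database):
    """Report drift between model and database, table by table."""
    for table in sorted(set(model) | set(database)):
        in_model = model.get(table, set())
        in_db = database.get(table, set())
        for col in sorted(in_model - in_db):
            print(f"{table}.{col}: in model, missing from database")
        for col in sorted(in_db - in_model):
            print(f"{table}.{col}: in database, missing from model")

compare(model_schema, database_schema)
# customer.email: in database, missing from model
# customer.phone: in model, missing from database
```

A real tool compares far more than column names, such as types, keys, indexes, and constraints, and lets you choose which differences to apply in which direction, but this is the configuration-management heartbeat that keeps a model a living document rather than a throwaway.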