Hello and welcome, my name is Shannon Kemp and I'm the executive editor for DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Trends in Data Modeling, the latest installment in the monthly series Data-Ed Online with Dr. Peter Aiken, in partnership with Data Blueprint. Now let me give the floor to Steven McLaughlin, the webinar organizer from Data Blueprint, to introduce today's speakers and webinar topic. Steve, hello and welcome. All right. Thank you, Shannon. Hello, everyone. Thank you for joining us and thank you for finding the time in your busy schedules today for our webinar, Trends in Data Modeling. As always, a big thank you goes out to Shannon and DATAVERSITY for hosting us. You guys are the best. We're going to get started here in just a few minutes after I let you know about some housekeeping items and introduce your presenters. Today we have a one-hour presentation followed by 30 minutes of Q&A. We'll try to answer questions throughout the session. Steven, if you're talking, we can't hear you. There we go. You guys hear me now? All right. What was the last bit you heard? I'll just pick up from there. We said we heard hello. Oh, fantastic. All right. You can hear me now, huh? Thank you. All right. We'll run it back. We'll run it back here. I'll start here. We have a one-hour presentation today followed by 30 minutes of Q&A. We'll try to answer as many questions as time allows, but please feel free to submit questions as they come up throughout the session. To answer the two most commonly asked questions: yes, you will receive an email with links to download today's materials and the webinar recording so you can view them afterwards. These materials will be sent out within the next two business days. You can find us on Twitter, Facebook, and LinkedIn. We've set up the hashtag #DataEd on Twitter, so if you're logged on, feel free to use it in your tweets and submit your questions that way. I'm standing by on the Twitter.
We will also keep an eye on the Twitter feed and the Q&A channel here, and answer in our post-session email any questions we may not get to today. All right. As long as you guys can still hear me, I'll go ahead and introduce our presenters. So first up is Dr. Peter Aiken. He's an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint, and he's written dozens of articles and eight books, the most recent of which is Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named among the top data management experts. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling, today being a great example. Peter, you want to tell us about where you are today and just how close the timing was? Yes, so luckily US Air got us down here; got up at 3 a.m. to get down to San Juan, Puerto Rico, where I'm attending a wonderful academic conference called AMCIS, the Americas Conference on Information Systems, a very exciting event down here. Awesome. Well, we're sure glad they got you there in time. And sitting right next to me is Mr. Michael Lee. He's a data consultant and is certified in a number of areas, including Data Vault 2.0 and Kimball ETL architecture, and is a Certified Data Management Professional.
Michael has over seven years of experience in designing data quality solutions, improving data management practices, implementing data governance frameworks, architecting data warehouses, and implementing system upgrades and migrations. Michael's worked in a number of industries, including telecommunications, banking, insurance, defense, commercial manufacturing, and international shipping. And Michael has recently returned from traveling abroad, as he was just married, so all of you can give him congratulations if you get the opportunity. Michael, how are you doing today? I'm doing great. I just left where Peter is, so we're recovering from that. Did he leave the region a mess, Peter? Because I believe Michael would have left that place a disaster. They were talking about Hurricane Michael down here. Maybe I didn't connect that. Okay. All right. Hi, everybody. Thanks for joining us. Again, today, one of the things I think Michael's going to talk to you about is the Data Vault method, which some of you probably aren't as familiar with. But let's start off with our standard credo on these. Today we believe data is our most important, underutilized, and poorly managed asset. It is our sole non-depletable, non-degrading, durable, strategic asset. And if we use this language to talk about it, it actually helps, because the people we're talking to actually start to pay attention and listen to this. I also keep seeing things like data is the new oil. And while it's a good thing to think about, it's really not a good way to think about data, because oil is a production function; it runs through tubes and doesn't do anything on its own. So a better one I've seen out there is data is the new soil. Maybe you plant something in it and something good grows at some point in the near future along the way. I also saw this one recently: data is the new bacon. Hey, whatever it is you need to do to get people to pay attention to this stuff, we're very happy to do it.
And I do want to call out Stephen's latest creation here. We're going to start putting his new tattoo-themed data logo on another piece of our literature here so that you guys can see some of that, because people are actually relating to it. It's a very, very nice way of articulating the whole process we do collectively as data managers. I was just going to say, you have no idea how happy that makes me, Peter. That's a funny design. It's a great design. Nobody else knows what the heck we're talking about, so maybe we'll make sure we include that in the next webinar here so people can see it. The real key, though, for us as data managers is that we're trying to collectively strengthen your organizational data management capabilities, provide solutions that are appropriate for your organizations, and build lasting partnerships with the business all the way up and down. So, three sections for today. First one, the business-to-data relationship. We'll talk a little bit about what a data model is. What are the conceptual, logical, and physical levels that we typically talk about in this? What are the issues that poor data modeling can introduce into the process? Then Michael's really going to dive into the next section, different data models, different uses. Again, it's having the right tool and understanding where to apply that tool in the right context. I think he's got a very nice articulation here on this. Then we'll talk about some trends, moving to the business. A little bit of self-service comes up in this. People are always looking for ways of making this process more efficient and effective, and sometimes that's possible; sometimes it's not. We'll talk about Agile, because all of us have encountered Agile in the world out there. Agile is a very good way of developing software rapidly. We'll talk about the data sharing world, pattern reuse, and metadata modeling to finish up at the top of the hour.
Let me dive in here and talk specifically about a data model. The real key here is that a data model organizes data elements and standardizes how the elements relate to each other. You can read about it in Steve Hoberman's book; we've quoted him here on this: a data model is a wayfinding tool for business and IT professionals to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment. Now, that's a nice definition. It literally means we've got to be on the same sheet of paper. When you are doing IT, and everybody today is doing IT, if you are not on the same sheet of paper, you have problems. Now, that usually ends up with more money for somebody, but it doesn't end up with good business results. That's one of the things we really want to work on in terms of going forward on this. From a challenges perspective, what we really look at is that poor data modeling can cause data quality issues downstream. Next month, we're going to dive into data quality in a little bit of detail, and I'll show you some examples of how that worked out for one of the groups that we're working with. If the model doesn't represent the business concept, it impacts confidence in the business and confidence in the data. People, for their part, don't distinguish between the system and the data; they just blame the system and say it doesn't work. We'll talk about an example of over-normalization as we get a little bit further into the presentation today. Lack of flexibility can cause difficulty in aligning with evolving business environments and integrating future data into the organizational environment; it complicates re-engineering and creates tremendous operational inefficiencies.
And I like to talk about death by a thousand cuts in this context, because little things that happen billions and billions of times in some of your organizations still add up to an awful lot of time and effort around all this. Again, it limits workflow transparency and creates a bunch of workarounds, which are really problematic. When you look at all of this, you're not able to understand the impact of changes on your organization, which is a big problem as well. So data models are expressed as architectures. Most of you, again, would be very familiar with this concept. Attributes are organized into clumps. We call the clumps entities, sometimes objects. The attributes are characteristics of things, and the entities and objects are things whose information is managed in support of strategy. If you're not doing something in support of strategy, you can legitimately ask the question, why are you doing it at all? These attributes are then organized into entities. The entities are then organized at the next level into models. You can see we're going from the most granular, the attributes, to the highest level, the most abstract, which is our ability to understand architectural components. These poorly organized data structures often constrain organizational capability to deliver information, and the architectures then, unfortunately, inherit the characteristics of the poor models. So it's a very, very big challenge within all of these concepts: people just not really understanding. And again, I said it earlier, but I'll repeat it here: if things don't match up at the attribute level, things don't work. And when they don't work, it causes problems. So let's talk about what a data model is conceptually. Again, it represents concepts, entities, and relationships at the highest level of abstraction, and it identifies the domain and the scope of the data and helps business users understand core data concepts.
And I use this a lot by telling people it's really, really key that a business person actually be able to understand that information, and that if an IT person is proposing a project for them and they're moving forward on the project without this shared understanding, literally being on the same sheet of paper, then it's generally not a good prognosis, because everybody's smart, but everybody understands only at a high level how these things work or don't work, and everybody comes away with their own perspective on it. So we'll come back to this in a little bit, but I'm going to show you an example of how this actually works. I'm going to give you a quick second here to make sure these slides catch up with us. So here's a map, a data map from... this is actually from the VA, the Veterans Administration's data model from the late 80s, early 90s. By the way, the Veterans Administration has one of the most integrated healthcare systems from a data perspective, precisely because in the late 80s and early 90s the group I was involved with, the Defense Information Systems Agency, spent a fair amount of time and effort putting together an integrated data model for these, and putting that integrated data model together is a great way to ensure that what we're doing will work in every environment. So this data model that we're looking at now is, you can see, very high level and conceptual, and I'm just going to dive in on one area here to show you one example of one of the things that we encountered as we were doing this. It's the relationship between the entities admission and discharge, and I've highlighted it there in red. So if we look at this, this is some more detail from that data model, and you can see here that the definition is that an admission is associated with one and only one discharge in this case. And so then you have to say, okay, what does that mean?
Well, what it means is our admission here contains information about patient history, and a discharge has a table of codes describing disposition types. So one of the things that we had to have an argument about, and it was a very good argument, is: is being dead a legitimate reason for being discharged? And if they didn't want to portray that concept in that fashion to the users, it was going to be a problem. So I hope everybody follows that example. It's not necessarily whether being dead is a reason for discharge; it was a matter of portraying that information. One of the discharge reasons, unfortunately, had to be dead. Now, that aside, what it really does is allow people to have discussions about exactly what's happening within that context. Another one that we did for that same example was the bed structure. So here's the sample entity bed. You can see it has a bed description, a bed status, a bed sex to be assigned, a bed reservation reason, and an association with a room. And for the purpose, by the way, I encourage all of you to use something I learned from Clive Finkelstein many, many years ago. When we talk about definition, somebody will define a bed, right? A bed is the thing you sleep in. But the purpose statement actually gives a reason and helps us get a closer link back to strategy. So our purpose statement here was: this is a substructure within the room, which is a substructure within the facility location, and it contains information about beds within rooms. And one of the interesting things that we discovered in thinking about this process was that the VA wanted to have an RFID system in place so that they didn't actually lose patients. So every bed would have a self-identifier so you could find it wherever it was, because, believe it or not, people do get lost in hospitals.
Now, thinking about this example for a minute, what it also meant was that this provoked a different discussion, which was a good discussion, and allowed us to get going on the process: what do we mean by a room? Clearly, a hallway had to be a room. And one other thing had to be a room according to this definition as well, and that was an elevator, because we have had instances of patients getting stuck on elevators and just simply riding elevators up and down. Now, I'm not presenting any solutions here. I'm simply using this to show how people went through and understood these types of models. I use these conceptual models to communicate with the business people. The next one up is a logical data model. And again, it drops in here into a context that says it's going to include, in this case, more detail: attributes, names, relationships, other metadata that we're doing in here. And just a quick note, we're going to develop these models using UML. There are at least three different variants of modeling notation. The only important choice there is that both parties decide to use the same modeling notation as you get into this. So again, here you can see we've taken an entity called customer and an entity called address and put more meat around the bones, if you will, to allow people to better understand what's actually happening around all that. Finally, of course, our physical model, which is the as-is, the piece we're actually going to implement. And now what we've done is we've turned around and made IT-specific names for the business concepts. So this is where translation can occur and can become, again, challenging and problematic if you're trying to, A, understand what somebody else did, or, B, improve upon an existing system, which is where most of our IT expenditures go. So again, we have a customer, but it's now become a T customer and a T customer address. And then we have other types of addresses that we've included in here.
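To make that conceptual-to-physical translation concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The business entities customer and address become IT-specific table names; the exact names and columns (T_CUSTOMER, T_CUSTOMER_ADDRESS, and so on) are illustrative assumptions, not the schema from the slides.

```python
import sqlite3

# Physical model sketch: business concepts "customer" and "address"
# translated into IT-specific table names, as described above.
# All names and columns here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_CUSTOMER (
    CUST_ID     INTEGER PRIMARY KEY,   -- surrogate key
    CUST_NAME   TEXT NOT NULL
);
CREATE TABLE T_CUSTOMER_ADDRESS (
    ADDR_ID     INTEGER PRIMARY KEY,
    CUST_ID     INTEGER NOT NULL REFERENCES T_CUSTOMER(CUST_ID),
    ADDR_LINE_1 TEXT NOT NULL,
    CITY        TEXT,
    POSTAL_CODE TEXT
);
""")
conn.execute("INSERT INTO T_CUSTOMER VALUES (1, 'Acme Corp')")
conn.execute(
    "INSERT INTO T_CUSTOMER_ADDRESS VALUES (1, 1, '1 Main St', 'Richmond', '23220')"
)

# Reassembling the business view now requires knowing the IT names,
# which is exactly where the translation problem shows up.
row = conn.execute("""
    SELECT c.CUST_NAME, a.CITY
    FROM T_CUSTOMER c
    JOIN T_CUSTOMER_ADDRESS a ON c.CUST_ID = a.CUST_ID
""").fetchone()
print(row)  # ('Acme Corp', 'Richmond')
```

Anyone trying to understand or improve this system later has to recover the business meaning from those IT-specific names, which is the translation risk the speaker describes.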
I want to shout out to Scott Ambler for letting us use these particular definitions. He's working with us in the data modeling community and helping us understand these things. Now, another important component of all of this is to understand the reason we do this when we are making changes and improvements to our modeling. As I mentioned, about 80% of our dollars in IT are spent improving existing things, as opposed to building new things, which only accounts for 20% of our dollars. We want to make sure that we follow a pattern here, and this is the pattern we look at: we start with a physical as-is model, in the bottom left-hand corner, then we move up to our technology-independent level, a logical as-is model, then we move to a logical to-be model, and then finally come back and drop down into the physical to-be model. It's very important that we understand this particular process, but I'm going to show you it's a little bit more complicated than we show, and if you come out of college or university with this kind of an understanding, you're actually doing very, very well in terms of how things work. Because what really happens here, I'm going to use the green blob on this box to show. The green blob moves up again to the logical structure, but when we move up into that logical as-is area, we don't just do it in a single stream; we actually incorporate a fair amount of other material. This is where we integrate other logical as-is data components. I've shown this by illustrating that the box changes from green to orange, and I've included orange in there, and then we move on to the next phase of what happens going down. Again, it's a lot more complicated, a little more subtle. It is important to understand that every aspect of what we do in data modeling frameworks can fit into this next chart that I'm going to show up on here.
And if people have a hard time understanding this, it's really, really useful to put this particular chart up. Stephen, you could double-check me on this, because I think we picked up an error last time we looked at this, and I think I got the error corrected. But everything that you're doing in data modeling is going to be as-is or to-be, and conceptual, logical, or physical, and I'm adding in another dimension here as well that we incorporated in the DOD modeling components that we were talking about earlier with the Veterans Administration and other parts of DOD. And that is that the models are validated or unvalidated. Our goal is to have, in this case, a validated as-is physical data model. If we don't understand what that is, it really represents the truth, or lack thereof, that we need in order to do this. If you're having trouble figuring out where things are and what's going on, this chart can be very, very useful. I'm going to get ready to turn it over to Michael here and get him to tell us about different data models and different uses. Michael, you're going to start off with normalization, which is everybody's most fun thing, right? Absolutely. Normalization is always fun to talk about. I'll let the slides catch up, but really there are five, six-ish forms of normalization, depending on who you talk to. The only one that we really care about is third normal form. That gets into... And the reason I say it's the only one we really care about is because that's the one that's really prevalent in data warehouses out there, for those who know the differences between the Kimball and Inmon methodology approaches to data warehousing. It really focuses on normalizing things out so that everything in a table depends only on the key, and once you get above that, you're reaching the point of diminishing returns as far as data warehousing is concerned. That's why we focus on third normal form.
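As a minimal sketch of what "everything in a table depends only on the key" looks like in practice, here is a hypothetical third-normal-form decomposition of the artist/song/CD example the speaker mentions next, using Python's standard-library sqlite3. The table and column names are illustrative assumptions, not the design from the slides.

```python
import sqlite3

# Hypothetical 3NF decomposition of an artist/song/CD domain: every
# non-key column depends on the key, the whole key, and nothing but
# the key. Names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cd     (cd_id     INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     artist_id INTEGER NOT NULL REFERENCES artist(artist_id));
-- resolves the many-to-many relationship between songs and CDs
CREATE TABLE cd_track (cd_id    INTEGER NOT NULL REFERENCES cd(cd_id),
                       song_id  INTEGER NOT NULL REFERENCES song(song_id),
                       track_no INTEGER NOT NULL,
                       PRIMARY KEY (cd_id, song_id));

INSERT INTO artist   VALUES (1, 'The Examples');
INSERT INTO cd       VALUES (1, 'Greatest Hits');
INSERT INTO song     VALUES (1, 'Opening Track', 1);
INSERT INTO cd_track VALUES (1, 1, 1);
""")

# The "expensive joins" con in action: listing what's on a CD and who
# performs it already takes three joins in this fully normalized shape.
rows = conn.execute("""
    SELECT cd.title, song.title, artist.name
    FROM cd
    JOIN cd_track ON cd_track.cd_id    = cd.cd_id
    JOIN song     ON song.song_id      = cd_track.song_id
    JOIN artist   ON artist.artist_id  = song.artist_id
""").fetchall()
print(rows)  # [('Greatest Hits', 'Opening Track', 'The Examples')]
```

The redundancy reduction is visible (each artist name is stored once), and so is the cost: even a simple business question fans out across four tables.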
If you go to the next slide, we talk a little bit about the differences between Kimball and Inmon, and then a little bit about Data Vault at the end, focusing on how things are structured, to give you an example of some use cases, that kind of stuff. Here we have a use case of an artist and songs and CDs and where they were recorded, that kind of stuff. Third normal form is really strong in transactional systems and storing transactional data, which you don't necessarily put in a data warehouse, but a lot of your transactional systems will be in third normal form as well. That's one of the business best-practices type things. But there are some pros and cons of third normal form, if you move to the next slide. Really, some of the pros: if you're looking at the model just as it stands, it's sometimes a lot easier to understand than some of the other models, particularly by business users, which is really who we want to go after as modelers. You also get a lot of reduced data redundancy, really enforcing that referential integrity. And not to go off on too much of a tangent, but referential integrity is hit or miss when it comes to warehousing, because some people turn it on and some people turn it off, so that's a question there. And just indexing things really contributes to improved performance. The cons: obviously I just said it focuses on improved performance, but the joins can be expensive in third normal form, because everything is so normalized out, it's hard to grab everything from different parts of the model and bring it back together. So because of that, it's hard to scale these models up. The next type of model is the star schema, and that really gets into trying to answer some of those problems, or approach the problem differently than a third normal form model would. Here we have an example of a star schema where it's really just a sales company, and I'll let that slide come in. But star schemas, we're not going to spend much time on.
I know a lot of people have seen those or heard about them, but they're comprised of fact tables that usually contain quantitative data about some kind of transaction, and dimension tables storing detailed information about those transactions. Really heavily optimized for business reporting; you'll find a lot of BI systems exclusively designed to sit on top of star schemas. Really also designed for online analytical processing, really fast querying. There's lots and lots of documentation on Kimball modeling; Kimball and star schema are kind of synonymous when you talk about data warehousing. But star schemas also have their own pros and cons. The pros of the star schema are that it's a real simple design. You know, having the fact tables and dimension tables there, it's usually not a lot of tables; it doesn't take people too long to see all the tables that are involved. You get some fast queries when people are looking at the star schema, and most DBMSs are optimized to go at star schemas. And I'm going to let Peter... I think we just lost the slides, so I'm going to... I just restarted the thing. It's right back. Okay. Sorry about that. But yeah, so talking about star schemas and talking about the pros: the simple design, the fast querying, things being optimized to go into the structures. Again, the cons: really a lot of cons on the back end of star schemas. And what I mean by the back end is the design and the setup. Once they're running and once they're going, they really work well for end reporting and that kind of stuff. But there are a lot of questions on the front end in the design, as far as what goes into a fact table, what goes into a dimension table, when do we load them, how do we track history, all those things, one of the major points being the re-engineering of it, and we'll get into some of those issues later. And then your data marts really only focus on one fact table. It's hard to have data marts on multiple fact tables.
You can do it, but then it really loses its viability and its usefulness, to be honest. The third one out there, at least from a data warehousing perspective, is the data vault, which some people may have heard of and some people haven't. It's been gaining a lot of steam recently, I think. It's really viewed as a go-between between the two previously mentioned models that we talked about, really heavily focused on the auditability and the historical tracking of data, as well as tracking the lineage, the source, and when it was loaded. It talks about all the data, all the time, as opposed to one version of the truth; it's one version of the facts. There are lots of different taglines that you can throw around about data vault, but it's mainly a hub and spoke model, so that's where you see the influence of the star schema: instead of the fact table and the dimension tables, you have hubs and spokes. But it is a lot more normalized than star schemas, which is where the influence comes in from the Inmon third normal form approach. And so it's really designed to be, again, just a source to store your data. It's not necessarily highly valued as a BI back end; it's more of an enterprise let's-get-all-our-data-centralized-so-we-can-do-a-bunch-of-stuff-with-it, is really what it's used for. Looking at the pros and cons of it, it is real simple for integration. Data vaults are very easy; it's just adding another hub or spoke to the equation, whereas when you talk about star schemas, you're talking about adding dimension tables, new fact tables, how do they relate to existing ones? It's hard to figure out sometimes. But the data vault, you know, gives you the full data lineage; we talked about the auditability. Some of the cons of data vault, obviously, are that a lot of the complication, and the slide says back end, but it probably should say front end, is pushed up. The data vault says: let the end user figure out what the errors are.
And we'll talk about that a little bit in the next slide, but it's difficult sometimes to see from a business user standpoint what's going on. One of the other big problems with data vault right now is the tool support. There are not a whole lot of tools that can automate some of the data vault loading right now; usually you have to find some qualified resources. And again, we'll talk about that; why don't we go to the next slide. So this is a quick and easy comparison of the three as we look at them. And it's interesting, because the things on the left, the scalability, the flexibility, the auditability, all of those things apply to poor data modeling. You know, we're here talking about data modeling, and if you recall the slide that Peter mentioned earlier with the poor data models, every one of these things goes back to that: the flexibility, the auditability, being able to see impact analysis. All these things apply to data models as well as data warehousing. But you can see some of the points I touched on: third normal form and dimensional modeling really have challenges in flexibility and re-engineering. But the dimensional model, the star schema, is really optimized for that presentation layer. Once you get it working, it's fantastic; the problem is getting it working. The vault is very easy to scale, but not really easy to read from the business point of view. So you can't really look at one of these and say that's the one we want to go with, because they each have their pros and their cons. And the boxes that aren't checked in green, those are things where they're making improvements. If you look at the data warehousing field and the data modeling field, you're getting tool support for vaults, and you're getting auditability capabilities for third normal form and dimensional modeling.
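The two shapes being compared can be sketched side by side. Below is a minimal sqlite3 sketch: the star schema side has one central fact table keyed to dimensions, and the data vault side has a hub holding the business key and a satellite holding descriptive history, each row stamped with load metadata. All of the table names and columns are illustrative assumptions, not a production design, and the vault sketch omits link tables for brevity.

```python
import sqlite3

# Side-by-side sketch of a star schema and a (simplified) data vault.
# All names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: dimensions plus a central fact table
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, cal_date TEXT);
CREATE TABLE fact_sales   (customer_key INTEGER REFERENCES dim_customer(customer_key),
                           date_key     INTEGER REFERENCES dim_date(date_key),
                           amount       REAL NOT NULL);

-- Data vault (simplified): hub carries the business key, satellite
-- carries descriptive history; every row is stamped with load
-- metadata, which is where the auditability comes from.
CREATE TABLE hub_customer (hub_customer_id INTEGER PRIMARY KEY,
                           customer_bk TEXT NOT NULL,      -- business key
                           load_dts TEXT, record_source TEXT);
CREATE TABLE sat_customer (hub_customer_id INTEGER REFERENCES hub_customer(hub_customer_id),
                           name TEXT,
                           load_dts TEXT, record_source TEXT);

INSERT INTO dim_customer VALUES (1, 'Acme');
INSERT INTO dim_date     VALUES (20240101, '2024-01-01');
INSERT INTO fact_sales   VALUES (1, 20240101, 250.0);
""")

# The star side is built for reporting: one join per dimension.
total = conn.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.name
""").fetchone()
print(total)  # ('Acme', 250.0)
```

Notice the trade-off the speaker describes: the star side answers a business question in one short query, while the vault side is easy to extend (add another hub or satellite) but pushes the work of assembling a business view onto whoever queries it.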
So these are all the current trends that we're seeing in modeling, as well as some new ones that Peter's about to get into, but you really have to look at what you need. I think that's the most important part when you're talking about data models. Before you start developing data models: what is your business objective? What are you trying to do? Which is really important. But a lot of these modeling forms really deal with structured data. They're dealing with things that have been familiar for the past 20 or 30 years. So some of the new modeling forms that are coming out are really getting into unstructured data, some NoSQL, some big data. And I think I'm going to throw it back over to Peter to jump into that. Sure. Michael, one of the things you mentioned, and I want you to come back to it a little bit later on, is that you are certified in the Data Vault method. And what that means is not just a technical certification but an understanding of how to use the tool given the context that you're talking about. So we'll come back to that particular point. So I've been doing a lot of lectures lately on the post-big data era. And I need you guys to understand conceptually what's happening here. And to do that, I'm going to use the Gartner five-phase hype cycle that many of you are familiar with. You start off with something that's called a technology trigger. It rises to the peak of inflated expectations and then goes down to the trough of disillusionment, eventually climbing up the slope of enlightenment and getting to the plateau of productivity. In other words, everybody starts off and they say, it's really great, and then it really sucks, and then somewhere in the middle we find out the answer. So what I'm showing you now is an example of a Gartner hype cycle from 2012. Now I'm waiting for the slide to catch up on this one here.
But in 2012, it was very interesting in that Gartner was positing text analytics, in that circle there in red, as being on the curve, getting ready to go up the slope of enlightenment. So if you were in text analytics, it was great. Similarly, however, if you were in social network analysis, and you can see it's up there circled in red in the upper left-hand corner, you were about to go over the peak of inflated expectations. And similarly, we can look and see that predictive analytics and web analytics were also in good shape; they were relatively mature as far as Gartner was describing it. So let's say you use these cycles. Now let me go to the next one, which is the same year, 2012. And what you'll see here is that in July 2012, big data is in the upper left-hand corner, circled in red again. And it is in light blue, which Gartner has coded at the bottom as being two to five years away from peak hype. Gartner in 2012 thought it would take big data two to five years to get to the peak of inflated expectations. That's a very useful piece of information. It means we have at least two to five years of hype before the bottom pops out of the market and things go oopsie. So that was 2012. Now I'm going to move on to 2013. This is just one year later. And again, Gartner is very, very good at this, so we'll take them at their word for it. What they're showing on this next year's chart is that big data has moved slightly up the hype curve. It's closer to the peak of inflated expectations. However, they've now shown that it is, excuse me, five to 10 years away from that peak hype. So in one year it's gone from being two to five years from peak hype to being five to 10 years from being at the top of peak hype. And that's an interesting change in just one year. Well, things are moving really fast with this big data hype cycle, because now I'm going to move to 2014.
And it turns out that in 2014, Gartner has taken big data and pushed it over the top of the hype cycle. Even though last year they thought it was five to 10 years away from the peak of inflated expectations, now they're showing it's five to 10 years from the trough of disillusionment. Now, I show you that not to say that big data sucks. It's actually a very, very useful development. But it's really much more of a technology. And again, it's going to depend on people like Michael having the expertise in your organization to be able to come in and talk to you about how big data techniques can, in fact, be useful in solving business problems that you have. Michael, back over to you. So when we're talking about big data and NoSQL, we're really looking at, and again, we want to focus on the data modeling aspect of it, but a quick overview: you have the document stores, the key-value stores, RDF triple stores, graph stores, and column stores. The main things they're focusing on are the high availability and the partition tolerance that you sometimes don't get with current technologies. The document and key-value stores are really helpful if you're dealing with XML documents, or if you're getting feeds from a vendor in document form. In the past, you had to parse out that data and put it into some kind of structured form. Now, with some of these technologies, you can just take it as it exists and get it into storage so that you can start processing it and doing things with it. And it allows for the scalability of really amping up that volume. The RDF triple stores are kind of more of a common-sense approach to dealing with data: Bob likes football. It's really oriented around taking two things and asking how they relate. So that's really getting in there.
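To make that "Bob likes football" triple idea concrete, here's a minimal sketch in Python. This is my own illustration, not anything shown in the webinar: facts are stored as (subject, predicate, object) tuples, and a query is just a pattern match with wildcards.

```python
# Hypothetical sketch of a triple store: every fact is one
# (subject, predicate, object) triple.
triples = [
    ("Bob", "likes", "football"),
    ("Bob", "worksFor", "Acme"),      # illustrative facts only
    ("Alice", "likes", "football"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who likes football?" is just a pattern with the subject left open.
football_fans = [s for s, _, _ in match(p="likes", o="football")]
```

Real triple stores add a query language (SPARQL) and indexing on top, but the model itself really is this simple: two things and how they relate.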
You'll find a lot of that behind the semantic web, when you're trying to talk about websites talking to each other and how they can do that. They really have to find the things that are common, speak a common language, that kind of stuff. And so that's where some of the query languages come in with the triple stores. The graph databases are really interesting, in my personal opinion. They're very similar to a triple store in that you're still talking about how two things relate, and really that's what most data is about. But graphs put out some very interesting visuals. So if you're looking for a visual way to look at your data outside of BI, something where you can gain some insight just by looking at the data visually, graphs are fantastic for that, and I think we have an example of one of those later on. Graphs also boast some fast querying because of the way they handle indexes. Column family stores are really very similar to your current database technology, which stores things in rows; column stores, as you can tell by the name, store things in columns. It really just changes how you index things and how things look on the back end, but it's a very similar concept to current technology. But we are here to talk about modeling. So the next slide gives you some examples of those different techniques and what a model might look like for each. Obviously, we gave the example of the RDF triple store. That's a pretty basic model: how do two different things relate, and in which way do they relate? Bob likes football. Pretty easy thing to model. The document store is a tricky thing to model because, again, how do you model documents and how they relate to each other? So usually it's separated. Instead of tables, you have to think of it as a document.
So this document comes in and, for this type of document, a user document relates to a contact document, that kind of stuff. The graph store, again, I really like the visualization it represents, because you can add visual elements like a person's picture or a name and really boost the visualization of your data. But it's very similar, as you can see, to the triple stores, in that it's two objects and how they relate. And then the column store, that's probably going to look the most familiar to somebody. It's really just broken up by the different columns and the different segregations of the data. So those are really the different models that you're looking at. As far as who can help you if these are some of the things you're trying to move towards, I think on the next slide we have some generic providers that are out there in the industry today, dealing with the document stores, the graph stores, and the in-memory stores, something we didn't really talk about, and that, again, doesn't have much to do with modeling. It's more the style of technology. So if NoSQL is where you're leaning, and if that's why you're here today, definitely take a look at some of those. But really pay attention to modeling, because modeling doesn't go away with NoSQL technology. Here, again, we have an example, and I just threw this in there because I love the graph models and I love looking at this and I love Marvel, so I'm a geek. Wait, wait, you're a geek and you're giving a presentation on data? Wow. Don't tell anybody. But for anybody that's seen all the Marvel movies in theaters, and this is, I think, a couple years old, this actually comes directly from Marvel. It shows how they model their data internally and really shows you how you can segregate and visually see your data.
That big red dot without any circle around it, to the left of the middle there, is Stan Lee, which represents all his connections to everything that's going on. Those are just some interesting things to work out, but that's some of the newer technology that's coming on. Again, models aren't going away. Just because you're dealing with newer technology, the models aren't going away; it's just how they look and how we utilize them that's really key. With that, I think we're going to get into the next section, talking about some newer trends in data modeling. Peter, I guess I can throw it back to you for that, if you want to handle it. Just to introduce the next section: for those of you that are listening, if you want to learn more about this, the Smart Data Conference is coming up in just a couple of weeks in San Jose. Stephen, I know you're going to be there. Michael, are you going out there as well? Yes, I am. Both of these guys will be out there talking at the conference and interacting with folks and giving a fairly good, I think, approach on how to look at this. And also there to learn, too, because this stuff is moving very, very quickly. What we have realized is that the old stuff can do certain things, but we have new needs. You mentioned a very good one just in the graph theory, which is: I want to connect some things, but the only thing that I have to represent them is a picture of a person. I'm not sure how I would actually identify that person. Well, graph databases may be a very good way to do some sort of identity resolution in that context. So, moving on here for the last couple of minutes as we wind our way back to the top of the hour, what we're really trying to do is to look at these models and say how they are useful to the business. And what we want to do is give you, all the listeners, the ability to be heard in this space. So the idea is, of course, the models need to add value. They have to be perceived as being valuable.
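As a rough sketch of what a graph store models, here's a toy Python version. The characters and edges are my own stand-ins, not the actual contents of the Marvel chart: nodes, edges, and the observation that the best-connected node, like Stan Lee on the slide, jumps out as soon as you count connections.

```python
from collections import defaultdict

# Illustrative edges only; the real Marvel model is far larger.
edges = [
    ("Stan Lee", "Spider-Man"), ("Stan Lee", "Iron Man"),
    ("Stan Lee", "Hulk"), ("Stan Lee", "Thor"),
    ("Iron Man", "Hulk"), ("Spider-Man", "Iron Man"),
]

# Build an undirected adjacency structure: node -> set of neighbors.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# The most-connected node dominates the visual, as Stan Lee did.
hub = max(adjacency, key=lambda n: len(adjacency[n]))
```

A dedicated graph database indexes this adjacency structure directly, which is why traversal queries ("who is two hops from X?") are fast compared to join-heavy relational equivalents.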
And one of my first pieces here, and I'll let you chime in after this, Michael: I had worked for a group up in New York City for a while, and the CIO up there was a very, very smart person who just trusted that we were doing the data right. He never really wanted to learn about it himself, but he would say to us and the rest of the group on a regular basis: I don't know what the modelers do, but I know that when I go to our office in Singapore or Bangkok or Sydney or wherever else in the world I happen to be, and I see their data models hanging up on the walls, I see people making use of the data models. They're not just pretty decorations; they are actually being used as products. And so the data modelers, in my mind, are absolutely essential to what's going on in my operation. Again, you can see that this particular CIO understood that the models did add specific value, that it was not just a documentation process, but in fact a way of visualizing and understanding how these models were useful in context. Michael, you want to do the last bullet on this one? Sure. I'm actually going to touch on a couple of the bullets, but models need to be part of the process. I really can't emphasize that enough. In some of my experience with clients I've worked with, the modeling was really viewed as, yeah, we need to do it, but it doesn't really add that much value. And it didn't add that much value because it wasn't part of their process. They were trying to document it at the end, and it became shelfware. They made the model, they threw it on the shelf, and everybody promptly forgot about it. And so when they came back and did production upgrades and changed some things around for a couple of years and, oh, now we need to do a system upgrade, well, does anybody have a data model? Oh, yeah, I think we have one we made five years ago, and it's completely wrong now, and it just creates havoc.
So it really needs to be part of the process, and I think we talk a little bit later about how to make it part of the process. But also, the last part, assisting and improving capabilities, not hindering them: this really gets into self-service BI as a huge component. Models shouldn't be, again, going back to the example I just used, viewed as a hindrance. It wasn't fun or shiny, just like Peter said. It's not interesting, right? But done correctly, models can improve your capabilities by providing that business insight, by allowing your business users to take control of their data. And I know that's a scary thought sometimes inside the IT world. We don't want to give up control of our data. But we really need to start working together with the business and saying, you know, how can we help you, how can we help the enterprise as a whole? And so that really gets into self-service BI, and I think that leads into the next slide, where we talk about virtualization and the self-service BI going on with that. Although I'll make one more point real quick as we move to the next slide, Michael. Self-service BI is sort of a code word. What it really means is that we've never worked with a smart business person who, once they understood what a data model did, didn't use it as their primary source of documentation. In other words, they would demand to see the data model when they encountered a new system. Smart business people can be taught these data modeling concepts. It's not that you want to sit down and explain to them the ins and outs of normalization. But if you show them that this represents a map, so they get how to use it, and empower them to get the business value out of the data they're trying to get at, they will absolutely be on your side in terms of how to use these. Yeah, that's absolutely correct.
And self-service BI, like you said, is about giving them that visual of the data so that they can grab it and run with it, so to speak; not grabbing it and running off with it and changing the model, but being able to get at and use the data that's inside that model is really what we're focusing on. So when we talk about self-service, virtualization is a big part of it. Virtualization is really that abstraction from the physical layer, and it deals with a lot of integration issues. It deals with getting data into a presentation-ready and understandable format that the users can use. A lot of that gets into ETL and things that aren't necessarily related to modeling, although it could be related to metamodeling, which we'll talk about in a little bit. But getting it into some of those virtualizations, getting it into business terms, getting it into reportable views, things that people actually need, is really helpful for self-service BI, because now, as a business user, I can go in and not only understand everything I'm looking at, but I can compare the model you've shown me with what I'm looking at and do the math in my own head, so to speak. I can see that, oh, this field that you've titled STR_5 relates to my sales statement, and it relates to this. So it really gets back to some of the impact analysis that we mentioned earlier on about the problems with data modeling, because now changes to the back end can impact our front end once we have some of those self-service things and the virtualization set up, and we need to know where that impact is to help bridge the gap, so to speak, between IT and the business. But self-service BI and virtualization do present an interesting problem, and you see in the second bullet point there, I talk about the presentation data models.
It's really interesting, and I haven't seen a whole lot of them at some of the clients I work with, maybe because they don't have the virtualization environment set up, but it would be really interesting: once you have that virtualization set up, you then create these presentation models specifically for the business users. Rather than handing them a logical model that maybe doesn't have all the attributes, and they're like, oh, well, where's that attribute? Oh, well, it got renamed when we made the physical model, right? So now you've got to go look at the physical model, and they're completely lost. There's this idea out there of presentation data models, where you actually model your virtualization layer. It includes some of that metadata, but you model: hey, here's what you're looking at, here's how it relates to this other thing that you're looking at, and, if you want, you can include some links to the physical attributes, but it's not necessary. It's really about how we can help the business understand where their data is, how they can get it, and how it can help them. That's what self-service and virtualization are all about. Once you get to that level, it really increases the agility of the corporation as a whole, because now they're not waiting around trying to figure out where the data is and what they can do with it, and you as the modeler aren't trying to build these holistic models that are going to become nice shelfware that nobody's going to use, right? And so that gets into agile, when we talk about doing modeling for the business. It's not an academic exercise. So incremental builds of the model: there are some good sides to it and some bad sides to it.
One thing it's not is an excuse to create a bad model. What I mean by incremental building of the model is that a lot of systems are incrementally built, whether through agile with scrums or however they're building the system. They incrementally build the system, and sometimes they build it, try it out, and say: oh, it didn't work, we need to go back, we need to change it, we need to do this. So you're working as a data modeler with the development team. You're working with the programmers and the business people. Everybody's working together, and you're building that model as they build the application. So you're incrementally building this model as part of the agile process. And how do we do that? How do we get into that agile mindset? One of the approaches I like best is the 80-20 rule, where you just sit there and say: hey, what's 80% of what you're trying to accomplish with this scrum, or with these two weeks of programming, or this module that you're building? What's the bulk of what you're trying to do? Yes, there are going to be some exceptions. There are going to be some things that occur once a year that we're going to have to model for and figure out how to handle. But let's get the core functionality first. And then, as you add those exceptions into your code, we're going to add those exceptions into our model. So it's really about building them together. The other approach to agile is coding first. A lot of programmers today don't like to deal with data people. They don't like to deal with models. They just want to code. And so I wanted to talk briefly about the code-first problem. When you ignore data modeling as part of the process and just say, let me code it and we'll come back and model it at the end, the problem is that a lot of your business rules end up in your code.
So I'll look at a model and I'll see these two entities relating, and maybe it's a one-to-many, maybe it's a one-to-one, but you say, well, how do they relate? Well, which one needs to be loaded first, and why does that one need to be loaded first? All of that exists only in the code. And so for a business person to understand it, you've got to go look at your code, and it just becomes a mess. Even if it's well documented, nobody wants to read code, right? Unless you're a programmer. And if you are, I'm sorry that you're listening to me. But you have some re-engineering concerns there. Again, if everything exists in your code, it's really difficult to parse it out, and that's when you're pulling those old models off the shelf and they're all wrong, because some coder's been tinkering with stuff for the past four years. There are obvious governance concerns, because nobody knows the lineage of the data. Where is it coming from? How did it get here? And, again, a lack of business insight: where is my data, how is it getting to where it is, and, you know, the impact analysis, what does it mean if I change something? It's a big problem. So how do we address those issues, stay on top of things with database first, and make the programmers happy with that? Some people may be familiar with database-first coding, and I think Microsoft has some things around database first. It's particularly good if you're trying to build web apps. What you essentially do is build your model, and it generates code for you: how you set up your project, how you populate the model. Then it generates the code for you, and some tools can even generate the web front end to basically populate your data model. So you figure out what data you need, model it, and then build the system off of that. And that's a real powerful tool that I know we've started utilizing for some clients here.
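To give a feel for that model-first idea, here's a toy sketch. The table and column names are my own, and real tools (like Microsoft's Entity Framework tooling mentioned above) do far more: you declare the model once, then generate the DDL from it rather than hand-writing it.

```python
# Hypothetical model declaration; a real tool would also generate
# data-access code and possibly a web front end from this.
model = {
    "customer": {"id": "INTEGER PRIMARY KEY",
                 "name": "TEXT NOT NULL"},
    "purchase": {"id": "INTEGER PRIMARY KEY",
                 "customer_id": "INTEGER REFERENCES customer(id)"},
}

def generate_ddl(model):
    """Turn the declared model into CREATE TABLE statements."""
    statements = []
    for table, columns in model.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
        statements.append(f"CREATE TABLE {table} ({cols});")
    return statements

ddl = generate_ddl(model)
```

The point is that the model is the single source of truth: relationships like the `customer_id` reference live in the model, where a business person can see them, instead of being buried in load-order logic in the application code.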
It just makes a lot of things really easy, especially when you get into showing them that data model and they want to know how things work, and you can point to a field on their web page and say, that's right here in your data model. It really helps when you're talking to people like that. But those are some of the things that we really want to focus on with agile. And what agile then gives us, when we're building those models and developing our apps around them, is the ability to share data across different enterprises, across different applications. You may have heard out there about the enterprise hubs and all that kind of stuff and how you talk to things. So that gets into the next slide, the data-sharing world, and Peter, I don't know if you want to say some things about that and some of the design patterns. But we're going to get a little close on time here, so I want to make sure we hit these things. If you want to speed agile up, make sure you understand your data requirements first. That's sort of a nutshell of that previous slide. The real question then is: are we doing any less sharing in today's environment? The answer is typically not. We're doing more sharing. And what I wanted to illustrate here for you is that there are a lot of design patterns that you can start from. If you look at it, restrooms generally occur in the same place in every building. And the reason for that is because it's cheaper to do it that way. If you put the restrooms in the same place, whoops, wrong slide there, in each building, it gives you the ability to make the building with the least overhead possible. And these design patterns exist in data as well. So again, why do the restrooms go in the same place? Because you don't run pipe all over everybody's building.
And of course, you do the same thing for electrical wiring and HVAC and all the rest. There are several funny stories we could tell you about that, but we don't have time today. What we do want to do, though, is finish up around this concept of reuse of patterns. The idea is that in most data modeling exercises, one-third of the data model is common across businesses, one-third contains fields that you're using to interface with your external partners, and one-third is going to be specific to the organization. And that tracks closely with what Michael was talking about with the 80-20 rule as well. These patterns are things that we just want you to be aware of, so that you don't have to start at the beginning. There are off-the-shelf solutions that you can use. It's kind of the difference between everybody having a blank sheet of paper and actually having a piece of paper that you can start to edit. So there are several books out there. I've got one on XML. My friend David Marco and Mike Jennings put a terrific book together called Universal Meta Data Models. Len Silverston has a three-volume set, The Data Model Resource Book. And David Hay really gets credit for starting this all in the first place with his original book, Data Model Patterns: Conventions of Thought. They are terrific places to go. I'm going to call out Len's and David's books in particular, because when you buy their books, they come with models. In other words, there's a CD, yes, back to those days, inside the book. It comes as part of the book and shows you what these things look like when you start off. And the point is, when you start off with a particular model, you typically say, how would this work for our organization? It becomes a lot easier to make changes in a CASE-tool environment. Most of these are in the Erwin environment.
There are some for the other tools as well. But the ability then to say, hey, what is going on here? And again, both Michael and Stephen will tell you they've been at dozens of clients over the years, and there's not a whole lot that's really new. I say that in the sense that there are new and interesting things happening out there, but they are evolutionary steps, very much following an evolutionary pattern. And so, consequently, it's much better to simply say, all right, let's look at what some other people have done. You can even go to Google Images and type "data model for accounting," and things will pop up. You know, it's just wonderful the way the web has helped us out in this area. So what have we talked about today? We gave you very, very brief descriptions of conceptual, logical, and physical models, and we can go into a lot more detail on each of those as well. Similarly, Michael talked about the traditional data modeling uses and the normalization concepts that go into them. But really what we wanted to get to were the differences. And I think that one slide we showed you earlier, and we'll come back to it, I'm sure, with some of the questions, where we articulated the differences between third normal form, star schema, and data vault, is very, very useful in terms of planning and thinking about your projects and trying to figure out how to proceed. But all of us are learning how this NoSQL stuff is coming into play, these big data techniques and technologies that are here.
None of those new developments, however, change the trends that we're seeing: moving more toward business value, looking at self-service and virtualization, looking at agility in the organization and trying to figure out how we can work better with the very good movement that is agile software development, looking at real-world sharing partners, and going to design patterns and reuse-type processes so we don't actually have to start from the very beginning. So, Michael, why don't you take this one? So, you know, we're just wrapping things up. Data modeling is important to get right, and I've talked about that with the trends, and back to why we should care about poor modeling. If you make it part of the process, it doesn't need to be a hassle, but it's really important to get right, because your business cases, your organizational maturity, your flexibility, all those things are heavily impacted by data models. When you come back down the road a few years later and your models haven't been kept up, it's just a complete nightmare. And I've seen that at several clients and previous jobs: trying to understand how the system works without data models is impossible. There are many technologies and ideas available to help solve a lot of these problems. We didn't really get into the software out there, but really, all the software can handle it. It's about the process and the people and how they approach what they're trying to do. But the final point, the really big thing to take home, is: don't try anything unless you've considered everything that's out there. We had that pros-and-cons slide for the different warehouse modeling approaches. It's really important to figure out what you are actually trying to do in any project, not just modeling. What are you trying to accomplish? What is the goal that you're looking for? The model and the system should be focused toward that goal.
I'm going to put that one slide back up as we move into the Q&A piece here at the top of the hour, because I really do think that this is one of the best discussions. If you're out there looking at a data project and you haven't at least explicitly sat down and considered which of these three architectures can be most helpful, well, as we've already told you several times, none of them is perfect. It all involves trade-offs somewhere in your organization against your business processes. But this type of analysis, more importantly, if somebody's coming in and proposing some of this for you, this would be a great question for them: well, you know, what are the pros and cons of doing it this way versus that way versus the other way? And if they can't answer those questions, I'd really be suspicious of the quality of the advice you were getting in that context. So we're at the top of the hour, which means we have to go to this slide. Oops, that's the wrong slide. There it is. All right, well, thank you guys so much for that. It looks like you had it timed right down to the hour, which is awesome. So for you panelists who are with us, it's time for Q&A. You should see a Q&A window feature at the top right of your screen, and you should be able to submit your questions through there. I've already gotten a couple, so without further ado, I'm going to jump right in. I think this first one's a really good question. So, Peter and Michael: these different stores seem to be useful for portions of your data or specific aspects of a domain space. Is it typical to end up with a mixed model overall? Luckily, you've probably seen more actual as-is environments recently than I have. Do you see this mixed environment being prevalent?
So there is a concept of a mixed environment, but it depends on which of the stores they're talking about, and I'm not sure if they were talking about NoSQL or the other ones, so I'll address both. You can have, for example, when I was talking about the star schemas and the data vaults and all that kind of stuff, there are setups out there where people mix data vaults and star schemas. For example, in that comparison you had up there, the data vaults are really good on the flexibility and the back end and all that kind of stuff, but not so hot on the presentation layer, whereas the star schemas are fantastic on the presentation layer but really hurting on the flexibility and re-engineering. So there are definitely use cases. We haven't seen many of them, simply because Data Vault is a newer methodology that not a lot of people are implementing yet, but I've definitely seen use cases where you might have a vault on the back end and a star schema sitting on top of the vault for your BI tools. As far as whether the question was addressing the NoSQL stores: absolutely, you'll see a grab bag of mixed models depending on what you're trying to accomplish. You might have a data store that feeds documents in, and you want to know how a document relates to something else, so you set up a triple store or a tagging system to relate back. So there are lots of reasons why you would have a mixed model. I wouldn't try to force it, though. I wouldn't say, well, if they're all good for one thing, let's just have all three and we'll cover all our bases. I would, again, start with the problem you're trying to fix, find the best solution for that, and then move into that agile mindset and say we'll take it as it comes. If we identify a need, and we look at the options out there and say, man, this thing would work really great, use it. Don't try to force it into something it shouldn't be in.
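To illustrate that vault-plus-star hybrid at toy scale, here's a hedged sketch using SQLite. The names and structure are my own simplification of Data Vault conventions, nothing from the webinar: a hub carries the business key, a satellite carries the descriptive attributes, and a star-schema-style dimension is just a view layered on top for the BI tools.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Vault layer: hub holds the business key, satellite the attributes.
    CREATE TABLE hub_customer (
        customer_hk INTEGER PRIMARY KEY,
        customer_bk TEXT NOT NULL,        -- business key
        load_date   TEXT NOT NULL
    );
    CREATE TABLE sat_customer (
        customer_hk INTEGER REFERENCES hub_customer(customer_hk),
        name        TEXT,
        load_date   TEXT NOT NULL
    );
    -- Presentation layer: a dimension exposed as a view for BI tools.
    CREATE VIEW dim_customer AS
        SELECT h.customer_bk AS customer_key, s.name
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_hk = h.customer_hk;
""")
db.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001', '2015-01-01')")
db.execute("INSERT INTO sat_customer VALUES (1, 'Bob', '2015-01-01')")
rows = db.execute("SELECT customer_key, name FROM dim_customer").fetchall()
```

The design point is the one Michael makes: the vault layer absorbs change on the back end, while the BI tools only ever see the star-style presentation layer.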
If you need a mixed model, go for it. But if you don't need a mixed model, don't try to force it. Let me add one other point here, too, Michael. While it is appealing to look at some of these things and say, okay, the vault offers flexibility, so it might be a good idea to do some preliminary work in the vault and then, once you understand more about your business problem, move into one of the more conventional forms, it also requires your team to have multiple sets of expertise, the same way working in multiple forms of big data technologies requires your teams to gain that expertise. So we want to strike a proper balance. Again, Michael and Steven are people who live and breathe this stuff, in case you can't tell. And so they're going to have this type of capability, and it's a really, really fun thing that we do, where people will call us up and say, hey, I'm really trying to figure this thing out. It's a really great intellectual exercise that we go through in both of these areas. So there's very often no one right answer, but you try to get the best answer you can, given the circumstances that you have. At the same time, you want to look at it not just from a technology perspective, but also ask: how can these things help my business, and what types of capabilities do I have under my belt at the moment? And again, as I mentioned before, the NoSQL conference, sorry, the Smart Data Conference coming up in the next couple of weeks is a great place to go and have conversations with people like Steven and Michael, and also with the hundreds of other people who will be out there presenting on this type of topic. So, great question, a good one to start off on. Yeah, thank you guys. I thought that was a good one to start it off with.
Here's a lighthearted one to take us away for a second, and I think this is directed at Michael. So, Spider-Man has a weak relationship with the Fantastic Four, huh? So I'm going to caveat my answer and say I'm more of an X-Men kind of guy. All right. But the model that was shown to you was given by a Marvel representative at a data modeling conference, I think it was actually at EDW. And it was really a subset, more heavily focused on the cinematic universe. So somebody can correct me if I'm wrong, but I don't think I saw a whole lot of crossover between the Fantastic Four and Spider-Man movies. But yeah, there is probably a lot of crossover between all these characters in the comics and things, so I'm not going to spend a whole lot of time on it. I could clearly talk for another 15 minutes, so we'll cut it off there. Okay. Stephen, any insight on that one? I'm not as familiar with that universe as you guys are. And the point here is that somebody looked at this and drew a conclusion from it, right? Is that a reasonable conclusion for them to draw? Right. That's right. I can't speak to that either, to be honest. I've only seen a handful of these movies myself. The thing to take out of the question, to bring it back on topic, is that you can look at your data in two seconds and draw a conclusion, like you were saying, Peter. You can look at this and in two seconds say, don't those two things relate? I thought they related. It's a really good visual way to look at your data. Yeah, looking at this, I certainly agree with that observation, but having not seen it, I can't comment further. So let's get back into something a little more technical. This is a good one. What do the panelists think about including business rules in the database via constraints versus enforcing them through code? So there's two parts to that. One of them is that oftentimes we see business rules embedded in databases through stored procedures.
It's a very efficient way to do it, but it also hides those business rules from visibility, and that can be a big problem. When we're re-engineering databases, one of the things we'll go in and check for is, hey, you guys do this, and they'll say, oh, no, we don't do that, and we'll move it over, and it won't work, and we'll go back and look, and sure enough, we'll find a bunch of stored procedures in there. Now, if you're talking about enforcing it with referential integrity, which is the way, of course, it was designed to be done, that is in fact a much better way of doing it than enforcing it with code. If you enforce it with code, it requires somebody to know the code, to know the code is there, to know how the code functions. Whereas when you move the database, the database is usually moved completely separately from the code. So it's a huge issue in terms of what we're doing, and almost nobody is building these things with the idea that they're going to maintain them for years to come, and having lived through Y2K, we can talk very explicitly about the types of challenges that has created. You guys want to comment on that as well? Yeah, I think when you're talking about where business rules sit, I've seen lots of blogs, and I think people at Data Diversity have written lots of blogs about this topic. I know it's kind of a common answer, but it really depends. There are issues to consider. Security is a big one. It's obviously a lot easier to maintain security on database objects and stored procedures and that kind of stuff than it is to actually get into the code and try and manage specific lines of code and executables. But one of the other things is really, when you're talking about business rules, are you talking about soft rules or hard rules? What I mean by hard rules are rules that have to be there. They're the referential integrity, like Peter was talking about.
Soft rules are things like the format of the data. You can have format masks on the front end that really deal with business logic around data, and that kind of stuff doesn't need to be in the database. But really it's about performance when you're talking about business rules, and also the governance around the business rules. What are they? Why are they there? What happens if they change? So really the documentation around them is key. It's not as critical whether they belong in the code or in the database as it is whether they are documented. Do we know why they are business rules? Why does our business operate like that? And what happens if certain parts of our business change? How much rework is needed, right? If it's existing code and it's duplicated all over the place and it's something that changes regularly with your business, that's a whole lot of work. And that's where we get into the re-engineering and that kind of stuff. So yeah, hopefully that gives some better insight on that topic. Cool. Okay, moving on. This one we may or may not be able to give a strong answer on, just because the nature of our experience does tend to be based on our clients, and since this is really a new technique we're still sort of approaching things academically. But someone asked, can you please recommend a data modeling tool which supports the JSON structure? We know that's a major structure used in document databases, key-value stores, things like that. Michael, can you speak to that? I don't have a lot of experience other than a little bit of playing around with Apache Drill, but I couldn't really give you pros and cons compared to other tools or say why you should use it over something else. It's kind of just been the nature of my own tinkering, my own playing around. So this would almost have to be a lengthy conversation about what they're really trying to do.
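Stepping back to the business-rules question for a moment, the hard-versus-soft distinction drawn above can be sketched in a few lines. This is a hypothetical example (invented tables, SQLite for convenience), not anything from the panelists: a hard rule declared in the database fires no matter which application writes the row, while a soft formatting rule stays in code at the presentation layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    salary  INTEGER CHECK (salary >= 0))""")  # hard rule: no negative salaries

conn.execute("INSERT INTO department VALUES (10, 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 10, 50000)")

# The hard rule (referential integrity) fires regardless of which
# application attempts the write; no one has to "know the code is there".
try:
    conn.execute("INSERT INTO employee VALUES (2, 99, 40000)")  # no dept 99
except sqlite3.IntegrityError as e:
    violation = str(e)

# Soft rule: a display format mask applied in code, never stored.
def format_salary(amount):
    return "${:,.2f}".format(amount)

print(violation)             # FOREIGN KEY constraint failed
print(format_salary(50000))  # $50,000.00
```

The trade-off discussed above shows up directly: the constraint travels with the database when it moves, while the format mask lives and dies with the application code.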
But, you know, getting into JSON structure, really the only experience I've had with JSON is around ETL, either feeding JSON into a system or pulling it out of a system. It really gets into the ETL aspect of things. As far as modeling tools, I think you can represent JSON with anything in a model, unless you're specifically talking about some of the database-first coding and whether that can spit out JSON code. That I'm not as familiar with. If anybody listening has any ideas, they can put them in the Q&A section and we'll certainly repeat them out to everybody else. Absolutely. Great. In fact, Eric Keaney just commented that they use FreeMind for conceptual modeling of MongoDB documents, which relates well to JSON. I am unfamiliar with that tool, but we'll definitely make a note and add that to our massive spreadsheet of technologies here. Okay, so I'm going to jump to one here that says, can you elaborate a little more on third normal form models? Specifically, when are we over-normalizing? Supertypes and subtypes, maybe? Let's just take that first one. Sure. Go ahead. I was just going to say, as far as when we are over-normalizing, I think the question is: is your normalization practice becoming an academic project? Are you normalizing simply for the sake of normalizing? That's really when you're getting into over-normalization. As far as, maybe you already have a structure and you're trying to tell whether that structure is over-normalized, that's when you have to get into performance issues, because the whole issue is people accessing the data, people viewing reports, that kind of stuff. So if it's too normalized, those reports aren't going to have good response time, and that's when you need to look at denormalizing and getting it into a structure that's going to be good for reporting purposes, if that's an issue. Now, if it's not an issue, then, you know, by all means, go for the normalization.
And I think that's why third normal form, which is kind of in the middle of that list, is utilized heavily: it's just enough to make it make sense, but not so much that it's going to cause optimization issues. Great answer, Michael. Let me elaborate a little bit further, too, because it's very pertinent to what we're talking about. Again, as I mentioned before, I'm down here at an academic research conference. These are the information systems professors of the world, gathering here in Puerto Rico to present their research. And a good friend, Stephen Miller from IBM, is here actually trying to convince this group of really wonderful academics that they need to learn a little bit more about what's going on out there in the real world. And a good friend of mine is coming down who was one of the first chief data officers in the entire world. There's a great story involved, where one of the offspring of this individual was at a college and had a homework problem, and called the parent and said, hey, can you help me out with this homework problem? I'm having some trouble with it. The parent looked at it and said, well, I know a little bit about this stuff, I'm after all a chief data officer, and wrestled with it for a bit with the offspring, and neither of them could figure out what was actually going on, and they both were kind enough to call me and say, hey, can you help us out on this as well? And we all got on a little WebEx and went round and round, and we finally figured out that the professor had given them a problem that was completely unsolvable. So we gave the student a couple of questions to go back to the professor with, and the answers that came back demonstrated 100% that the professor had no clue what they were teaching in this place.
So while Michael gave you a really nice articulation of how to get to third normal form here, I will tell you that my measurements show that 80% of the college professors who are trying to teach this stuff don't know what they're doing, and consequently it's no wonder that people are confused about it. And that's really the message we're here to deliver: we've got to put this stuff into the curriculum in a more rigorous format and make sure people understand it. Now let me just add a touch to what Michael said earlier. Third normal form is generally a great stopping place. If you go to the next slide that I had up on this, oops, I'm sorry, the next slide. Hang on, let me just get to this. What happens with normalization is that it makes absolutely good sense to normalize to a certain degree. This is our ability to go from technology-dependent pieces, which we do need to have for performance in particular because these things have to run in the real world, up to an abstract level. And that's what this business between logical and physical is; conceptual is one more level up from there. But when we are re-engineering our data, when we're moving from one data type to another, from one data structure to another, when we're moving between big data technology and, if you will, the traditional data technologies, it is absolutely crucial that you follow this process of going to a logical level, because at the logical level you can get to that normalization process and do as much normalization as appropriate, which is, I think, what Michael was describing. Now, when you talk about how much is appropriate, it turns out there's actually very good guidance. First and second normal form are very easy to understand, and people get them. Third normal form is generally considered to be good enough.
The other levels, fourth, fifth, and sixth normal form, or sometimes business normal form, are tools. For example, fourth normal form is very good for driving out type entities. So if you're at third normal form and you think that a portion of your model might benefit from the introduction of a type entity, then take that portion of the model and make it fourth normal form. All right? If you're looking at fifth normal form, what you're looking for there are subtypes or, in this case, different types of structural entities. So each of these levels has a different use you can make of it. It's not that you should take the entire model and look at it from that perspective; it's that you should look at that portion of the model and see whether what you're doing adds value to the overall process. And as Michael said earlier, if you can keep this as part of your overall process, it will become much more useful in the long run, because once your programmers and your coders understand that these things should be cast as a series of types, for example, or raised to a different level of abstraction, they will see the types of synergies that can occur in these areas that others have not been able to see before. Now, the reason this is really problematic is that what we teach you is to go to third normal form for understanding, the actual logical as-is and physical as-is, but then we denormalize it back for performance. So people say, well, why should I do it in the first place if we're just going to normalize and then denormalize? And the point is: because you understand, and then you understand the trade-offs that need to be made in each case. I'm sorry to get pedantic about that, but that is actually one of my soapbox issues that I get on an awful lot. And it's the reason I'm down here in Puerto Rico today. Awful duty. I'll tell you guys, awful, awful duty. Yeah, we really feel for you, Peter.
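Peter's normalize-for-understanding, denormalize-for-performance point can be illustrated with a toy example. This is invented sample data in plain Python rather than a real modeling tool: the transitive dependency city to region is factored out to reach third normal form, and the tables are then joined back into one wide row per order for reporting.

```python
# Unnormalized: region depends on city, not directly on the key (order_id),
# which is the transitive dependency third normal form removes.
orders = [
    {"order_id": 1, "customer": "Acme", "city": "Richmond", "region": "East"},
    {"order_id": 2, "customer": "Blue", "city": "Richmond", "region": "East"},
]

# 3NF: pull the transitive dependency out into its own relation.
city_region = {row["city"]: row["region"] for row in orders}
orders_3nf = [{k: r[k] for k in ("order_id", "customer", "city")} for r in orders]

# Denormalize back for the reporting view: one wide row per order,
# trading redundancy for query performance.
report = [dict(r, region=city_region[r["city"]]) for r in orders_3nf]

print(orders_3nf[0])      # {'order_id': 1, 'customer': 'Acme', 'city': 'Richmond'}
print(report == orders)   # True: same information, different structure
```

The round trip is the point of the exercise: the 3NF step exposes which facts depend on which keys, so that when you denormalize back you know exactly which redundancy you are accepting and why.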
We would have been there to support you, but I noticed that a plane ticket wasn't booked for us. Okay, this is a quick one. Someone just asked if you could switch to the slide that has the metadata model publications. Gosh, sorry about that. And again, we will be making the entire presentation available, but if you guys wanted to add any thoughts to that, maybe frame up a little context. Michael, how often do you use these concepts when you're out there working with people? I bet you've got a lot of these patterns in your head and you don't actually jump to the books, do you? No, I think the books are a good starting point. What the books really do, and where we've found value in the books that are out there, is when you come into an organization that doesn't have a data model that looks normal, and I kind of say that in jest, but it's really about maybe smaller companies that went into the market a lot of years ago. They just set up a data model, maybe before the modeling standards were even in place, and they ran with it, and they've added on to it and built this big honking model, and then somebody came along afterwards and said, here's a better way to do it. And sometimes that resonates, and people look at it and say, oh, that makes a whole lot more sense than what we've got, right? But it really serves as a good starting point to ask: do you understand what this model is trying to do? If you're not following these best practices, why did you decide to go in the direction you did? And if you want to move more this way, then we can start talking about that 20% that's going to be different from the best practices. And, you know, it's in that 20% that you can really get to know an organization. I mean, it's kind of interesting: in our work, we tend to learn about a business from the data up, which is really cool.
It's kind of, you know, the data tells a story, and eventually, especially once you align it to the business terminology and business rules that the end users of that data are applying, we can quickly become micro-experts in those fields. We already kind of get 80% of the data model, and if we just assimilate that extra 20%, it's usually really interesting for us, because we can quickly learn about that industry or that specific organization. But, you know, to close out the question, it's not a matter of, we're going to show you this book, this is the best practice, let's go implement it if you don't have it, because that's not the right approach. We want to talk to the business. We want to talk to them and see what their model is, what they're trying to do, why they're trying to change it, right? I think the main purpose of the books is: do you understand the concepts that are involved in this line of business? Because what the books do a lot of times is give you specific models for specific industries or lines of business. So it's really more of a starting point, not a finishing point, and I think that's the key point I would want to make. Well said. Well said. Cool. Okay. This next question I'm going to ask not because I know the answer, but specifically because I don't, and I think it's an interesting question. Someone asked: Data Diversity has been promoting data resource design instead of data model design. Thoughts? I have not heard the term data resource design before now. I'm taking a guess; it sounds like maybe an additional level of abstraction. I just did a quick Google of it, too, because again the term was new to me, and I'm finding there's a book called Data Resource Design by Michael Brackett, and I've read a couple of snippets from that.
Peter and Michael, is that something you guys are maybe familiar with, or is this a new term for you as well? So I know Michael Brackett very well. He was president of DAMA for a number of years, and what he's really talking about is the same kind of meta-analysis here. When you look at data as a resource, he's asking: are there structures that can in fact be helpful to organizations? And really, I think one of his major points is that the architectural differences that organizations adopt and implement are some of the things that can give them a strategic competitive advantage in the long run. Again, getting these things designed correctly can lead to the implementation of better, more advantageous business practices. So yeah, Michael has probably got twice as many books out in the field as I've got. You're right, his book Data Resource Design is one of the things that covers this, and I think he is going to be in one of the upcoming Data Diversity webinars, so maybe he can expand further on that. Yeah, I'll have to check that out, because it sounds really interesting, but it is a term I am unfamiliar with. Okay, we'll go ahead and move along here. Michael, I guess this one's really for you. What specific modeling methods or tools do you suggest for NoSQL, graph databases, and/or Hadoop? Sure. So, modeling methods: going back to the slide we were just looking at, you had a bunch of books that really talk about some of the current modeling best practices. For NoSQL, because it is new technology, and I say new, it's been around for probably five to ten years, there's not a whole lot of those best-practice models. Unfortunately, and it is unfortunate in my mind, NoSQL modeling does tend to be a bit of a documentation of the process right now.
I'd love to see it mature to the point where some of the more structured modeling methodologies are, where it can become part of the process rather than a documentation of the process. I think the modeling methods around NoSQL are really just: make sure you model it. I mean, there's not so much an Inmon or a Kimball or those kinds of ideas out there as far as methods go. As far as tools are concerned, I think Data Diversity had a webinar last month, I think from Karen Lopez, that talked about some of the tools that are out there and their abilities to handle some of the NoSQL that's out there. I think they talked about, well, it's blanking on me right now, but erwin and how it interfaces with some NoSQL environments and that kind of stuff. So there are lots of tools out there. I'd highly recommend doing some research on your own, but that webinar was a great resource if you're looking for very detailed specifics on what tools can and can't do in relation to NoSQL. That's a good one. We'll add our recommendation: definitely check out Karen's stuff. She is one of the better experts in this field, just a superb resource. Cool. And we were asked again, can we get a copy of the slides? Yes, these will be made available within two business days. That's what we strive to do, so we'll get those out to you. Let's see. I think I have one more question, and this is something else I'm unfamiliar with, but Peter, maybe you can jump in here. Someone said, I didn't see object role modeling, and I think I'm a little familiar with that concept, mentioned in this discussion. What are your thoughts and observations on the use of this technique? So, a good question. Object role modeling is one that has more of a UML-oriented flavor, and I think Michael Blaha is one of the main proponents. I think he's got a couple of webinars coming up as well. Again, it's a variant. Okay.
What we've been talking about here are really business concepts, how these relate to the strategy of the business. Object role modeling is incredibly useful when you are developing within a software context, because what you're doing there is creating a much closer link between what's happening with the software and how the software needs to relate to what's going on in the data-handling pieces. And by doing object role modeling, you can end up moving toward code reuse in a fashion that's very practical, very useful within that context. So it goes back to really where we started this thing, which is: what is the business problem you're trying to solve, and how can these models help to start to address some of those problems? Because if you pick up the wrong tool and try to do something with it, I don't know about you all, but if you try to tighten a nut that needs a wrench and you're using a screwdriver, it just doesn't work as well. That's really where we want you guys to be, understanding that there are a variety of tools out there that you can use. Hopefully in this webinar we've given you some ideas on how to go and find some other things. We've also given you some resources in here as well. We try to always document these things so that you can come back to them. And as always, we welcome additional questions and answers. We'd love to talk to you one-on-one at the conferences, or give us a call. We don't bite, and we'd love to get into these things, because as you can tell, we learn a bunch and we can geek out accordingly on all of this. If only we had more time in the world to do even more geeking out. We can always go back and talk about Marvel's graph database if you want. Yeah, that's right. Guys, I think that's the last of the questions I have. I don't think I'm missing any. Let me scroll down here and see if one came through.
Someone recommended NORMA, available through SourceForge, which relates to ORM modeling. So that's pretty cool; check that out. Thank you for that. Cool. Okay, so I guess we're going to go ahead and close this out as we hit the 30-minute mark. So thank you, everyone, for participating in today's event. We hope you enjoyed it. We certainly did. Thank you again to Data Diversity and Shannon for hosting us. I'm not sure why she continues to put up with us, frankly. Once again, you'll receive today's materials within the next two business days. The webinar next month will be Data Quality Success Stories. We're looking forward to that one. That's on September 8th, so make sure you join us. As always, it'll be free of charge, and it'll be a fantastic one, I believe. So hopefully you'll be able to join us there. Feel free to contact us if you have any further questions. We love to geek out. You know, you can talk to us about Marvel or data or whatever; we're happy to do that. You can find us on Twitter at Data Blueprint. You can find Peter at P.A.K., and Data Diversity is at Data Diversity. I am at S.J. McLaughlin. I wouldn't dare try to make you spell my name, so good luck. Mike, do you have a Twitter? I don't. I use the Data Blueprint Twitter. All right, all right, all right. It might be too geeky, evidently. All right, Peter, Michael, thank you guys so much. Sure. Shannon, back over to you. Perfect. And, you know, if you guys keep talking about Marvel and Data Diversity so much within a webinar, you can come back every day. Awesome. Although if you ask me about Stan Lee versus Frank Miller, I'm going to tell you Frank Miller. I'm with you, 100%. And thank you guys so much for this great presentation and discussion. It really was fantastic. And thanks to our attendees, as always, for being so engaged in everything that we do and asking so many great questions. Part of the learning experience is just getting involved in the discussion.
And we really appreciate it. And as Stephen mentioned, we will have the slides out within two business days, so by the end of the day Thursday: links to the slides, links to the recording, and the additional information requested throughout the webinar today, including the additional resources. We'll get that all out to you. I hope everyone has a great day. Peter, enjoy San Juan. Yeah. And Michael and Stephen, I'll see you in San Jose next week. We'll see you next week. Sounds good. Take care, everyone. Thanks. Bye-bye.