Hello and welcome. My name is Shannon Kemp, and I am the executive editor for DataVersity. We would like to thank you for joining this month's installment of the DataVersity webinar series, The Heart of Data Modeling, moderated by Karen Lopez. Today Karen will be discussing "You Just Inherited a Data Model. Now What?" A couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #heartdata. As always, we will send a follow-up email within two business days containing links to the recording of this session and additional information requested throughout the webinar. Now let me introduce our speaker for the webinar series, Karen Lopez. Karen is a senior project manager and architect at InfoAdvisors. She has 20-plus years of experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger, and panelist. Karen is known for her fun and sometimes snarky observations on data and data management. Mostly she just wants everyone to love their data. Follow her at @Datachick on Twitter. And with that, I will turn it over to Karen to get us started. Hello and welcome. Thanks, Shannon. Thanks so much for that introduction, even though I wrote it myself. And I want to thank everyone for attending today. As Shannon said, you can chat with each other and with me in the webinar features, as well as ask more formal questions or make comments in the Q&A. Shannon will take care of the logistics ones: Is there a recording? Yes. Will the slides be made available? Yes. Is there going to be a follow-up blog post? Yes.
So she'll take care of all those, and I'll take care of the content ones. So we're going to talk about inheriting a data model. This is a real-world situation that rarely gets covered in formal data modeling classes. I've not seen it covered, and maybe that's a need for a new type of class: we rarely do fully blank-page data modeling, and we've talked about that in some other webinars on industry-standard data models and data model patterns. But in today's webinar, we're going to talk about what happens when you have to work with someone else's data model. Someone else's data model might be from a former modeler in your company, a model that's been acquired, a model shared by a sister company or a company that's been acquired, a model that comes from a vendor package, a model built from patterns, or maybe one of those industry-standard data models. What are you supposed to do with it? That's what we're going to focus on today. As a reminder, I'd love to see tweeting, and if you tweet with hashtag #heartdata, I'll keep an eye on that, plus the discussion can continue after the webinar. My goals for us today are that after this presentation, you'll know what things you need to do, what you need to plan for, what you need to ask your project manager for resources to do, and what you'll need tools for. You'll know how to do it, and you'll use the five steps I've broken this into to get ready. I'm also going to leave you with some tips on things to remember to do. So, Shannon, how do I get to the poll? We have a poll. Oh, no, we don't have a poll about who you are, because we've done that one a lot. So I'm going to assume we're mostly data architects or designers or other types of modelers here. And if you're not, it's because you just love learning about modeling. Let's go to one of the polls.
What I want to know is how you model, and we're going to start with how many data models are in your organization. You should see the poll now, and people are answering away. I'm going to give you about another 20 seconds to answer, so start making a good guess. Get your votes in. Five more seconds. What I'm seeing is that about 20 of you have fewer than 25 data models. About 16 of you have somewhere between 26 and 100. Another 22 of you have 100 to 500. About 10 of you have 500 to 1,000. And 15 of you have no idea, and 27 of you couldn't put a number on it. I think those of you who think you have fewer than 25 data models are either thinking only of the models you manage, or your organization is brand new to modeling. But I'm going to say the same thing John Zachman would say: there are actually quite a few data models in your organization. They might not be entered into a data modeling tool. They might exist in a spreadsheet. They might exist in lots of people's heads. They might exist in PowerPoint or Visio. And there might even be yet another modeling tool in your organization full of models. My guess is most of the models, especially the physical models, are documented in scripts or in version control systems because they're part of Visual Studio or part of some development environment. So I think there are usually a lot more models than that in an organization. The next thing I'd like you to answer is: have you ever used somebody else's data model as part of your modeling exercise, and how did that go? We're voting right along. We've got about 30 more seconds. No one else is going to know how you voted. 15 more seconds. What I'm seeing is that, wow, 70 of you: yes, you've used someone else's model and it was painful. 15 of you: yes, and it was delightful. 11 of you: no. And 21 of you aren't sure if you've used someone else's model or not. That's all pretty interesting.
I think I'm back to my regular slides. So that was interesting. What we found is that some of you have worked with other people's models, and the majority of you felt it was very painful. I've dealt with that before too. Some of the reasons it's painful: there's not enough background material for you to understand the model, or you felt it was all done wrong and should have been done your way, which is another session I do, about contentious issues in data modeling and database design. That all feeds into this. But for today's presentation, I want to focus on the ways you need to work with an inherited data model. I mentioned some of the ways you get these models. Sometimes they're just provided to you, like I said, by a vendor package. Sometimes you're mandated to use them, or you're new to a company, so you pretty much have to use them. The five steps I want you to think about: we're going to assess the model, we're going to inspect it, we're going to do some comparisons, we're going to report on it, and then we're going to get started modeling and measuring. So the first part, assessment. As a consultant, I come into organizations and am asked to review models, work with models, enhance them, fix them, create new ones. And what I find in a lot of organizations is that there's not a clear picture of where the model stands. Typically, the model was done, let's say, a couple of years ago, no one's touched it, and they want to use it, or someone's provided you with a model or a vendor model, and even finding the right version of it is a challenge.
The typical story I get is: yes, we have the model, it's out on the shared drive. So I open up the shared drive and start looking for files with the modeling tool's extension, in this case .dm1, and I see something that looks like this, and my eyes go blurry. I find zillions of modeling files with a variety of names. Sometimes they have the word test, sometimes they have the word temp, and other times they have the word model; my favorite data model file name is "data model," which doesn't help me at all. Management and project management and program management assume that someone was taking great care of these models. Maybe the team wasn't using their repository or ModelMart, and all anyone knows is that it's out on the project drive or out on SharePoint. Now I have to try to find which model I'm looking for, and they'll say, oh, just use the date and time. Well, the problem with that is that if you open up a model with auto-save on, it updates the date and time; maybe someone else was just looking at the model. I can't use that. In a lot of very dysfunctional organizations, just tracking down the official copies of the models, our starting point, can take a significant amount of time. The other things I want to look for: where are the naming standards, both for this model and for this organization? Are there modeling standards? Were there requirements documents that these things were based on? Are there process models? Is there any documentation that was generated from the models? And what are the issue and defect lists that apply to this model? Now, one of the things I typically find, again, in an organization that doesn't have mature data governance or data modeling governance, is that someone will say, yes, we have naming standards. I'll say, great, send them to me. They're out on SharePoint. Point me to them.
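The hunt for the "real" model file can be partially scripted. Here's a minimal sketch in Python, a hypothetical helper of my own devising rather than any tool's feature: it walks a share looking for model-file extensions and hashes file contents, since (as noted above) auto-save makes modification times unreliable, while byte-identical copies really are duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def inventory_model_files(root, extensions=(".dm1", ".erwin")):
    """Collect every candidate model file under `root`, grouped by a
    SHA-256 hash of its contents. Identical hashes mean identical files,
    no matter what they were renamed to ("test", "temp", "data model")."""
    groups = defaultdict(list)  # content hash -> list of file paths
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(str(path))
    return groups

def report_duplicates(groups):
    """Return the groups of paths whose contents are byte-for-byte identical."""
    return [paths for paths in groups.values() if len(paths) > 1]
```

This only narrows the pile down; deciding which surviving copy is the official one still takes the human investigation described above.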
We can't find them. Same thing for the modeling standards. So here's one of the tips I have, since I usually get this sort of runaround: when I write up my findings, which we're going to talk about in a minute, I state that this project or this organization has no naming standards and no modeling standards, none. Now, they've told me there are some, but we can't find them. The point of writing that in your findings is that if the standards actually do exist, someone will read the findings and go find them. If they don't actually exist, then no one will find them, and the findings stand on their own. The requirements documents, which may or may not exist and may or may not be models: the reason you want to find those is traceability, the reason why something was done in this data model. You can point it back to the requirements. You might also find other pieces of documentation in the same source control library, on the same file share, or in the same paper files, and the same reasoning applies to all of it. Having said that, when I go to find all these, it's very common for me to find none of them, or to find only the naming standards. And the naming standards turn out to be a generic naming standards document that someone downloaded, usually from a government site, since those are usually shared in the public domain, but they have not been followed. The last one is a contentious one: I rarely find that data models participate in a project's issue and defect list, or bug list, whatever you want to call it. And I'm a big proponent of that being an important part of modeling, so that people have visibility into the issues of the model and what's going on. If the data modeling part of the project has not been participating in these processes, this is the time to start. So the next part is inspection.
One of the things about inspection that I think is really important is that we use the tools we have, and that we also identify gaps in our tooling where we would need to acquire something. The first part of inspection is opening up those modeling files. I tend to take a high-level survey of all the features that have been used in the modeling tool. Just like with an office productivity suite, I find that most people use 10% or less of the features in the modeling tool to produce models. And of course that's going to vary based on the number of resources, the number of staff members on the data modeling team, their experience and maturity level, their familiarity with the tool, and whether other tools are available to document some of the same things. So if I opened up ER/Studio, I'd look through to see what submodels are in there and whether the logical and physical submodels are roughly matched up. I might look at which separate features are used. If you look at the left-hand part of this window, I can see that, yes, I have tables and columns and indexes. I have some views, and not all models contain views, but I appear to have no functions or procedures or triggers or schemas. There are users and roles. This is just a high-level inspection. The same thing can be done with any other modeling tool, such as erwin or PowerDesigner. You're really going to look at the components that are available to you and what features have actually been used in the model. That includes flipping through all the tabs and options down there. I'd also want to see: is there a data dictionary, are there macros, are there links to the repository, and so on.
This gives me a feel for how many objects there are in the model, how complex the model is, and whether it's just a reverse engineering of a database versus a full-blown logical-to-physical model with definitions and all of those things. Here we can see, zooming in, that in the AdventureWorks data model we have these submodels: a logical, a physical, and a physical for the data warehouse. In eMovies, on the erwin side over there, we have subject areas, and these are the subject areas in there. I also mentioned looking at the data dictionary. I can see from this AdventureWorks model on the left that some data security objects have been created and populated; that there aren't any reusable defaults or rules or reference values; that there are user-defined data types, but just a few of them; and that there aren't any domains. And I can see much the same thing over in the erwin model: I have domains, and I have some default values. The reason I want to know about those, and the reference values in the upper right, is that a model prepared using domains or reusable defaults or reference values shows a different type of modeling style than one that doesn't, and it will also help me estimate the amount of work I need to do to adopt it. Now, the other thing I want to do with my modeling tools: all the modeling tools have some sort of built-in validation or error checking. Once I've done my assessment, I think it's a great idea to use this error checking, and they're not all errors, just to point out some things. There might be issues in the model where we have indexes with no members. These checks are usually highly configurable: show me all the objects that don't have definitions, show me objects with overlapping indexes, show me which foreign keys don't match.
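Checks like the ones those validators run can be sketched in a few lines of code. The simplified model structure below is my own invention, not any tool's actual format; it just illustrates the kinds of rules (missing definitions, empty indexes, duplicated index column lists) a built-in validator applies:

```python
def lint_model(model):
    """Run lightweight validation checks over a simplified model structure.

    `model` maps table name -> {"definition": str or None,
                                "columns": {column name: data type},
                                "indexes": {index name: [column names]}}.
    Returns a list of (severity, message) findings. Real tools also flag
    prefix overlaps between indexes, mismatched foreign keys, and more.
    """
    findings = []
    for table, spec in model.items():
        if not spec.get("definition"):
            findings.append(("warning", f"{table}: no definition"))
        seen = {}  # column tuple -> first index name that used it
        for ix, cols in spec.get("indexes", {}).items():
            if not cols:
                findings.append(("error", f"{table}.{ix}: index has no members"))
                continue
            key = tuple(cols)
            if key in seen:
                findings.append(("warning", f"{table}.{ix}: duplicates {seen[key]}"))
            else:
                seen[key] = ix
    return findings
```

The payoff is the same one described above: the machine surfaces in seconds what a by-hand review would take days to find, or miss entirely.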
We can see all kinds of issues identified by the tool that would take a really long time to investigate by hand, and we'd probably miss them. The reason this all becomes so important is what real-world data modeling looks like. People get taught these things in classes and tutorials and workshops: you prepare a logical model, then you prepare a physical model, then you generate the database, and off you go. But in the real world, models get reused. They might have thousands or tens of thousands or hundreds of thousands of properties and objects in them. We have a modeling team that's supporting many models, and the models are supported by many team members. These things are all in a repository. As soon as you create a physical model from a logical model, you've potentially doubled the number of modeling objects, and once you have multiple database environments for that model, dev and QA and test and pre-production and production, you've got even more copies and versions of those models. The math on this is just huge. We can't adopt a model and say we totally understand it, because the model we inherited might have been used several times over several versions; it's not just one model file, and it's not deployed in just one place. It might be deployed in dozens of places and environments. There's all kinds of work that needs to be done to understand the nature of the model. I'm going to take a break and look at a comment here: having used other models has been both painful and not. That's probably the big "it depends" thing. I've definitely used other people's models and pattern models.
It's been work to try to understand them, to get my head around them, to read all of it, to figure out where it's going to fit. And then there are some where there's very little definition, no consistency, and no naming standards. I think that's what makes it most painful, and this is the lesson learned that I try to tell management and my project teams: the reason I'm writing down definitions, the reason I want all of our data-related constraints and everything in the model, is that it makes it much easier to share those things and pass them along to other people. And it leads to better database designs, I believe, because we don't have hundreds of moving parts trying to make a database; we have them in one central location, and we're using that for model-driven development. So the next step is the compare. It's been interesting talking about comparisons in data modeling tools over the years. It's been common for people to be frustrated either with the comparison facility itself, or because they try to compare every property of every object in every submodel. You could stand there and look at those screens, and there could be tens of thousands of differences, especially if you've turned on all the comparison properties. Then you're comparing where the entity box sits on the diagram; you're comparing excruciating levels of detail that may or may not be important at this phase of your assessment. So when I use these compare features, I don't turn everything on. I start with the key parts. If I'm looking at a physical model: the tables, the columns and the data types, maybe the nullability, maybe the constraints. And what am I comparing it to? I'm comparing two different versions of the files I've found. I'm comparing the model to whatever it's supposed to be a model of.
If someone tells me it's a physical model of what's currently in production, I'm going to compare it to the database that's in production and see just how aligned they are. I quite often find that they aren't aligned at all, because we haven't used model-driven development. Management is under the assumption that the model has been maintained, but because changes in production weren't made through the model, it has slowly become more and more out of sync, to the point where it's probably going to take days or weeks to get them back into sync. Especially if we find out that things have been done in production that are no longer compliant with the business requirements: either they become the new business requirements, or we refactor them or try to fix them. On comparison, when I give this talk in person, a lot of people say, oh, it takes too long to do the comparisons, I just can't do that. Well, part of that is understanding the nuances of the comparisons in your tools. One of the tips I recommend is to build a reference data model. By reference data model, I mean one where you create a bunch of entities called entity A, entity B, entity C, and tables table A, table B, table C, with a primary key attribute with identity. You're creating what is basically a data model that looks like the metamodel, and then you make some changes and see how those changes show up in your comparisons. So you're not comparing address to address or customer to customer; you're comparing entity A to entity A in two different models, and you understand the nature of your change. That removes the business abstraction from the model so you can learn the comparison features of your tool more easily. The other way to make this better is, wherever you find issues with comparison, to report your pain points back to the vendors so that they understand.
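The "compare only the key parts" idea can itself be sketched as code. Here the two schemas are plain dictionaries mapping table name to column name to data type, which is my own simplification; real tools compare far richer metadata, but the principle of restricting the property set so the diff stays readable is the same:

```python
def diff_schemas(model, database):
    """Compare two schemas on key properties only: tables, columns, types.

    Each argument maps table name -> {column name: data type}. Cosmetic
    properties (diagram position, colors) are deliberately excluded so
    the report doesn't drown in tens of thousands of differences.
    """
    diffs = []
    for table in sorted(set(model) | set(database)):
        if table not in database:
            diffs.append(f"table {table}: in model only")
        elif table not in model:
            diffs.append(f"table {table}: in database only")
        else:
            m_cols, d_cols = model[table], database[table]
            for col in sorted(set(m_cols) | set(d_cols)):
                if col not in d_cols:
                    diffs.append(f"{table}.{col}: in model only")
                elif col not in m_cols:
                    diffs.append(f"{table}.{col}: in database only")
                elif m_cols[col] != d_cols[col]:
                    diffs.append(f"{table}.{col}: {m_cols[col]} vs {d_cols[col]}")
    return diffs
```

An empty result means the model and production agree on the properties you chose to check; a long result is exactly the days-or-weeks-of-sync conversation described above.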
One of the things we suffer from in the data modeling world is that it's rare for the people who build data modeling tools to actually do data modeling for a living. They don't get to dogfood, to use their own tools in a production environment. They rarely have to open up a 3,000-table data model. They rarely have to compare a data model that requires reverse engineering an entire SQL Server or DB2 instance before you can drill down into the model you want. One of the questions: what would be the approach for a new enterprise data modeling initiative in an organization which does not have much in-house development, but rather buys off-the-shelf products and tries to integrate them internally wherever possible? Oh my gosh, that's a whole other webinar. But the basic answer, and I'm biased, is that I'd create a data model that's a common language for how the enterprise sees its data, and then I'd map all of those off-the-shelf products back to that data model so I can see the gaps and overlaps. If possible, can we have some discussion of data lineage and movement? This is an important part of data warehousing. It is. I'll try to save some time at the end to get to that. I've never had to inherit a lineage data model, but that's an interesting question. We don't have a poll for how you compare, but if you want to say in the chat what your experiences have been using the compare features in your products, or whether you've used third-party tools that compare, say, DDL to a database or DDL to DDL, I'd be interested in talking about that later. So, the next thing, where we are now: we've assessed what we have. We think we have the right versions, the right editions, and the right dated copies of our data models, and we've gone through and assessed them all. The next step is to document just what you found.
The best way to document what's in a model is to generate reports, and that's exactly what I'd like you to do, whether it's an HTML report or a Microsoft Word report or both. Go generate the full reports of what you found. That could also mean publishing them to a portal or some shareable place, as long as it's clearly conveyed that this is the model as you found it or inherited it. The reason you want that is that you're going to want to talk to people about these models, and we'll get to some of the other reasons in a minute. You want to talk to people about these models, and you want them to be able to refer to them. As we know, not everyone in your company has access to the data modeling tool, and even those who do don't necessarily have the skill sets to work with it; these reports let you collaborate with other people. You want to publish this model and make sure everyone understands it's the as-inherited version, kind of a snapshot. You want to check it into your repository or ModelMart, again with the note that this is how you found it. That becomes your point of reference for everything you do later. So reporting on data models isn't just for the official ones; it's for anything that's now under your set of responsibilities. So, what we've done so far: we've assessed, we've inspected, we've done some compares, some validation, all these great things. You know the differences now, and you have some idea of where the gaps and overlaps are for your models. So now it's time to start modeling and measuring with this new model. I think it's really important to use issue lists or bug tracking, whatever your project teams call it, for things that you find in the model.
Let's say your shop standard is that all of your character data types should be nvarchar, because you need to support extended character sets in your data. And you find that this model doesn't use nvarchar, or uses it inconsistently, or only uses it in one case when it should be used in another. Instead of just putting that on your to-do list and changing it, it should be documented in an issue list. An issue doesn't mean something is wrong; it's something that needs to be discussed and may require a fix. Then you can start prioritizing those issues, and people will have greater visibility into how much work this is. The reason this is so important is that with a lot of inherited models, especially ones that came from a third party or a vendor, the expectation is that you now have only a tiny amount of work to do, because all the modeling has been done. Yet you and I both know that it's a model. It may not be a good fit. It may or may not be done, or well documented, or even usable, and it might even cause more pain on your project. We just don't know until you start writing these things down. Then also start listing outright defects. By defects, I mean the type of gap or overlap that is a high risk if adopted on a project. One example: I worked on a project once where, in the inherited data model, the relationship between a contract and a contractor was always one-to-one. But in our real business, we signed contracts that had multiple parties to them, not just us and one other contractor. That got logged as an issue, but it was never really treated like a business defect. It wasn't just an issue; it was actually something that would stop us from going into production.
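The nvarchar example can be turned into a scripted check that produces issue-list entries rather than silent fixes. The model shape and issue fields below are hypothetical, my own sketch of the idea; the point is that each deviation becomes a discussable, prioritizable record instead of a quiet edit:

```python
def check_unicode_standard(model):
    """Log an issue for every varchar column, assuming a shop standard of
    nvarchar everywhere for extended character set support.

    `model` maps table name -> {column name: data type}. Each finding is
    recorded as an open issue to be discussed and prioritized, not a fix
    applied automatically.
    """
    issues = []
    for table, columns in model.items():
        for col, dtype in columns.items():
            base = dtype.split("(")[0].strip().lower()
            if base == "varchar":  # nvarchar columns pass untouched
                issues.append({
                    "object": f"{table}.{col}",
                    "found": dtype,
                    "expected": dtype.replace("varchar", "nvarchar", 1),
                    "status": "open",
                })
    return issues
```

Feeding the output into whatever issue tracker the project already uses gives everyone the visibility into the workload that the talk argues for.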
What happened is that the implementation contractor didn't understand that this was a business defect, went ahead and implemented according to that pattern model, and it wasn't discovered until just before we went live that we wouldn't be able to use any of the contracting features in the application. That was a severe defect in the project. So I try to separate them: they might all still go into the issue system, but I have a special term for the types of problems that should not make it into production, and I call those defects. The other thing is to adjust the submodels or subject areas to what's going to work for you. I work with a common retail data model that has hundreds of subject areas or submodels in it, hundreds on the logical side and hundreds on the physical side. That's a very difficult model to work with. It's a fairly large model, with close to a thousand entities and tables, but with hundreds of submodels, some submodels differ by only two or three entities. It means it takes a long time to load the model, a long time to publish it, and a lot of time laying out all those diagrams. When I go to implement that model, one of the things I do is make sure the submodels are set up in a way that actually helps the model and helps the team. Some people use submodels or subject areas as an architectural feature; they say all the entities in this subject area go into this one spot in the database, maybe a filegroup or something. I'm not a big fan of that, but I see why people do it. I want submodels to be there because they help me break up the work in a logical fashion. I don't want either no submodels or too many submodels. So set that up right. Layouts. We all wish that automatic layouts worked so well that we rarely had to adjust anything. I know that's hardly ever true.
I'm never given enough time, nor would I want to sit with hundreds of submodels and lay them all out again. But I might take 10 or 15 of the key submodels or subject areas and get them laid out really well: the ones I'm going to tackle first, or the ones with the highest priority or the highest number of issues and defects, so that I can start working with them. I might apply a naming standards utility, though applying one to an existing data model is something you can really only afford to do if you don't have a physical database that uses a different naming standard. But I might apply a naming standards utility to yet another physical model, so that I can map the physical model that came with the data model to one named the way we'd like to see it. The other thing: if there isn't a data dictionary, and by data dictionary I mean not just one that's published, not just a glossary, but the components that are going to help me model better, like reusable domains and reusable defaults, I want to start applying those modeling tool features that help me be a more consistent, better, and faster modeler. So I start working those things into the models. Also in the chat, which we'll get to in a little bit: what are some of the common issues you find when you've inherited someone else's data model? I don't need examples from the data model itself. What I want to know is what things cause you pain when you're working with a data model that you didn't develop on your own. So let me take this question before we go through the tips. "All the topics covered so far are around keeping an accurate, up-to-date master model that reflects production at a given time. We are currently in the process of implementing a process around this. Do you have tips or a starting point for us?" So I think what you're asking is that you don't have a master model.
What you have is a bunch of production databases, I'm guessing, and you're trying to derive from them some data models that can be considered a master model. That's also a very long discussion, but my tip is this: start modeling master data first. I'm not using "master data" formally, as in a master data management program, but the truly enterprise, shareable data: the things that are typically managed in a master data program, so customer and product and location and facilities and accounts. Modeling those, or at least having a common definition of them, will allow you to map your production systems to them so you can identify your gaps and overlaps. And I don't want to make it sound like every model you inherit is some type of master model. Most of the models I'm asked to inherit and work with are project-level models. Say a project team had some developers, and they brought in an outside consultant who did models that helped them design a database. Now the consultant is gone, the model is in use and is expected to be adopted by the data management team or group, and we're trying to fit it into our modeling scheme. That's the real-life way I see this happening. In those cases there's often very little documentation, because no one held the contractor to our standards. The model might be in a different format. It almost always uses a different naming standard and a different style: all the pain that data management people go through. There are master models and enterprise data models that I'm trying to measure things against, but for the most part these end up being very specific, very physical models that I work with. I love this comment: "Our developers created the model, so they used ORM conventions for naming." Yes, I have seen that. So let's go through some of these tips.
The biggest mistake I've seen is someone comes in and says, okay, you have to take over this data model, here it is attached to this email, and a modeler goes off, finds all these problems, and starts modeling the heck out of it, only to find out that the project manager who sent her the model just pulled it out of an email that he or she got 18 months ago and the model has continued to evolve. Maybe it's already checked into the repository. The biggest mistake I see in inheriting a data model is not doing that forensic-level investigation to confirm and reconfirm that you have the right model, that it's the right version of it and the right edition. By version and edition I'm using two terms we see in technical tools all the time. By version I just mean out of change control: the most recent one would probably be the version you want, but even the most recent one might involve a dev version, a QA version, a pre-production version, a production version. As for editions, lots of people take a model, cut out all the parts they don't need, and don't ever rename it. It's now called the CRM model, but it only has 20 of the original 120 entities in it. That's what I mean by edition. And status. By status I mean: was this just a play whiteboard model that someone came up with when they were thinking about customers? Or is it the official, approved-by-management, approved-by-your-partner-consultants, agreed-to, this-is-what-we're-going-forward-with model? All of these factors matter. Your data model is really just data, it's metadata, but we treat models like crap, and therefore we have lousy data model governance all the time. And I think it's important that you confirm and reconfirm that what's been provided to you is the right model. I told you I've worked with industry standard models.
I've seen people email out those industry standard models to prospective users, only to look at the file that's been attached and find it's three years old. It's not even the most recent one. So find the actual model. Don't just trust a data model that's been emailed around, and confirm and reconfirm that you have all the parts, not just the diagrams, not just the .dm1 or .erwin files. The reason is that you want all that supporting information to help you work with it. And I gave you that trick for confirming that you have it all: if you don't have something, you put in writing in your findings that you don't have it, and people will not like to see that in writing, so they'll provide it to you. That modeling file is only a small part. Find all those parts. Then you're going to take copies, back them up, and publish them just as they are, so that you can refer back to the model as you received it. And that includes the models, all the reports, the comparison reports, and your findings and observations. Your findings and observations should be done in the least blaming way. They should be written like an academic report that says: our shop standard says, and I'll pick on one of my favorites, entity names shall be all uppercase and singular; this model uses mixed case and sometimes plural, sometimes singular names. Now that specific thing I mentioned isn't so important, but it uncovers the fact that this model does not comply with our standards. We may or may not be able to change that. Usually if it's in the logical model we can. If it's in the physical model we may or may not be able to; we likely can't rename things if something's gone into production. But you write up these findings and observations, as well as a report with maybe just a short list of the gaps and overlaps, and also the things that you're going to create in your issue list.
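Findings like the naming one above can be generated mechanically before a human ever reviews the model. The sketch below is a minimal, hypothetical checker for the "all uppercase and singular" shop standard; the function name and the crude plural heuristic are my own illustrative assumptions, not any tool's real API.

```python
def check_entity_names(entity_names):
    """Flag entity names that break a hypothetical shop standard:
    all uppercase and singular (singular is approximated very crudely
    here as 'does not end in S'; a real check would use a word list)."""
    findings = []
    for name in entity_names:
        if name != name.upper():
            findings.append(f"{name}: not all uppercase")
        if name.upper().endswith("S") and not name.upper().endswith("SS"):
            findings.append(f"{name}: possibly plural")
    return findings

# Two non-compliant names and one compliant one.
for finding in check_entity_names(["Customers", "ORDER", "AddressLine"]):
    print(finding)
```

Writing the output this way keeps the findings report in the neutral, non-blaming tone Karen recommends: each line states the rule violation, not who caused it.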
And you start publishing those issues, and you're also going to make sure that you call out business defects in the model. You'll want to specifically do a comparison report against the physical instance of the thing it supposedly models. So if it's a physical model of your CRM database, wherever that comparison shows the model is out of sync with the database, that's an important finding. Because usually it's not up to you; you don't have the authority to decide which side is going to get fixed, but you're going to get it documented. You're going to want to start using version control on these new models, and if you're not using it on your regular models, you need to be doing that right away. We use repository, Model Mart, versioning, because repositories can do versioning and show differences between the actual objects inside the model. Our teams tend to use other version control tools, which we can't rely on for data models because they only work at the file level. But you want to start putting copies of models in there as a reference, not as the gold copies. So start using it there. And then you're going to start monitoring for data models outside your normal accepted processes. What I mean is, not only do I inherit models that are given to me, I also go look for models that show up in other places: on team drives, in the repository, on SharePoint or a portal or another shared resource. You know, you're working away one day and all of a sudden someone mentions in a status report that their Erwin model can be found over here, or that the data models are on the shared drive. If you're part of the data management group, that should start you asking questions.
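The model-versus-database comparison described above is normally done with the modeling tool's own compare feature, but the core of it is just a set difference over tables and columns. This is a minimal sketch under that assumption; in practice you'd load the two mappings from the modeling tool's export and the database's system catalog, and the function name is illustrative.

```python
def compare_model_to_db(model_columns, db_columns):
    """Report where a physical model and the live database disagree.
    Both arguments map table name -> set of column names."""
    report = []
    for table in sorted(set(model_columns) | set(db_columns)):
        in_model = model_columns.get(table, set())
        in_db = db_columns.get(table, set())
        for col in sorted(in_model - in_db):
            report.append(f"{table}.{col}: in model, missing from database")
        for col in sorted(in_db - in_model):
            report.append(f"{table}.{col}: in database, missing from model")
    return report

# Example: an emergency column and a leftover table exist only in the database.
model = {"CUSTOMER": {"CUST_ID", "CUST_NAME"}}
db = {"CUSTOMER": {"CUST_ID", "CUST_NAME", "CUST_EMAIL"}, "TEMP_FIX": {"ID"}}
for line in compare_model_to_db(model, db):
    print(line)
```

Each line of the report is a documented finding; as Karen says, deciding which side gets fixed is someone else's call, but the discrepancy itself goes on record.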
So I used to have jobs that would run against our team shares, the shared drives in my area, that would look for .dm1, .er1, and the PowerDesigner file extensions, or look for files with "data model" in the title, just so I could keep my eye on some rogue data models. And you don't want to quash all rogue data models, because data modeling is important and it's good that people are thinking about data modeling. But if a very large data model is being developed outside your accepted processes, that could be a risk to the company, it could be a risk to your group, and it definitely is a risk to the other teams as well. So I said what I wanted for today was that you'll know what to do: you're going to assess, inspect, compare, report, and model and measure. You'll know how to do it, and I hope this wasn't too much of a surprise: you're going to use your modeling tool features, in addition to potentially third-party tools, and you're going to use those five steps to get ready to model with it. So let me check the questions again. "We used to run jobs to monitor the DB system catalog to look for objects being created without data architecture review." Yep, I do that. I have third-party tools that I point not just at production but all the way through our data life cycle, at the environments I have access to. So for instance, if I see new things happening in dev, usually that's okay, because on most of my project teams devs are allowed to experiment and develop and play with their local databases as part of their development process. What they don't get to do is migrate or promote those to QA, because we use model-driven development. My models go into QA, so even while they're playing in dev, the way they get things into the model-driven life cycle is that they provide the changes they've actually decided they want or need into the data management and data modeling process, and together we work on a solution that gets them all the way through.
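The drive-scanning job Karen describes can be sketched in a few lines. This is a minimal, hedged version: the .dm1 and .er1 extensions come from her description, while .pdm as the PowerDesigner extension and the function name are my assumptions; adjust the list for the tools in your shop.

```python
from pathlib import Path

# Extensions Karen mentions (.dm1, .er1) plus PowerDesigner's .pdm;
# the .pdm guess is mine -- adjust for the modeling tools in your shop.
MODEL_EXTENSIONS = {".dm1", ".er1", ".pdm"}

def find_rogue_models(root):
    """Walk a shared drive looking for data model files, plus any
    file with 'data model' in its name."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() in MODEL_EXTENSIONS or "data model" in path.name.lower():
            hits.append(path)
    return sorted(hits)
```

Run as a scheduled job against the team shares, this gives the data management group a daily list of model files appearing outside the accepted process, without quashing the modeling itself.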
Having said that, I have found that sometimes things change in places where they're not supposed to. Sometimes it's for a valid reason, like an emergency production fix that was done on Saturday at 3 a.m. while all the stores or the hospital were down. I get that, but I want to know about it rather than wait for someone to tell me or stumble across it. So there are monitoring tools that will report all the changes that have happened and send me a nice email every morning that tells me not only where the data structures changed; I also monitor the reference data that's supposed to be controlled, like the codes and the state codes and the list of countries and all of those things. I have tools that monitor the instance level of the data as well, because I need to know when those change. There's lots of good chat in there, which I'll come to as I get to the end. So that's what we're supposed to do. That's all I have for the slides. I want to thank you for being with me today. I also want to remind you that I blog, lots of people blog, at dataversity.net, and you can follow some of those things. A lot of the blog posts I write there are follow-ups to all of this, where either someone has asked a great question or I'm sharing the tips and tricks that I use, and I'd love to see your comments and additional questions there, as would all the other bloggers. So I want to thank you, and I'm just going to look at some more of the questions. "We created standards and people didn't follow them." Yep, I've seen that quite a bit. "Does data modeling change when addressing structured versus unstructured data?" I think at this point this is where I'm supposed to throw in that really there's no such thing as unstructured data. There's either loosely structured data or very structured data.
So yes, it changes, but the overall foundations of data modeling, of understanding what the data is, what its applicable use cases are, what its privacy requirements are, what its security requirements are, none of that changes. I think what mostly changes is the notations and tools, and it all depends on what you really mean by unstructured data. One person's unstructured data is another toolset's highly structured data. One example of unstructured data is maybe multimedia files or large articles or publications. Those all still have a structure. They still have metadata. They have location data or authorship data or rights data. All of that is still metadata that we need to know about. The difference is that the content might have all kinds of facts and information in it. Think of a magazine article: we might use different tools to figure out what those facts are. We might be using sentiment analysis, we might be using tagging, we might use all kinds of things, so it's really dependent upon how one goes about deriving the facts and the meaning inside that content. Oh yes, the other great thing that I didn't really mention, but it's come up in the comments: sometimes you inherit a model that clearly shows it has had multiple modelers working on it, and you see that because you see different styles of generalization versus specific modeling, different naming conventions, different use of modeling tool objects. That's another great finding to document, the variation in standards or styles. Sometimes that's because of multiple modelers, and sometimes that's because a long amount of time has passed since the model was first created.
So that retail standard data model that I was talking about, it's been around for almost 20 years, it's had lots of modelers modeling on it, and conventions have changed. You can see parts of the model that are very traditional; the way we would approach things 20 years ago was very different from the way we do now, and it definitely shows in the model. That's something the modelers who work on that model struggle with all the time: trying to figure out what things they can modernize and what things they really can't afford to change, because it would make such a huge change in the model. We see this even in engineering models, accounting models, financial models. In all of those, even in these other professions, people would like to make changes to modernize them, but the impact of that can sometimes outweigh the benefits of bringing them into this century. People are listing a lot of the things they've found; I'd so love to share these. Maybe I'll start blogging about this. But I think the important takeaway today is that most people have been sold on the idea that inheriting a model means a whole bunch of work has already been done, and therefore it's going to go faster and be easier for the modeler. I think it really just changes what the modeler does. I like to use the term forensic modeling versus blank-page modeling. Those are the key things. So what I'd like to do now, Shannon, I think I've reached the end of the content and most of the questions, did you have a wrap-up? Shannon, you might be muted. Exactly. Thank you so much again for this presentation, as always absolutely fantastic, and thanks to our attendees for being so engaged in everything we do. We just love the comments and the questions that always come in throughout these presentations. Just a reminder that Karen will be speaking next week at Enterprise Data World 2015 in Washington, D.C.
If you haven't signed up already, it is sold out, which we're very excited about. Karen, what are you speaking on next week? So I'm doing a workshop with Joey D'Antoni on columnar data stores and how to architect for those, and I'm also doing a co-presentation with Tom LaRock, a database design throwdown, where we debate the right and wrong things to do. I'm the data architect and he's the DBA, and I always win. Nice. I love it. And next month we will be talking about normalization: it's not your friend or your enemy. That will be on April 23rd. It's up on the DataVersity website if you want to get registered for that. And again, thanks, Karen, for everything, and thanks to all the attendees. I will turn off the