Hello and welcome. My name is Shannon Kemp and I am the Executive Editor for Dataversity. We'd like to thank you for joining this month's installment of the Dataversity Webinar Series, The Heart of Data Modeling, moderated by Karen Lopez. Today Karen will be discussing seven ways your agile project is managing data wrong, sponsored today by Embarcadero. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section. However, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #heartdata. And if you want to chat and join in on the conversation, there's always a great conversation going on throughout these webinars. Just click the chat icon in the top right corner of your screen to pull down that feature. And then again, the Q&A is a great place for questions for Karen at the end of the webinar today. And as always, we will send a follow-up email within two business days, containing links to the recording of the session and additional information requested throughout the webinar. Now let me introduce our speaker for today, Karen Lopez. Karen is the Senior Project Manager and Architect at InfoAdvisors. She has 20-plus years' experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger, and panelist. Karen is known for her fun and sometimes snarky observations on data and data management. Mostly she just wants everyone to love their data. Follow her at @datachick on Twitter. And with that, I will turn it over to Karen to get us started. Hello and welcome. Great. And how are you doing, Shannon? How do you look? Good. So thank you so much for the introduction and for the pre-chat and everything.
I wanted to remind everybody that the formal part of the webinar has started now and we're recording. So just as a reminder, Shannon said you'll get the recording and the slide deck. No need to ask about that later. It comes sometime early next week. And one of the other things is you see that hashtag there, #heartdata, for the Heart of Data Modeling. I have my column open and I'd love to see that flooded with comments, even snark, even opposite opinions or observations. So if you want to just put things in the chat or in the Q&A, that's great. I also wanted to thank Dataversity for hosting these webinars. Couldn't do them without them. As well as Embarcadero Technologies for sponsoring today's workshop. I think that's really great when vendors step up to make sure that community-based, real user, practical discussions can happen. Especially given the way that we run the webinars at Dataversity, which is that you guys get to see each other, chat with each other, see each other's questions. It's not a lockdown webinar. So we might as well get started. A little background here. I've been working on Agile projects for quite a while. And today I'm going to use the word Agile to be a little bit more inclusive, for Agile and Scrum. I have some slides about that. As well as any sort of modern development technologies. So let's get right in there. So please do tweet. I'm Karen. I'm a little bit snarky. I would love to follow you on Twitter, friend you on Facebook, or link to you on LinkedIn. Send me a request and say we met at Heart Data, so I know to go ahead and approve that right away. And since data modelers are people too, let's get to know you. Let's do the first poll. I just need to know a little bit about what you consider your primary job. Are you a data modeler or architect? Not making any distinctions there. DBA, developer or builder, BA or analyst, or something else. And let us know in the chat what you are. So you're just voting right along.
You have about 40 more seconds. Oh, sorry, 10 more seconds. I'm going to go ahead and tally the results, just like the Oscars. Managers, yeah. And you're going to see the results. So, yeah, a big chunk of us are data modelers or architects, a few DBAs and developers, some BAs and some others, and 17 of you who are busy working. So we're going to go ahead and open up the next poll. So how long have you been doing this crazy data modeling stuff? And you've got 20 more seconds to figure out who you are or what you want to be when you grow up, and then I'll show the results. And it looks like a few of you are new to this stuff, but most of you are old and experienced like me. And then the last poll: how are you working? Is your organization using Agile or Scrum or other modern development methods along with data modeling on the same project? And if you're going to answer any of the questions, this is the one that I will find the most interesting. A couple more seconds. And what I'm seeing is 11 out of 130 of you say you're using data modeling and agile-like technologies with great results. The majority of you are using them together with mixed results. Some of you are doing them, but not together. And 20 out of the 130 say no, and 23 of you had no opinion either way. So thanks for the polls. So that's pretty much what I expected. I expected that we would have mostly data modelers, since it's a data modeling webinar, but I'm really excited to see developers and analysts and DBAs also involved in this discussion. And from what I'm seeing in the chat, people have other types of roles that are closely around data or managing people, so stewards and those sorts of things. So this is the reminder about the Q&A and the chat. And here's our plan. Notice it's not an agenda, because we're talking about agile things.
So the plan is to talk a little bit about what the formal definition of agile is, what it isn't, what the related methods are, and where data modeling fits in. This will be the focus of the things that I'm going to talk about today, based on my experience working on projects. And also attending conferences where people are talking about agile and their processes, and reading up on the crossover between enterprise-class development and agile methods, as well as the different types of projects and the different types of concerns that crop up when we do those. So I'm also going to talk about seven mistakes that happen, as well as leaving you with tips for how a data modeler can ensure that agile and data modeling work well together. So first we're going to talk about what they are. And the first thing I want to say is I love working on agile projects. And sometimes there are frustrations. There's a lot of dysfunction out there. There's a lot of misunderstanding, which we'll talk about. And stealing from Alec Sharp and a few other people, it's fragile projects that I hate working on. So there are agile projects, and there are fragile projects. Some of the agile gurus say there is no bad agile. There are only dysfunctional people. And as a longtime methodologist myself, I think that, yes, people and resources can be a big issue with any set of methods. In the agile and the scrum community, let's just say, there's a lot of discord and conflict, especially in online forums and on the Internet and on mailing lists, including a lot of them that I manage or that you guys belong to. We've seen it. A lot of those fights have fallen off because no one wants to keep having the same arguments or debates.
And I think from both sides, either being pro-agile or anti-agile, or pro-data management or anti-data management, the core of it all is that both sides have a lot of misunderstandings about not just what the goals are, but how things work on both sides. So that's what I want to talk about today. So the first place you should go to find out about agile, however you pronounce it, is agilemanifesto.org. This is a webpage that was created in 2001 when this group of men got together and, in their description, had some discussions, probably some beverages, and came up with a manifesto. And we're going to go through it. Now one of the things about this website is it's very painful to read and work with because it has this graphic background behind all the text. So the thing I like to say is, if there is an exhibit for the agile approach of build something, get it into production, and not worry about refactoring it later, it's this website. But the principles are the core part of it all. And when we look at the principles, they are like most manifestos: they are wonderful and delightful to read. If you have ever been on a Death March project, you can see why people got together to come up with these. And before agile, we had a method or an approach called XP that was developed primarily by Kent Beck and a group of people. That was a response to a horrible, large Death March project where people were working hours and hours a day and missing out on their family life. And you can see how these things became a natural set of goals and principles that people wanted to work on. So let's look at some of these and make some comments from a data modeler's point of view. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software. I think that's a wonderful thing.
As a matter of fact, we could say as a data modeler, our highest priority is to satisfy the customer through timely and flexible and iterative delivery of valuable data. It's the same sort of concept. We welcome changing requirements, even late in development, agile processes harness change for the customer's competitive advantage. That reflects a lot of the things that we talk about. There's a huge myth out there that data models and therefore the database designs that come out of them are tightly constrained. There's no way to make changes. It's just impossible to change a data model. Impossible to change a database that we can't make design decisions in relational technologies that are flexible or find the right balance between flexibility and data quality. I think that's one of the biggest misunderstandings in a lot of the conflict that's happening in the changes in IT development. Three, we deliver working software frequently. Now we've repeated what we said above about continuous delivery from a couple weeks to a couple of months with a preference for a shorter time scale. The interesting thing about that is that is a huge difference from what people were talking about in IT software development in 2001 when the manifesto came about. At that time, we would talk about doing traditional development, whether it was information engineering, data-driven, whether it was structured refinement or any of those words from back then. This concept of delivering software every couple of weeks to couple of months, and the first time I heard it, I thought, okay, not only is that crazy, not because it couldn't be done, but because end users can't absorb that rate of change in an enterprise environment of having new software every two weeks or a couple of months. And it turns out that what we mean by that now is delivery towards the goal of getting into production. 
It means that chunk of work is pretty much done or done for now, and that we can move on to delivering other software. Now, there are types of projects where software can be delivered that frequently every couple of weeks or couple of months, and those typically are self-contained, usually web-based or app-based, not application, app-based, where the product owner, which is one of the terminologies we're going to talk about, where the product owner is the sole responsible person for deciding when an end user should be impacted by change. And in the app world and in the web-based functionality world, think Facebook, think very large, complex systems, you know, it's the owner of that application that makes those decisions, and often consumers are begging for changes and fixes and enhancements. Sometimes they get frustrated by the changes or the rate of change, but the sort of web-scale, online application, not really an enterprise interconnected thing, is delivered at a much more rapid rate than our internal enterprise systems. Sometimes that's for a good reason, and sometimes that's just because enterprises take longer to make decisions, to get approvals, or to integrate across a whole suite of applications and databases. Point four, business people and developers must work together daily throughout the project. Well, one of our complaints as data architects a lot of times is we don't get access to the business people, so I love this statement that business and IT works together. The sort of odd word there is developers, because on many projects at the enterprise level, not all developers get to talk to, or are the only people who talk to business people. In fact, a traditional IT enterprise IT role is we have assigned people who specialize in talking to subject matter experts, and they're either business analysts, or data analysts, or user relationship people, or we have subject matter experts that sit somewhat in IT and somewhat on the business side. 
So that's a little bit different than just saying we need access to business people. In the agile manifesto, that use of the word developer is really important and very specific. We'll talk about that in a minute. Five, build projects around motivated individuals, give them the environment and support they need, and trust them to get the job done. Okay, I just want to stand up and salute that principle, because what working person in the world wants anything other than that? To be on a team of motivated, passionate people who want to do great things, and to have management remove obstacles and give them an environment and all the support they need, and then trust them to get the job done. I think that's a perfect principle in anything we do in life. Six, the most efficient and effective method of conveying information to and within a development team is face-to-face conversation. Now that one is kind of interesting. I don't know about you guys, but on the vast majority of projects that I've worked on over the last few years, I've worked on highly geographically dispersed teams. People were not even in the same day, let alone the same time zone or the same building. Having said that, I do think that modern collaboration technologies, like the WebEx we're having now, like online chat and Lync and Skype and all of these things, have come a long way from just having phone calls. And so I see how the face-to-face conversation is not necessarily a requirement. But I know that handling conflict and getting things done fast is often a factor of how easy it is to have a brief and important conversation with someone where email fails, where chat fails, where you want to sit together around a piece of paper and mark it all up, or stand around a whiteboard and whiteboard some ideas. But I don't think this issue of geographically dispersed teams is going to go away.
So in lieu of this, I think one of the concessions we have to make is that on large enterprise development projects, it's more likely that we're going to have team members that are traveling, that are located in other cities, that are located in other buildings, or just working from home. As a matter of fact, I've always wondered if people who attend webinars, if the rate of people who are working from home is higher because it's easier for them to sit and be part of a live webinar or not. Always interesting. So the seventh principle, working software is the primary measure of progress. Now, I think that sounds great, and we're going to talk about the word software there in a minute, but you can see how this manifesto is focused on the development part of a solution. So we data people, like it depends on whether you think a database is software, whether you think an XML document is software, if you think a report is software, if you think a data visualization is software, I can certainly make the case for considering any of that software. But I think that a lot of the conflict that happens on these teams is that from a business point of view, most business people do not think just working software is the only definition of progress on a team, on a project. So I can make working software and still have the wrong data coming in. The software can meet the specifications because they didn't know about outliers or because the data coming out is being misrepresented, such as in bad data visualizations or anything. So we can debate whether that's working software or not, but this gives you a hint into what some of the conflicts are going to be coming next. Eight, agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely. So notice the players there, sponsors, developers, and users. There's that developer word again. There's no other IT role mentioned. 
The constant pace doesn't mean that there aren't a little bit faster times and slower times. That's something we're going to talk about. But I'm going to make some running analogies as I go through here. When I'm doing my training runs, I usually try to maintain a consistent pace, because if I have to do a long run, I don't want to start out too fast because then I'll fade at the end. I don't want to start out too slow because I might not be getting the training that I need. Constant pace is a good thing for working environments, and the context in which this was written was that you can't expect us to work really, really hard, killing ourselves with no sleep and eating junk food for years at a time. That's the context of that. Nine, continuous attention to technical excellence and good design enhances agility. I love that. What data modeler would not love that statement? I think that the issue with this is the definition of good design. Let's just say it tends to vary among team members. Ten, simplicity, the art of maximizing the amount of work not done, is essential. Now, I will interpret that to mean we don't want to do any wasted work. We don't want to do any throwaway work, and I'm a big believer in that. That is an essential part of agility. But again, we're going to see varying definitions of what is essential as we talk through these. Eleven, the best architectures, requirements, and designs emerge from self-organizing teams. We're going to talk about self-organizing teams in a minute, so I'll leave that for then. Twelve, at regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. I think that's wonderful as a project manager and a methodologist. Those checkpoints say, are we doing the right thing? Are we doing them in the right order? Are we doing them at the right pace? Do we have the right people? Are we missing equipment? Is there other testing we should be doing?
Are there things we should stop doing because they're not making us faster or better? Yes, I think that's a wonderful thing. That particularly comes into play for data architects in a few slides from here. This is the point I want to make about Agile and the Agile Manifesto. It was specifically created as a software development method, and a lot of the principles around it are very great for software development. Where I think a lot of the conflict and misunderstandings and misfit with what we're trying to do comes from is that those of us that are architects or data modelers or data professionals or business analysts or project managers do a lot more than just software development, and yet we tend to think of these as Agile projects as if software were the only part or even the most important part of an IT project, which is what we heard in the manifesto. So the Agile Manifesto, that's what it says. So those are the principles that then lead to a lot of frameworks. So there are a lot of interpretations and there's a lot of fighting in the community about whether something is Agile or something else is not Agile. And I just call these extensions. I just mean that they extend the manifesto and the principles there. So a lot of times we'll hear that everyone on an Agile team must be a generalist. The corollary to that is no one can specialize in anything. So the snarky side of me says, well, if you're going to develop secure, legal, compliant, performant, scalable, loved systems, then what we're really saying, if everyone has the same level of skills, is that everyone on the project needs to be a specialist in everything on the project. And this is one of the key points that I am not sold on in the Agile space. I see nothing wrong with having people who specialize in databases, people who specialize in security, people who specialize in a programming language, people who specialize in integration and reporting and data visualization.
To me, those specializations are what we need in an IT shop and what we need on projects. Now the background on why people think that this is an issue is that they hate working with someone who's over-specialized. So for instance, if you had someone on your team that only worked with, knew about how to generate XML documents out of SQL Server 2000, then if your project didn't have SQL Server 2000 because it had moved into this century or if you weren't using XML or if you were consuming XML and not generating it, then that person might not be as valuable to you on your project. I get over-specialization is a problem. I don't get that everyone is a generalist. Then there's a concept of agile blocking, which I'm going to talk about that in just a minute. Some people in the agile space, especially in the consulting services space, believe that there are people with certain names and words in their titles that need to not just be excluded from the project, but from interacting with anyone on the project or even speaking to business users who are assigned to the project. So that would be people with administrator, architect, manager, quality, any of these words, because to them, those are bureaucrats who are going to stop them from getting work done. That's a really extreme method for IT. It's a very extreme method for enterprise IT where there are legal issues, where there's existing resources, where the applications we're building have to integrate and play with a whole complex system of network and hardware and software and all kinds of things. Test-driven development, something I love, which is basically we don't leave testing to the end when it's too late to fix stuff, is that basically, I mean, we could do a whole hour on test-driven development, but basically it is you don't put anything into the next phase of the software lifecycle until you have one test on it, even very, very tiny, small tests, and you just keep using test-driven development. 
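As a rough illustration of the test-first idea described here (not something shown in the webinar), a minimal Python sketch might look like the following. The function name `sales_tax`, its parameters, and the 7% rate are all hypothetical; the point is only that the small tests exist before the code they check.

```python
from decimal import Decimal, ROUND_HALF_UP

# Step 1: the tests come first and state the behavior we want.
def test_taxable_sale_is_charged_tax():
    assert sales_tax(Decimal("100.00"), Decimal("0.07")) == Decimal("7.00")

def test_exempt_customer_is_charged_nothing():
    assert sales_tax(Decimal("100.00"), Decimal("0.07"), exempt=True) == Decimal("0.00")

# Step 2: only then write the smallest implementation that passes them.
def sales_tax(amount, rate, exempt=False):
    """Hypothetical helper: tax owed on one sale, rounded to cents."""
    if exempt:
        return Decimal("0.00")
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def run_tests():
    """Run every test; an AssertionError means the code is not ready to move on."""
    test_taxable_sale_is_charged_tax()
    test_exempt_customer_is_charged_nothing()
    return "all tests passed"

if __name__ == "__main__":
    print(run_tests())
```

In practice a team would probably use a test runner rather than a hand-written `run_tests`, but the order of work is the same: tiny test, then code, then the next tiny test.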
I really like this approach only because I've lived on so many projects where all testing is left until the end, and then it's too late. That is such a dysfunction in IT, so I love that. Then there's a concept of no big modeling up front or no big design up front. These acronyms are disparaging terms used for the traditional waterfall method as well as any modeling or design that may need to be done before development starts. I think part of the pain that people have experienced in the past, and the reason that they would want to disparage all up-front modeling and design, is that those of us with experience have been on projects where someone spent six months or a year trying to design out everything in the project to a detailed specification level, much like is required in many government scenarios, only to find out that the design now has to change, and it was too big, and it was done by people who weren't implementing their design decisions and prototyping them and playing with them right away in order to find out if they actually worked. So I'm not a fan of, you know, let's stop everything and model the world for six months or three years or something like that. But as we're going to talk about, I think there is modeling and design that needs to happen before we start coding. Another concept is pair programming, and the idea behind that is that two people sit next to each other, and you rotate these pairs of programmers, and you get higher quality code and better requirement satisfaction because you had two people working together. There's a misperception that we don't do paired modeling. I think we do. I think we do it, but we do it on whiteboards. We don't sit two people together typing in definitions and adding attributes. I think our paired modeling happens all those times you sit down with someone and try to figure out how to solve a particular data story. I do that all the time. And then there's blocking.
Yeah, I'd love to do a poll to see how many of us work from home. I don't know if Shannon can do that while the webinar is going or not, but if she can, that would be great. That was in the Q&A. So agile blocking. This is something I cannot get my head around, and I see it happening on projects all the time. This comes from an essay at agiledata.org, and it's basically a method of what the author here calls a process facade, which means pretending to do the work and misreporting the deliverables you're creating and the work that you're doing, so that the rest of the organization thinks your team is following procedures, compliance requirements, data stewardship, data quality rules, getting approvals. Because they see all of that as just make-work, paper trail, bureaucratic work. And all this highlighting, the emphasis is all mine: it prevents the bureaucrats, a.k.a. us, the data modelers and architects, from meddling with the people that are doing real work. Now the types of, gosh, I'm speechless here every time I read this. There's a whole lot of loaded emotions and conflict and misunderstandings in here. I know why people come to the conclusion that data modeling is not real work, that database design is not real work. That trying to figure out what to do with customers whose date of birth we don't know, or finding out, you know, what the state of Indiana is doing with sales tax or pie this time of year. Those to a data modeler are interesting trivia that we like finding out about. But they are real work because they're about the data. I think this is also a reflection of how the software focus of agile has spilled over into this concept of blocking. He even talks about how the role of a blocker is switched among team members, mostly so that they don't have to work with us, the paper pushers.
The reason I bring this up is, first of all, I think it's professionally unsound advice, and also I think it's important for us as architects, as data quality people, as business users, to understand that the developers are being told that all this enterprisiness is something that is just being done to stop them from getting work done. Lots of comments about that. So out of agile came Scrum, which is just based on the rugby analogy. And it also has very nice value statements. You know, only working on a few things at a time, working well together and doing excellent work, and we deliver valuable items sooner. Love it. That we're going to have courage to undertake greater challenges. Love that. Openness. We work together, we tell each other how we're doing, what's in our way, and our concerns so they can be addressed. Wonderful. We have great control over our own destiny. I want some of that. And we are more committed to success. That's all great. And we respect each other. Now this is one of the things I've noticed: while I still have conflict and conflicting points of view on the Scrum projects I work on, I do find that these values bring even more to the table. Scrum calls itself an agile framework for working within the agile principles, and I do find that Scrum projects are a little bit easier to work on. But there are some concepts that come with them. There are usually no project managers, that's the official term, but there may be Scrum masters who help guide and do the planning. We have a concept of parking lots, where if we run out of time, or we're going to run out of time, we can rip out functionality or leave in defects and deal with them later. So we write those down and we parking lot them. We're not going to deal with them now because it might get in the way of other successes. Then the concept of backlog: every requirement and every user story starts as a backlog item. So our whole goal on a Scrum project is to clear the backlog.
Even though in the real world and the enterprise there will always be a change request or something. And we're working in sprints, which is another analogy pulled from the running world. We're working in sprints so that we can work very hard and very fast and get to the end of a specific time period with a specific goal, as opposed to starting an IT project that we know is going to take a year or three years and feeling like we have a whole lot of time. I mentioned the word stories. So there are some contentious issues about stories. Some people see them as assertions from the business, or the goals the business wants, that we still need to refine further. So one of the contentions in the Scrum world is how detailed you get in the stories and whether you worry about exceptions, all those things that data modelers do. There's a daily Scrum, which is also called a standup meeting. I know a lot of people have adopted standup status meetings. You literally are asked to stand, which is great, and even though these people appear to be arguing, most standups are part reporting, even though Scrum says they aren't. But it's reporting what I did yesterday, what I promised to do yesterday, did I get it done, what the roadblocks are to me getting other work done, and what I'm going to work on today. This brings a bit of accountability to the, I guess I'm not supposed to call it project management, but I still see it as managing the project and resources, as well as getting help when you're blocked as opposed to waiting for days and weeks. The last one is self-organizing teams. The concept of self-organizing teams is important in Scrum. It says that you put a group of these people together, the generalists, and they decide who does what. They're not assigned to do specific things. In the real world, my self-organizing teams typically have data architects, DBAs, developers, business analysts, quality assurance analysts, testers, did I say developers?
We have all these people who are known for their strengths in these areas, and quite shockingly, in self-organizing teams the DBAs are assigned to do DBA work and the data architects are assigned to do data architecture work and the developers do development. Sometimes there's a bit of crossover, but I don't see that crossover happening any more or any less than it did on the traditional teams that we had in the past. So in this concept slide from the Scrum Alliance, which is your go-to place for Scrum information, you start with a backlog, you do some planning, you create the backlog for a specific sprint, and then you spend, well, it depends on who you ask here. I think later in my slides I say one to two weeks or two to three weeks. It could be two to four weeks based on the complexity and how long something is going to take. And then you make this delivery of something that could be shippable. So if you look at a conceptual Gantt chart of how Scrum development typically happens, I'm optimistic, I have a data model there. You have these short sprints, and depending on the sprint planning, there could be two or three of these sprints that are done sequentially, and then you go back and start again. Often the two to three sprints are working hard on specific deliverables, usually certain functionality, but it could be other things in your project like security or reporting, or it could be just functions that need to be delivered. Typically, from a data modeler's point of view, that leads to a functional database application. But there's often something else, and depending on whose method you're using, it might be called a recovery sprint or a catch-up week or a fix week or a data quality week, or sorry, just quality week.
The key here is that you have a few weeks of working really hard getting deliverables out the door, and then you get, not a week off, but a week to sort of reflect, catch up, get ready for your next run, maybe do some defect fixes. It's a time when you're not running as hard and fast as you can. That's the key for it. So a typical sprint from a data modeler's point of view is that there's some sprint planning, which starts the sprint backlog, and some stories are developed, which are often very, very high level, like "we must collect sales tax" or "we need to generate payroll for full-time workers" or something like that. People start reading the backlog and the stories, and then you're expected to do development right away. And the problem with that for me as a data modeler is that those stories are so high level that I have a million questions. When you say we have to collect sales tax, what do we do about tax-exempt customers? What do we do about zero-rated customers? What do we do about grocery items that we don't collect sales tax on? How do we want to record that? Which sales taxes do we have to support? What do we do about input taxes? What do we do about excise taxes? You know, living with a data modeler is like living with a three-year-old toddler who asks 30 questions a minute. That's what it's like living with a data modeler. And yet in a typical sprint people think that the database design, and therefore the data models, are going to be developed and delivered as the result of reading these stories, not talking to any users, and the developers are told, go start coding. And so we have our two developers up there at the top saying, where the hell is our database? And me, as the data modeler, saying, hold on, I have a bunch of questions. Well, guess what, that's perceived as being really dysfunctional.
So the developers develop and I, the data modeler, scramble, scramble, scramble, scramble. I get them some data models, then I work with the DBA to get them some tables and some columns and maybe some indexes and some constraints so that they can start developing, but none of that is there at the beginning, when the developers were told to go start coding. There's a little bit that they can do in a sprint before they have all that. But in this world where the data is why we're building the software, it's very annoying for them, and it's very annoying for the data modeler, that we don't have those things together. And then we get to the end of the sprint and another sprint starts and we repeat this all over again. But while I'm working on the new sprint, I'm trying to maintain the data model and the database parts that were delivered in the previous sprint, because of course not everything was perfect. I'm going to talk about that. So the first thing that agile projects get wrong is expecting data modeling and database design to be completed in an instant at the beginning of a sprint. And this conflict is probably the number one reason why data modelers get escorted out the door after the first couple of sprints, because not only is this physically impossible, it's logically impossible, and we tend to not say nice things about an agile process that expects us to do this sort of wizardry. So how could we fix it? Well, one of the ways to fix it, and the easiest way at first, is to start delivering the data model when the stories and the backlogs are being considered. And this is a hard thing because everyone thinks that the data model is just the database design and therefore should only be done when the sprint is being done. But it's not physically possible, because we data modelers don't have enough requirements and aren't typically given access to the business users to ask all these questions.
But there's an even better way to fix it, which is to do the data modeling for a sprint long before, like two or three sprints before, it's going to be needed. That forces all the backlog, sprint planning, and story development to start a lot sooner than most people are taught to do it in Scrum class. And then you start getting accused of being a fan of big upfront modeling. But I don't think this is big upfront modeling. I think this is working on the story at the business level when the business people are being asked about their requirements, which is the perfect time to get data requirements out of users, not when the coders are coding. So this is my number one fix for the dysfunction of thinking that data models can be done in an instant. The other thing we have to deal with on enterprise projects is the fact that enterprise applications are complex. They aren't self-contained websites. They aren't just a team of people coming up with requirements. We have all kinds of things we have to do there. They involve enterprise solutions, these complex applications and databases. And we have data modelers who already understand the data, who know what metadata is available, who know the enterprise tools, the complex vendor packages, the external data, the NoSQL data. And the metadata we have at our fingertips, there's Michael's comic, Michael Swart is on the call. We have all these extra requirements about the data that a business user might not mention. So we have that at our fingertips. So that brings us to doing it wrong: thinking that the concept of just enough documentation, which is an agile extension, means we don't use existing models. So we see a lot of people saying, okay, we need to design a database, I'll go get a pen and paper. We're going to fix that by knowing that there's a wealth of data and metadata in the enterprise, that we need to use it, and that the data professionals in the enterprise know where it is and how to use it, and can do that very quickly.
We also know that enterprise projects are integration projects. It's so rare to have an enterprise project that's all self-contained: one database, one application server, maybe mirrors of them for availability. They're complex. So getting it wrong is expecting enterprise data modeling and database design to be completed quickly by generalists who actually don't want to do a data model and don't like doing database design. I've seen this actual conversation on a project: "It's your turn to build the database." "I don't want to build the database. Let's get Mikey." If you're a business user, that should make you shudder, the thought of someone who doesn't want to build the database being forced to do it. The other thing that happens is that sprint planning is often done by the Scrum masters and some developers without data professionals, and they make tough decisions that impact others' ability to get work done. Like, what should our first sprint be? Let's start with payroll. It's just reading some data. The problem with starting with payroll is how many entities and tables you will need to build in order to get enough data in its final, or close to final, form if that's the first thing you do. Building out payroll tables and all the work tables and employee tables, I don't know, 100 tables, 200 tables, is not a great sprint to start with. Then there are all kinds of other infrastructure things in the enterprise that go into sprint planning that no one considers to be part of the sprint. So things like building the servers and having the network there and communication to all these other things, and existing UI standards and corporate policies on what we do to encrypt data, all this other stuff is an input to sprint planning, and yet no one expects all that work to be done at the start of the sprint.
We need to get everyone to understand that the data model and the existing data models are, like all those other infrastructure things, an input into sprint planning. So another thing that gets agile projects in trouble is thinking of the data models and the DDL as just other code or just documentation. I've heard this one all the time: "Are you going to get the data model done?" "I don't have time for documentation right now." If you think of the data model as just after-the-fact documentation, I agree, that's not very agile and in theory it's not going to be used on the next project. I would consider that dysfunctional data modeling. So we fix this by getting people to understand that data models do fit the profile of just enough documentation if they're done by professionals who can do it fast and reuse existing structures. We want to use data professionals who know where all the data is, and the data models and the metadata, and how to do it fast. If you put a generalist on there who has never done a data model and has used a tool once in a college course, it's going to take them a long time and they're going to think it's just documentation. We're able to build better databases with existing models than if we had none at all. We know that. That's been proven. That to me sounds very agile and fits all the principles that we've talked about, and the fact that we can build databases and data models faster with existing models also sounds very agile to me. We talked about sprints. We talked about the timing of sprints. We talked about recovery sprints. There can be special sprints declared to go in and clean up a lot of defects. Again, there are contentious issues about that. In my mind, because it's a sprint, it's like running intervals. One of my running gurus is Jeff Galloway, who started the whole run, walk, run method that I use to run very long distances. This sounds very much like the agile features. Continuous use of a muscle will result in quicker fatigue.
Sprinting, sprinting, sprinting, working on a death march project, makes you tired faster. The longer you run a segment, like a sprint, the more fatigued you will be. Run, walk, run is a form of interval training. It conserves resources. It allows you to recover quickly from working hard. There's less stress on the weak links. The weak links aren't the people in the agile process. They're the parts of you that can no longer work 20-hour days, nine days a week. Run, walk, run also lets you enjoy more of the work that you're doing and can reduce your core body temperature, which I'll just say is a great euphemism for getting upset at people. You notice how that really matches the whole concept of agile sprints. The reason I bring this all up is that on a dysfunctional agile IT project, the data modeler is expected to run a sprint every single week and take care of all the pre-existing sprints, because usually there's only one data modeler, or two, allowed on a project, and we get tired and cranky. So, managing data wrong, number six: expecting data modelers to sprint an entire marathon. Now, I know there are people who do this. I know that there are people who run 26.2 miles and are attempting to break the two-hour barrier for running 26 miles. By the way, that's well under my half marathon time, not anywhere near my full marathon time. But even those people who can run a two-hour marathon, or close to it, do not run a two-hour marathon every day. And when they train they don't run 26.2 miles every day. Sometimes they only run a few miles because they sprint to train for speed. So I've had this discussion between two data modelers: "When is our recovery sprint?" And my friend said, "I heard maybe some time next year." This makes us cranky and unproductive. Basically, data modelers are being asked on agile projects to be on a death march project so that our other team members will benefit. So unfair. Seventh wrong.
One of the things we saw, I think three times, in the manifesto was about embracing change, even late in the project. One of the things I've noticed is that I get this request all the time: "No more changes to the database, okay?" And I say, yeah, let me go find some waterfalls to play in, because I get this all the time. A lot of people who embrace change embrace change in their own deliverables and actually despise having to do rework because something changed elsewhere in the solution. This is a natural human trait, but it is kind of funny that I am usually asked, near the last 25% of the project, can I please fix all these bugs and make all these changes to the database without changing anything, and that's a dysfunction. So iterations are awesome except for those that are iterated upon. You should just practice saying that. But the other thing is that we data modelers, some that I work with, are notorious for just continuously fixing things up because they don't have the perfect design or the perfect data model, and that's dysfunctional on an agile team. Yes, you should put the columns in a logical order. You don't get to do that just because it bugs you. So change management and iteration are important, but we have to make cost-benefit decisions on making changes. We do have to collaborate when changes are needed and we need to plan for the delivery of that change. We can't just foist it upon team members who are working on sprints. We have to agree during sprint planning when a change can be incorporated. Why is there all this conflict, though? I think the biggest conflicts are the same things that we see on traditional projects, magnified a million times, like through a Hubble telescope, millions of times more, because most people have been taught that data models are done for documentation and that they're only used as mechanisms for generating DDL to get to a database. They've been taught that data models have nothing to do with requirements, that logical models are just pretty pictures and boxes and lines.
Most data modelers are way too stuck on working with the traditional development methods, which means getting an entire data model done, and they don't have the skills to deliver iteratively, make changes, and get the right parts of the model right to help support iterative development. Let's not be overly stuck on traditional methods when we're trying to work on agile projects. Most people in the company think software is the most important, most complex part of IT, and I think we've done a lousy job of letting people know that the data is important and that it's very complex. Most people think data models are just the ERDs, just the boxes and lines, and they don't understand why they take so long. Most people have never seen a productive, iterative, responsible, flexible model-driven development process, and that's true for both modelers and developers. So, some tips. We personally need to stop using the word documentation when talking about data models. I do that all the time. I need to generate my documentation. I need to deliver it. I need to stop using that word because that word, even though the manifesto talks about just enough, that word is evil. We need to just say models: generate the models, generate a picture, whatever it is. Modelers need to get Scrum training. You need to be asking for it. Even if your organization isn't doing Scrum now, you're going to start adopting parts of it and probably already have. You should even ask to be certified in it. Learn the lingo and then start using it. And then you need to push, advocate, lobby, educate, and rant until everyone, business users, the CIO, the CEO, all your team members, understands that data models are gold-filled resources for agile teams. They are not blocking.
You need to get data models and the DDL or database tasks moved sprints ahead of when they're required, and you're going to get an enormous amount of pushback on that, so you can start by doing it during the sprint planning and then just keep moving it further and further forward, meaning earlier in the process. Don't get punished or pushed into sprinting a marathon. You can't do it and it's not agile. When people ask you to do it, just keep saying "that's not agile," because they're going to be using those words on you all the time. And don't back off from the teams even if they're being hostile. Physically hostile, yes. But a lot of the hostility and conflict is because they don't understand what we have to offer and we rarely understand what they're trying to do. Data modelers, don't be a roadblock; get ahead of the sprints. Finally, we should practice agile techniques on our own deliverables. That means doing more paired modeling so there are fewer mistakes in our models, and doing our own test-driven development. That means generating your model, generating the physical model, generating some DDL, even if you're not delivering it. The worst thing you can do to an agile team is deliver DDL that won't even run in a database. That's unacceptable. We should be creating backlog lists of data modeling functionality. If we didn't get time to put indexes on anything, we need to make sure that gets put in the backlog. We need to be able to let go of the beautiful things we'd like to do with the data model, like laying it all out perfectly or having no crossed lines or having nice graphics. Those are all important things, but we need to be able to parking-lot them in order to meet the goals of an agile project, and we need to know our tools well enough, and to have tools that allow us, to do continuous delivery. So that was our plan for today. I'm going to be looking at the questions in a minute. I did the 10 tips.
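The point above about never delivering DDL that won't run can be turned into a small automated check. What follows is only a minimal sketch, not something shown in the webinar: it assumes Python's built-in sqlite3 module as a throwaway stand-in database, and the function names (ddl_smoke_test, ddl_runs) and the sample tables are hypothetical. SQLite's dialect differs from SQL Server, Oracle, and others, so a real team would point the same idea at a scratch instance of its actual target DBMS.

```python
import sqlite3


def ddl_smoke_test(ddl_script):
    """Run a DDL script in a throwaway in-memory SQLite database.

    Returns the sorted names of the tables that were created.
    Raises sqlite3.Error if any statement fails to run.
    Assumption: SQLite is only a stand-in for the real target DBMS.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl_script)
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def ddl_runs(ddl_script):
    """Return True if the script executes cleanly, False otherwise."""
    try:
        ddl_smoke_test(ddl_script)
        return True
    except sqlite3.Error:
        return False


# Hypothetical DDL generated from a model for one sprint.
GOOD_DDL = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    order_date  TEXT NOT NULL
);
CREATE INDEX ix_sales_order_customer ON sales_order (customer_id);
"""

# Hypothetical broken DDL: the doubled comma is a syntax error.
BAD_DDL = "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,, name TEXT);"

if __name__ == "__main__":
    print(ddl_smoke_test(GOOD_DDL))
    print(ddl_runs(BAD_DDL))
```

Running a check like this before every handoff is one way a modeler could practice test-driven development on their own deliverables, whichever database the check is actually pointed at.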
Let me look at the questions really quick, or the chat. "I was on a data warehouse project where we started modeling six months before they required the dimensional data. We knew what they needed because they were reporting what they wanted to do." That's great. It's heartening to know someone else has heard the conversation. And then, good point: agile to one person may not seem agile to another. Minimizing one's work is a way of setting unrealistic expectations for sprint deliverables. "Data model being a sprint ahead is challenging because how do you know which stories to model?" That's what I mean by it. It needs to be part of sprint planning and you need to work with the sprint planners to get that moved there. And that does mean stories need to be done sooner as well, but everyone kind of benefits from stories being done sooner, because sometimes you get a couple of developers who can look through the stories and say, hey, we're going to have questions about this. And yes, it's the whole thing about, you know, even the concept of backlogging and parking-lotting things works when there's one product owner and it's a self-contained system, but I've been on projects where it's contracted work and someone has a contract to deliver that functionality, not just with your end users but with a customer of your end users. You can't take functionality out. You don't have that flexibility. So let's see, we need to talk there. "Lingo: a tech spike is one way I model a sprint ahead of the devs." Yeah, I'd like to know all of that sort of information and what that lingo is. So we've come to the end of the official time, so I just want to close out the slides, and then we can end the recording and continue the discussion on Twitter, chat, and Q&A. So as a reminder, Shannon brought up that Enterprise Data World is coming up. I don't know if it'll be my 16th or 17th one. I'm not sure. I'm really looking forward to it because it's in D.C., and I love going to D.C.
I'm doing a workshop along with Joey D'Antoni on architecting and modeling columnar data stores. You should come to that just because I can't say the word columnar, so it's funny. I'll also be leading the ER/Studio and data modeling special interest group. Thanks again to Embarcadero for sponsoring today's webinar. And then I'm doing a double session with Tom LaRock, a data modeling and database design throwdown, where I argue with a DBA and tell him why he's wrong most of the time. Okay, not so much. And likely lots of other fun things will be going on there. So thank you all. You guys were great. We'll be doing this again next month, so watch for that. You can tweet me at @datachick and you can continue the conversation at #heartdata. So Shannon, back to you. Karen, thank you so much for this great presentation, and thanks to our attendees, as always, for engaging so incredibly with everything that we do. We just love the participation throughout the webinars. And again, you can meet Karen in person at Enterprise Data World 2015 in Washington, DC, as she mentioned, and I hope everyone has a great day. For now I will turn off the recording for you, Karen, if you want to have a little after-hours chat.