That is the question. Whether 'tis nobler to suffer the slings and arrows of outrageous errors, or to take arms against a sea of troubles and by opposing end them. My boss decided to have a little fun with Shakespeare, and now I have to deal with his poetry.

My name is Sean Dunn. I'm an enterprise agile coach with IHS, and I work for Todd Little, who is VP of Product Development at IHS. What I'm going to talk about today is some research we did on estimating, specifically the value of estimates versus no estimates. In doing so we collaborated with Chris Verhoef of the VU University in Amsterdam. I was actually just a minor player in this whole act, but Todd was unable to be here today, so I'm taking his place.

I'd like to start off with a little audience participation. How many in the room today consider it part of their job either to create estimates or to deliver on someone else's estimates? Hands up if that applies to you. I think that's the majority of the room, so this will be interesting and important to you, because it goes some way to determining how successful you'll be at doing it.

I fell into software engineering because I like solving problems: exploring the unknown, solving things that haven't been solved before. That's basically how we add value. There's a quote: it is difficult to get a man to understand something when his salary depends on him not understanding it. That's exactly the problem we face in software. We start off not knowing something and then have to solve it anyway.

So when someone comes up and asks, "How long is this going to take?", and we hear this all the time, what answer do you give? There's the answer everybody wants to hear. On a plot of actuals versus estimates, that answer is highly deterministic: we're really certain this is the answer, and there's not much risk involved. But what has reality taught most of us? Because the work hasn't been done before, there is an effectively infinite number of things that can go wrong. So reality has a really long tail. There's only a limited amount by which things can go faster than expected, but there's no limit on the things that can go wrong and stretch the work out far longer than we thought. So our actuals versus estimates tend to follow that long-tailed curve, which is roughly what the research by Todd Little and by Tom DeMarco shows.

There is risk in software development; there has to be. Without risk, there is no value. If we were building something that had already been built before by someone else, we would know exactly how long it would take, but there would be absolutely no point in doing it. Uncertainty and risk are an inevitable part of product development.

So when someone asks that question, what answer do you give? Do you give them what they want to hear, the mode, the number that occurs most often? Do you give them the mean, or the median, the 50th percentile? Or do you give them the high-confidence number, the P90, the 90th percentile, the one we're much more sure about? And when you give an answer, do you know which of those you're giving? Does the person who hears it know which one they're receiving? Do we ever have that conversation, or do we just hand over one number without any context of what it means?
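To make that concrete, here is a minimal sketch (mine, not the paper's), assuming the actual-to-estimate ratio follows a lognormal distribution, which roughly matches the long-tailed curve on the slide; every number in it is invented for illustration.

```python
# Sketch: how different the "one number" can be depending on which
# statistic you report. Assumes a lognormal actual/estimate ratio with
# illustrative parameters; none of this is the study's data.
import numpy as np

rng = np.random.default_rng(42)
ratios = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)  # actual / estimate

estimate_days = 20  # someone asks: "how long will this take?"
actuals = estimate_days * ratios

print(f"mode   (what they want to hear): {estimate_days * np.exp(-0.6**2):.1f} days")
print(f"median (P50)                   : {np.percentile(actuals, 50):.1f} days")
print(f"mean                           : {actuals.mean():.1f} days")
print(f"P90    (high confidence)       : {np.percentile(actuals, 90):.1f} days")
```

One distribution, four defensible answers, ranging from about 14 days to about 43 days. Which one you give, and whether the listener knows which one it is, changes everything.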
So, what do we estimate? What are some things we estimate? Any suggestions? Size, right. We estimate in time and we estimate in size. At the start of a project or a release we'll try to gauge its size, and that might be in duration, effort, or cost. We might estimate features or epics, and we often do that in story points or t-shirt sizes. Stories we can estimate in story points or other measures; story points are fairly common. And if stories are broken down into tasks, the tasks will often be estimated in hours. For the purpose of today I'm largely going to be talking about stories, so stories and story points.

And why do we estimate? (Audience: to know the project's release date.) Okay, so to forecast a release date. But why do we do that? Why is that useful? (Audience: to make commitments; to order the backlog.) Right, so we have to make some decisions. Estimation gives us information. Information with uncertainty, but information. And information is useless unless it is used to make decisions. What's the purpose of having information if it is not going to influence any kind of decision? So yes, it's used for forecasting, but what decision is that forecast going to influence? If we have to do something because it is so vastly important that the cost doesn't matter, then estimates don't matter in that case.

So why do we estimate? What kinds of decisions do estimates drive? Do we start something at all? Do we even begin? Knowing how big something is may help make that decision. What should we work on next, and in what priority? Should we stop work and work on something else? Maybe as we start working on something we get new information and realize we should stop and work on something else instead; that's a very valid decision. Should we swarm, add more people to something, or get help in other ways? Should we split something into smaller segments of value? Should we re-evaluate our technical approach? These are all the kinds of decisions that estimates may help us make. (Audience: to make compromises.) Right, absolutely, to help make compromises. Estimates give you an idea of cost, and therefore of how to make compromises, and I think that goes to priority and to all of these questions.

I think it was Woody Zuill who either coined or popularized the hashtag #NoEstimates, and there are others like Vasco Duarte, who was at this conference last year and has written a book on no estimates. It's getting a lot of attention on Twitter these days, with heated debates. Woody describes #NoEstimates as the hashtag for the topic of exploring alternatives to estimates of time, effort, or cost for making decisions in software development. That is, ways to make decisions without our traditional approach to estimates. It acknowledges that yes, we have decisions to make; the questions are what those decisions are, and whether there are alternative ways to make them, differently than we've estimated in the past.

So one approach suggested by this no-estimates movement is: stop estimating story points. Stop doing it, and instead simply count the number of stories per iteration, the throughput. Why would you want to do that? Well, what are we trying to do with story point estimates? We're trying to get a forecast, right? When are we going to release? Do we need to make some of these hard trade-off decisions? How do we prioritize things? If we have another metric, another number that can help us do all that just as well, then it could potentially work. The idea is that story count alone is actually just as good as, if not better than, story points.
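For a feel for the mechanics, here's what forecasting from throughput alone looks like; the per-iteration counts and backlog size are made up for illustration.

```python
# Sketch of the no-estimates forecasting idea: project remaining work from
# story count alone, with no story points anywhere. Numbers are invented.
stories_done_per_iteration = [6, 4, 7, 5, 6, 5]  # observed throughput
remaining_stories = 48                            # stories left in the backlog

avg_throughput = sum(stories_done_per_iteration) / len(stories_done_per_iteration)
iterations_left = remaining_stories / avg_throughput

print(f"average throughput: {avg_throughput:.1f} stories/iteration")
print(f"forecast          : ~{iterations_left:.0f} more iterations")
```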
Now, this has been quite the religious war on Twitter. Do we have any data scientists in the room? A few? All right, they'll be happy: we actually decided to collect some data. It's not big data, unfortunately, but it is some data, so what you'll see next is some charts and graphs. We wanted to see: is this true? Is using story count any better or worse than using story points?

So we collected and analyzed some real data. It was provided by Vasco, from 55 projects across nine different companies. Thirty-seven of those projects came from one organization, so yes, we understand that caveat, but if you look at the data, both across organizations and within that organization it's all very consistent, and the data is openly available. You can go to the website, download the spreadsheet yourself, run your own analysis, and write a paper on it if you wish.

A couple of quick definitions before we get started. Velocity, or instantaneous velocity, is the story points delivered in a specific iteration; average velocity is the average story points delivered across all iterations over a period of time. Throughput is the number of stories delivered in an iteration, and average throughput is the average number of stories delivered over all iterations.

So why are estimates useful? We talked about making decisions: decisions to stop something, start something, prioritize something, get help. One useful tool we have for doing that is the release burn-up, or the cumulative flow diagram, and if we use story points on it, it looks something like this. What we've done is normalize time and normalize scope; both axes go from zero to one, so we can compare all projects on the same scale. What you see here is the burn-up chart from beginning to completion for three projects out of the several dozen we had. Actually, about 50% of the projects had this kind of hardening phase at the end, covering roughly the last 2-12% of the schedule, which suggests they might have been doing a bit of Water-Scrum-Fall: things slowed down at the end and velocity dropped.

Now the question is: wait a second, if we're trying to produce a burn-up chart to forecast, does just counting the number of stories produce something that works just as well? The first thing we did is overlay the two: same projects, blue with story points, green with story count. From the very beginning you can see there might actually be some correlation here. Story count does something pretty similar; we might be able to generate burn-ups from just a story count.

Before we delve further into the numbers, I want to introduce the idea of the P90-to-P10 ratio. I come from energy, and we use this a lot in petroleum engineering, measuring the amount of oil in the ground. P90 is your 90% confidence number: you're 90% confident you're going to meet that deadline. P10 means you're only 10% confident you're going to meet that deadline. For example, if your P10 is six months, you're only 10% confident you'll hit six months, and if your P90 is twelve months, you're 90% confident you'll get it done in twelve months or less. The ratio of these two, the P90-to-P10 ratio, gives you a gauge of your variance. A large ratio means a big gap between your most optimistic and your most pessimistic numbers; in that example the ratio would be two.
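As a quick illustration of the arithmetic, here's the same ratio computed over a set of invented per-iteration velocities:

```python
# Sketch: P90/P10 ratio as a variance gauge, here over iteration velocities.
# The velocity numbers are invented for illustration.
import numpy as np

velocities = [12, 30, 18, 45, 9, 22, 35, 15, 27, 40]  # story points/iteration

p10, p90 = np.percentile(velocities, [10, 90])
print(f"P10 = {p10:.1f}, P90 = {p90:.1f}, P90/P10 ratio = {p90 / p10:.1f}")
```

A ratio near one means stable, predictable velocity; a ratio of three or four means your "typical" iteration tells you much less than you would like.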
So we actually used this P90-to-P10 ratio to look at some of the numbers in the data. One of the questions we wanted to answer was: do we get better at estimating the further we go in a project? Another way of stating this is: does our velocity become more stable over time, if the variables remain the same? Some people say velocity becomes more stable over time; it's a very common belief that as we understand more, we get better at estimating and we can make velocity more and more stable. So let's actually look at the data. We looked at all these different projects to see whether any of them actually got better at estimating over time, and we discovered that they don't. You can see here the blue line is the P90 and the green line is the P10, and if you take that ratio, projects start off at around four, which isn't great to begin with, a four-times difference between your P90 and your P10, and over the course of these projects, averaged out, it actually tended to get worse. This is an interesting discovery: velocity predictability does not get better over time, or at least that's what the data is suggesting to us. We'll come back to why that might be.

The next question we put to the data: if we just use throughput, the story count, to forecast completion, is that any more or less accurate than using velocity, the story points? If we want to look at our burn-up and forecast into the future, and we had just taken the story count, is that any better or worse? We looked at the data again, and what you can see here is this bottom line of throughput versus velocity. Because it's close to one, it's basically saying that regardless of which one you use, they give you similar forecasts for a release date on a burn-up. So with the data we have, this suggests that story count would be pretty much just as good.

Next we decided to do a simulation: a Monte Carlo simulation of a thousand projects, each with 50 stories. If you're not familiar with Monte Carlo, you basically get the computer to generate random variables: randomly generate the size of stories and their actuals versus estimates, run the project in the computer, and see how it works out. We were able to vary certain variables to see how they impact the results. We varied the story size distribution: in some cases all the stories were very near in size, in other cases there were lots of small stories mixed with big stories, and we investigated how that might impact the results. We varied the estimation accuracy: let's assume some teams are better at estimating than other teams; does that make a difference? We varied the bucketing approach: there are different ways of bucketing story points, such as Fibonacci or powers of two (1, 2, 4, 8, 16); do any of these help, do they make things better or worse? And we also looked at the hardening effort at the end, to see what difference that made.
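Here's a minimal sketch of a simulation in this spirit. To be clear, the distributions, the Fibonacci bucketing, and the forecast-at-halfway design below are my assumptions for illustration, not the paper's actual model.

```python
# Monte Carlo sketch: simulate projects, estimate stories up front with
# noise and bucketing, "run" the first half, then forecast the remaining
# effort two ways (velocity vs throughput) and compare against the truth.
import numpy as np

rng = np.random.default_rng(7)
FIB = np.array([1, 2, 3, 5, 8, 13])

def bucket(x):
    """Round a raw size estimate to the nearest Fibonacci bucket."""
    return FIB[np.abs(FIB - x).argmin()]

def simulate_project(n_stories=50, size_sigma=0.5, est_sigma=0.4):
    true_size = rng.lognormal(np.log(3), size_sigma, n_stories)  # actual effort
    estimates = np.array([bucket(s * rng.lognormal(0, est_sigma))
                          for s in true_size])                   # noisy, bucketed
    half = n_stories // 2
    done_effort = true_size[:half].sum()              # elapsed "time" so far
    velocity = estimates[:half].sum() / done_effort   # points per unit time
    throughput = half / done_effort                   # stories per unit time
    actual_remaining = true_size[half:].sum()
    fc_velocity = estimates[half:].sum() / velocity   # forecast via points
    fc_throughput = (n_stories - half) / throughput   # forecast via count
    return fc_velocity / actual_remaining, fc_throughput / actual_remaining

ratios = np.array([simulate_project() for _ in range(1000)])
print("median forecast/actual, velocity  :", round(float(np.median(ratios[:, 0])), 2))
print("median forecast/actual, throughput:", round(float(np.median(ratios[:, 1])), 2))
```

With stories of similar size, the two forecasts track each other closely; widen `size_sigma` and the points-based forecast starts to pull ahead, which matches the finding below.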
So what did we learn from this? The first thing we learned is that we can actually run a computer simulation that gives results fairly similar to the ones we collected. You can create probability distributions for your stories and story sizes that give you numbers, the ones you see in these lines at the bottom, that fairly closely replicate what we saw in the data.

What are some other outcomes of the simulation? We found that there was fundamentally no difference between velocity and throughput projections, only about a 6% difference between them. Now, there were cases where velocity showed an improvement over throughput, and it's not an entirely big surprise: if the story size distribution was very wide, if I've got massive stories and little stories all mixed together, then having some kind of discriminator between big stories and small stories does, big surprise, improve predictability. And if the team was genuinely good at estimating, that helped too. Changing the buckets didn't really make much of a difference: whether you used Fibonacci, powers of two, or even powers of four didn't significantly impact the simulation. The bucketing only started to matter when the story size distribution was large, when stories spanned multiple orders of magnitude. The analogy is estimating mixed nuts: if you have a bowl of mixed nuts and they're all roughly the same size, then their exact sizes don't really matter that much. It does make a difference if you have big coconuts mixed into the bowl, which isn't a terribly surprising conclusion.

So what does this tell us about estimation? What kinds of conclusions can we take away? When it comes to decisions that steer toward the release, velocity and throughput are equally good, and equally bad, predictors. They both have flaws, but they're also about equivalent at producing results, provided all your stories are roughly the same size to start with. And then you can get into the argument: well, if they all have to be roughly the same size to start with, isn't that a form of estimating? Probably, yes. But the idea is that you can then actually just use throughput, provided your stories are about the same size.

What other decisions? Decisions that help with managing iterations. I've got a question: how many people decompose stories into tasks? How many people estimate tasks in hours? The majority again. What decision does that drive? How is that information used? (Audience: whether you can fit that story into that iteration.) Any other reasons? (Audience: to generate the burn-down graph.) That's not a decision; that's a graph. What decision does the graph drive? (Audience: the story might not get completed in that iteration.) So what decision are you going to make with that information? Ah, I heard a decision there: if two stories have similar priorities, this information might help us choose the one that costs less. I'm just trying to get at the idea of how we are using this information.
What decision is being driven by that information? When it comes to task estimates, and you're doing burn-downs like this one, this is actually an interesting example. Look: progress, progress, progress, progress, and then everything gets delivered; all the story points actually land on the last day. I've looked at lots of teams within our own company, and this is surprisingly common. You get what you measure: if you want a perfectly linear task burn-down, sooner or later you'll get it, but that's not the same thing as delivering stories. Delivering tasks isn't the same thing as delivering stories.

So what does this tell us about estimation at the project-sanction level, at the start of a project? Some level of macro estimation of cost is likely necessary in order to make business decisions; the company probably wants some idea of how big or small something is. But the risk is that we could spend more time on cost estimation than the benefits that estimation provides. We see this in the data: there's a large degree of variance in the estimates, so if you spend more and more time trying to get more and more accurate, more and more precise estimates, you can waste a lot of time for not a lot of benefit. If you look at your project portfolio management, you've got this idea generator, ideas get rejected at the filter, some projects get cancelled in progress as you go along, and that's okay too, and then finally some of them hopefully make it through the filter into operation. Throughout that pipeline, estimates are useful in decision-making. But how can we use our knowledge of this variance in estimates to make sure we're not making decisions on false precision?

So how do these results compare to other research that's been done on this? There have actually been a few studies over the years on software estimates and the accuracy of those estimates, so I'll review some of them and show how our results compare.
The first one I'm aware of was done at Landmark Graphics. This is an actual-versus-estimate plot: actuals on the vertical axis and estimates on the horizontal. If we had perfect estimation, all of the points would lie along the 45-degree line; perfectly accurate estimates are always equal to actuals, regardless of the size of the work. This study collected data from the Landmark Graphics company, and it shows you the distribution of actuals versus estimates. I believe this line here is the four-times line, and some of these stories up here were, I think, six times their estimates. There's also a study by DeMarco, the red points there, which had largely similar results. What this is showing us is how few of these points actually group along the ideal line, and how varied they are.

How about some other research? This one is from Steve McConnell, from his book, and it's the same kind of chart with its perfect-accuracy line, just another study, measured in days. On average the target was 22 days, but the actual delivery was on the order of 56 days. So: very consistent results. Note that these studies were measured and estimated in time. Story point estimating in Scrum basically compensates for systematic errors in your estimation: if all our stories are going to be wrong by roughly a factor of two, then once we start measuring velocity, we've effectively measured what that factor is, and it's carried into the forecast. That's how Scrum gets around the systematic part of the error.

Here's the histogram of actuals versus original estimates. I think only 10 to 20% of the stories were in the one-times category, really close to their estimates. Fifty percent were two times or less; only half the stories came in under twice their estimates. And 80 to 90% of the stories fell within the P90 interval, which means you have to go out to about four times the original estimate to capture 90% of the stories. That's a huge range. It's basically saying that if you run 100 projects and estimate all of them at one month, a lot of them are going to take three or four months, and you have to go at least that far out to capture 90 of them.
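To put a number on that earlier point about velocity absorbing systematic error, here's a minimal sketch with invented figures:

```python
# Sketch: measured velocity bakes in a systematic estimation bias.
# Invented scenario: every story secretly takes 2x its estimated points.
planned_points_per_iter = 20   # what the team signs up for each iteration
bias = 2.0                     # actual effort = bias * estimate

# After a few iterations, only half the planned points actually land:
observed_velocity = planned_points_per_iter / bias   # 10 points/iteration

remaining_points = 120
print(f"naive forecast   : {remaining_points / planned_points_per_iter:.0f} iterations")
print(f"velocity forecast: {remaining_points / observed_velocity:.0f} iterations")
# The factor of two is already inside observed velocity, so the
# velocity-based forecast self-corrects; only the random error remains.
```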
Then we heard of this study from Jørgensen, I hope I pronounced that correctly, from 2013, and this was a really interesting one. They actually got research funding and put a software development project out to bid on an online marketplace, vWorker. They said: we want someone to build us this piece of software, and we want companies to bid on it. They had, I think, fairly well-defined requirements, so it was very clear what needed to be built. They received 16 bids from companies, and the companies didn't know this was a research project; they thought it was real, honest-to-goodness work. They narrowed it down to the six bidders with the highest online rankings, the ones people thought were good companies based on their previous work, and then those six went on to actually build the software. So they paid all six companies to build exactly the same thing; none of the companies knew the others were doing it, and none knew they were part of a research experiment.

They wanted to see the variance in actuals versus estimates. The highest estimate was eight times the lowest. The actual-versus-estimate ratio ranged from roughly 0.7 to 3: the best team came in at about 70% of its estimate, and the worst team came in at three times its estimate. And if you just compare actual performance, how fast the fastest team produced the product versus the slowest, the fastest team got it done 18 times faster. I think this one is really interesting: the huge variance when it comes to both software productivity and estimation. (Audience question.) No, this won't show you whether there's a correlation between being good at estimating and being good at delivering the software; I don't know if that information is in the study, I'd have to go back and look. But it is an interesting question: if you were really good at estimating, were you actually good at delivering? I don't know. How do our results compare to this? I think they're largely consistent.

And here's another study, with more data, showing the estimation ratio over time: do we get better at estimating over time? You can see this graph is largely flat over time; there's the four-times line. It's basically showing your actuals versus estimates, and as you go through the length of a project, you're just as bad at estimating as you were at the beginning.

Is this a big, surprising discovery? We started off at the beginning with people saying that as we go through, we're going to get better at estimating, and our velocity will get more predictable. Why does the data not seem to show that? If you think about it, it kind of makes sense. Look at the INVEST criteria: we want independent user stories, stories that are independent of all the other user stories. That means each user story carries its own degree of uncertainty. So here are all our different stories: if I work on this one, it's going to be different from working on those ones. We get two stories done, three stories done, four stories done, but because each of these stories is independent, each has independent risk. So it really shouldn't be any surprise that we don't get any better at estimating as we go.
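Here's that independence argument made concrete. Note that this demonstrates the point by construction: if each story's actual-to-estimate ratio is an independent draw, the per-story spread a team sees late in a project is the same as early on. All parameters are invented.

```python
# Sketch: with independent per-story risk, finishing stories teaches you
# nothing about the next story's multiplier, so the spread never shrinks.
import numpy as np

rng = np.random.default_rng(1)
n_projects, n_stories = 200, 60
ratios = rng.lognormal(0.0, 0.6, (n_projects, n_stories))  # actual/estimate

early = ratios[:, :20].ravel()    # first third of every project
late = ratios[:, -20:].ravel()    # last third of every project

for name, r in (("early", early), ("late ", late)):
    p10, p90 = np.percentile(r, [10, 90])
    print(f"{name}: P90/P10 of story ratios = {p90 / p10:.2f}")
```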
So, for everybody who said at the beginning that it's a major portion of your job to either estimate or deliver on your own or someone else's estimates: are you all worried now? Are you screwed? The question is: now what can you do? I think there's lots of data, from this study and the previous ones I mentioned, showing that there is a large degree of uncertainty in our estimation. So what can we do about it?

One model for looking at this is the Cynefin framework that Dave Snowden popularized. He has this idea that we can have simple projects, and if you look at their actuals versus estimates, you'll see a very narrow, thin distribution. This is what people want to see in software projects: a high degree of certainty, no real risk or distribution or variation. And sometimes these projects do come about, things that are very simple and well defined. More often than not, though, we have something that is at least complicated, where you get some kind of uncertainty, some kind of distribution, sometimes narrower, sometimes wider. Then you get into the complex domain, where you have a very long tail: a lot of risk, lots of things that can go horribly, horribly wrong. That's where you need an empirical mechanism to iterate with, which, honestly, Scrum provides. And chaotic is just all over the map. I think the major thing is understanding which category you're in. You can't apply methodologies that require a high degree of certainty if you are in a complex domain. Waterfall requires a high degree of certainty, because you're creating all these dependencies, a sequence of events that has to be followed in order; you basically need to be working on a simple project for it to work effectively. Waterfall requires a lot of predictability. So step one: know which domain your project is in.

The other technique, which Todd Little often talks about and which I think is an effective one: once we know how bad estimates can be, with ranges up to four times, well, so much of software is managing expectations. Why would we say we can get this scope of work done, or create the expectation that this amount of scope can get done, when we know there's a good chance, because of uncertainty, that we'll only get about 50% of it, or 25% of it? Instead of using MoSCoW, the must-should-could scheme you're probably familiar with, ask: what happens if you don't deliver on a "should"? It's not a must, it's just a should, but if you don't deliver on a should, you feel bad. It's bad; we should have had this in here, and you didn't deliver it.

So his approach is to use ABCs instead. A is what we have created an expectation for, made commitments for: it must be completed to ship the product, and the product owner knows, is fully aware, that they will slip the shipment date to get all the A's in; it's so important that they've made that conscious decision. B is "wished": we wish to have it in, which sounds much nicer than "should", but it's not an expectation. And C is not targeted. Only A features are committed, and if more than 50% of the planned effort is allocated to A items, the project is at risk. So don't commit more than 50% of your planned scope; don't create the expectation of getting more than that. If this is our target delivery date, then with about 50% of that time we create the expectation that we're fairly confident these A's will be in; the rest of the time goes to the things we wish to have in, and then maybe some lower-priority things. And what will often happen, because of uncertainty, is that we deliver just the A's: that one-times estimate turns into a four-times actual, and you get nothing but A's delivered. And that's okay, because we managed expectations throughout, so it's a perfectly normal thing to happen. We still got the most important things done, and we've hopefully improved trust with our customers by not creating false expectations and a false sense of precision.
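A minimal sketch of that 50% check; the backlog items, grades, and capacity below are all invented:

```python
# Sketch of the ABC planning rule: A = committed, B = wished, C = not
# targeted. Flag the plan if A items exceed 50% of planned capacity.
backlog = [("login", "A", 8), ("export", "A", 13), ("search", "A", 5),
           ("dashboards", "B", 8), ("theming", "B", 5), ("emoji", "C", 3)]
planned_capacity = 60  # story points we think fit before the release date

a_points = sum(points for _, grade, points in backlog if grade == "A")
share = a_points / planned_capacity
print(f"A items: {a_points} points = {share:.0%} of planned capacity")
if share > 0.5:
    print("Warning: more than 50% is committed; the release is at risk.")
else:
    print("OK: the committed scope leaves room for the uncertainty tail.")
```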
So, in conclusion, and going back to some more Shakespeare, the original question was: to estimate or not to estimate? The answer I've ended up with is that neither one is simply right and the other wrong. You can use story points; you can use throughput. They both have their benefits, they both have some costs, and in many ways they're very similar. But know what it is you're doing and why. So instead of these religious debates that go on about estimates versus no estimates, let's have a deep understanding of what decisions we need to make. Let's be honest with ourselves about how accurate estimates really are, and accept that there is a large degree of uncertainty and risk in software development, because without risk we really wouldn't be creating any value. And software development teams can't be alone in accepting all that risk. We talk about partnerships and working together with the business: this has to be shared risk in product development and discovery.

I'm missing my last slide, apparently; I had some contact information on it. And those aren't even my kids, I'm borrowing this laptop. (No, I've got my own kids; I don't need any more.) So here's the information that is available: the paper, which has much more detail than I covered today, is on Todd Little's website, along with this presentation; that's toddlittleweb.com, and he's easy to find. Through that paper you can get the raw data, do your own analysis, and draw your own conclusions. We're also easy to find on Twitter and LinkedIn and all those kinds of things if you have any further questions. All right, thank you for your time; I think we have a few moments left for one or two quick questions.

(Question: if we make an estimate and find out it's wrong because of assumptions made at the beginning, what do we do about it? Do we just say we don't bother estimating at all?) What I personally like about this approach is that you're communicating some degree of confidence, some degree of risk. I like to encourage teams to do that when they have to have these conversations: not just "here's the one number", but "we're fairly confident we can have something by this date, and less confident by that date". Having that conversation around risk and uncertainty, and getting our partners and our customers thinking along those lines, is something I've found helpful. Keep in mind that a lot of the work I do is internal applications within our company. The other thing is that we are going to be wrong, a lot, and that's okay; the point is to be wrong on small scales, find out about it, and iterate. So if something takes an iteration plus a day versus an iteration minus two days, that will happen from time to time, and I don't think it's the end of the world. Are we able to get the feedback? That's the main thing: we're talking about empirical process control and getting those feedback cycles.

One more point, and it's a terrific point: the Eisenhower quote, plans are useless but planning is essential. Even within our own teams and our company, when you ask people why they estimate, they come up with the same point: estimation is a mechanism for helping us explore the problem space. We get information simply from going through the activity, and you're absolutely right, that isn't represented in this data at all.

One more question. I won't say anything conclusive, because I don't remember the exact numbers, but I'll say somewhat tentatively, based on what I remember of looking at the data: the bigger things get, the less certain they are.
There does seem to be a benefit to breaking things down smaller: that four-times uncertainty grows to something larger when you get to bigger things. And I think it goes back to the fact that the activity of breaking things down helps us explore the problem space better. But there's a cost trade-off to that: how much time do you spend doing it? On small projects you can waste more time estimating than it would take to just start the work. All right, thank you, everyone; I appreciate it.