Welcome to Friday of Agile India, it's a pleasure to be here. I love coming to this conference, I've been coming for years now. The very first talk I ever gave was at Agile India in 2005 or something like that, which was also run by Naresh. Huge thanks to Naresh for continuing to do this for so many years. It's a labour of love and you've done an amazing job all this time. Dave Farley is also here, the co-author of Continuous Delivery. I'll be going to his talk on acceptance testing a bit later as well to see what I can pick up. So, who works in an organisation of more than a thousand people? OK, most of you. More than 10,000 people. More than 100,000 people. More than a million people. No one working for Indian Railways? OK. So the picture I want to paint is of how software delivery typically works in these large organisations. The way the work happens in most of them looks something like this: somebody from the business comes back, after a particularly fine round of golf, with an idea for a new product they want to create. And then months pass while we go through the budgeting cycle, we do very detailed analysis, and a project plan lands on some poor project manager's desk as some massive pile of paper. And then the team goes off and spends many months, sometimes years, building that thing before it gets tossed over the wall into operations, we wire it up, we make sure it works, bing. And then the team that built it goes off and does something completely different instead and the thing they built lasts forever. So this is... Anyone have a friend who works in an organisation that works like that? OK, lots of you, yes. So the answer to this, obviously, is that we're going to go agile. So everyone goes on the two-day Scrum training course, or the newer scaled courses, which are the same length but cost more money. And they come back and now we're taking orders from management standing up instead of sitting down. And that huge backlog of work we can't ever complete is now prioritised and estimated. And now we're agile. Yay! How does the rest of the organisation react to the exciting news of the agile transformation? Are they filled with joy and happiness at this new state of affairs? Not normally the case in my experience. The business isn't that happy about it because... I mean, the whole point of agile is better collaboration between the business and technology people, right? But that means less time playing golf and more time talking to the engineers. So that's awkward because often the engineering floors don't have the nice artwork and the light-filled windows. And the engineers don't like it very much because the whole reason people become engineers is so you don't have to talk to other people. So awkward all round.
And then IT operations doesn't like it that much either because instead of some nasty pile of crap coming over the wall once a year, now we have software coming over the wall all the time. And IT operations' response to that is, broadly speaking, please stop the madness. We don't want this. The engineers are like, what are you talking about? We're practising TDD. We're using the SOLID principles of good OO design. And IT operations' response to that is, broadly speaking, that's lovely, shame it doesn't actually work. And it's important to bear in mind that IT operations has a natural and logical response to this problem. And that response is to create a barrier which is known as the change management process. And the job of the change management process, of course, is to make sure that nothing ever changes. So that's where we find ourselves in many organisations that have adopted agile: the developers are working in sprints, but the actual lead time to go from an idea to a measurable customer outcome still takes months or years. It hasn't actually had a substantial impact on the overall lead time to get software delivered. So the question you've got to ask yourself is, can we do better? And the answer is, yes, of course we can do better than this. I should probably change the next slide because I've now got a job with Google, not Amazon, but you know, there we are. Amazon, and this is from several years ago now. Who's seen this slide before? Okay, well, a decent number of you. So this is from about 2011, I think, and a lot of people have seen this slide since then, but not everyone. Amazon is deploying to production, on average, every 11.6 seconds, up to 1,079 deployments an hour. On average, 10,000 hosts receive each deployment, up to 30,000 hosts receiving those deployments. It's important to bear in mind a few things. Number one, Amazon is a publicly traded company. That means that Amazon has to follow Sarbanes-Oxley, so it's highly regulated. Amazon also processes the occasional credit card transaction. They have to follow PCI DSS. So highly regulated organisations are still able to achieve this. If you go and talk to Dave Farley, he will tell you about LMAX, which was, at the time, the world's highest performance financial exchange. Again, highly regulated, and they were able to make many changes per day in that environment as well. So it's entirely possible to do this, even in large, highly regulated organisations. It's hard. None of this stuff is easy. And if you're not doing this already, it will take you years to achieve it. Amazon spent four years re-architecting their entire system to a service-oriented architecture between about 2001 and 2005 in order to be able to achieve this. So it's hard and expensive, but it is possible. So the co-founder of DevOps Research, Dr Nicole Forsgren, who is the principal investigator on the State of DevOps report that we work on, she and I have been researching how you build this capability, how you build high-performing technology teams. And I'm going to tell you a bit about that research. So our research is scientific, and when you're doing scientific research and you're talking about predictions, that something predicts something else, you have to meet one of three criteria. Either you have to do randomised experimental design, which is very hard to do in software. This is the standard that we use for testing drugs: we do randomised controlled trials.
You can't really do that in software because there are a lot of variables that are very difficult to control. Most organisations don't like having control-group teams. It's also somewhat unethical. So we don't do that. It's very hard to do randomised controlled trials in software. Or you can do longitudinal studies, where you study teams over time. Again, that's very expensive. It takes a lot of time. We don't do that. The other thing you can do, which is what we do, is called theory-based design, where you have a theory that's been set out in the literature already. That theory makes predictions, and then you can test those predictions. And if, when you test the predictions, they're validated, you can say, well, that's evidence that the theory is true. So that's what we do. We take theories that have been shown in the literature. We design surveys which ask questions that test predictions of those theories and find out whether or not the data tells us that they're true. Where we have worked in this way, we talk about predictions, that this thing predicts high performance. Otherwise, we're only talking about correlations. With correlations, you have to be a bit careful. One of my favourite sites, if you Google spurious correlations, you'll come to Tyler Vigen's spurious correlations website, where he shows some quite interesting correlations. One of my personal favourites: the fact that per capita cheese consumption correlates with the number of people who die by becoming tangled in their bedsheets. There are a number of other fascinating correlations that you can see on that website. So all this to say, you've got to be a bit careful with correlation. That's why we always look for prediction, where we can talk about something predicting something else. So with that said, on to the data. We've been studying firms of all different sizes for five years, we're in our sixth year now, and we've studied thousands of organisations worldwide of all different kinds, including very large organisations and organisations in different domains, including financial services, telecoms, government, healthcare, highly regulated domains. And these results that I'm going to tell you about hold across all those domains. So I often come to organisations who are like, well, that sounds great, but it won't work here because we're this, we're big, or we're regulated, or whatever. And I'm here to tell you that makes it hard, but it doesn't make it impossible. It's absolutely possible to do all the things I'm talking about, whatever the size of your organisation, whatever domain you're working in; you can do this stuff. So the first result we found is that software delivery matters. We've been told for years that software delivery doesn't matter, that it's not a competitive advantage. Our data shows that's not true. What we find is that firms with high-performing IT capability were twice as likely to exceed commercial goals such as profitability, market share and productivity. We also find that it's important for non-commercial goals as well, including things like operating efficiency, customer satisfaction, and your ability to achieve organisational and mission goals. So these are outcomes that matter to all kinds of companies, both for-profit and non-profit. The next question that will be burning in your minds, I'm sure, is how do we measure software delivery performance? And there are lots of terrible ways to measure software delivery performance. Anyone still measuring lines of code? Anyone?
Yeah, there's some people out there measuring lines of code. Still today. That's obviously a terrible idea. If I could write something in a thousand lines or in two lines of bash, which would you prefer? Two lines of bash, of course you would. And that's the obvious flaw in measuring lines of code. Ideally, the best thing you can possibly do in your organisation is delete lines of code wherever possible. That's my favourite thing to do, when I can make a change that deletes huge swaths of the code base by simplifying a process, for example. That's great and very satisfying, but it's always a bit awkward when you put minus 5,000 lines into your productivity measure and then your manager comes to talk to you and you're like, but this is way better, what are you talking about? Some organisations even measure lines of code as an asset on the company balance sheet. So you could actually get into trouble for deleting lines of code by destroying company assets. That actually happened in real life, by the way. Michael Nygard tells a story of a company where that happened. So companies do lots of horrifying things, and that one follows just from accounting. A lot of organisations capitalise software development and amortise the cost of it over five years, so it does follow from accounting regulations that lines of code are, arguably, a company asset. So there are lots of terrible ways to measure software delivery performance. Who uses velocity on their team as a measure of productivity? Okay, don't do that. That's a terrible idea. Velocity is not designed to measure productivity. It's designed as a capacity planning measure, to work out the capacity of your team so that you can work out how much work you should take on. It's not a way to measure productivity. What happens when you measure productivity that way is you start comparing teams based on their velocity, and then: oh, you only did four points this week, what happens next week? Well, we're going to pad the estimates. Oh, that's definitely a five-point story. Actually, it looks more like a six-point story. And then when you complete it, your velocity looks really high and you're like, well, look how much more productive we are, because we padded our estimates. So it's a terrible idea to use velocity as a productivity metric. People just game it. It's really easy to game. So how do we design a metric that is actually effective? You need something that is a system-level outcome, which means it can only be achieved by people working together. And you need something which measures different aspects, so that people don't make bad trade-offs when they're optimising for it. So the measure that we came up with, which it turns out is both valid and reliable over the five years that we've measured it, comprises these four measures. There are two throughput measures: lead time for changes, how long it takes for changes to go from version control into the hands of users, and deploy frequency, how frequently we deploy. And then two stability measures: time to restore service, when you have some kind of incident in production, how long does it take you to restore service through some kind of remediation; and finally change fail rate, which is a measure of the quality of the release process, when you make a change, what percentage of the time do you have to remediate? So two throughput measures, two stability measures.
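To make those four measures concrete, here is a minimal sketch of how you might compute them from deployment and incident records. This is not from the survey instrument; the record fields, the sample data and the thirty-day window are all assumptions made up for illustration.

```python
# Minimal sketch: computing the four key measures from hypothetical
# deployment and incident records. Field names are illustrative assumptions.
from datetime import datetime
from statistics import median

deployments = [
    # commit_time: when the change landed in version control
    # deploy_time: when it reached production
    # failed: did the change require remediation (rollback, hotfix, patch)?
    {"commit_time": datetime(2019, 3, 1, 9, 0), "deploy_time": datetime(2019, 3, 1, 11, 0), "failed": False},
    {"commit_time": datetime(2019, 3, 2, 10, 0), "deploy_time": datetime(2019, 3, 2, 10, 45), "failed": True},
    {"commit_time": datetime(2019, 3, 3, 14, 0), "deploy_time": datetime(2019, 3, 3, 14, 30), "failed": False},
]

incidents = [
    # detected/restored: when the incident started and when service came back
    {"detected": datetime(2019, 3, 2, 11, 0), "restored": datetime(2019, 3, 2, 11, 40)},
]

period_days = 30  # assumed observation window

# Throughput measure 1: lead time for changes (version control -> production)
lead_time_hours = median(
    (d["deploy_time"] - d["commit_time"]).total_seconds() / 3600 for d in deployments
)

# Throughput measure 2: deploy frequency
deploys_per_day = len(deployments) / period_days

# Stability measure 1: time to restore service
restore_hours = median(
    (i["restored"] - i["detected"]).total_seconds() / 3600 for i in incidents
)

# Stability measure 2: change fail rate
change_fail_rate = sum(d["failed"] for d in deployments) / len(deployments)

print(f"Median lead time: {lead_time_hours:.1f} hours")
print(f"Deploy frequency: {deploys_per_day:.2f} deploys/day")
print(f"Median time to restore: {restore_hours:.1f} hours")
print(f"Change fail rate: {change_fail_rate:.0%}")
```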
And I think if you take only one thing away from this morning, it should be this: high performers are not trading off throughput and stability. They're not going faster and breaking things. Instead, what they're doing is going faster and producing more stable and reliable systems. This has been reflected in the benchmarks every year, and last year was no exception. We found that our highest performing group, the elite performers, were deploying on demand. They could restore service in less than an hour. Their change fail rate was really low. And then our low performing group were releasing between once a week and once a month. It took them between a month and six months to get changes out. Time to restore service was between a week and a month as well, because it's just really hard to get changes out in that kind of environment, and their change fail rate was pretty high as well. So high performers are not trading off throughput against stability. They're doing better at all of those things. And again, this was true for large organisations, highly regulated organisations. We found elite performers in large, highly regulated organisations. We found low performers in startups. I've certainly got a friend who's worked at a startup where they were definitely not in the elite performing group. So, you know, I think it's a mistake to think that you can't do this because we're a big organisation. The data says that's not the case. If you look at the ratios of those different groups, we found that 48% of the people who responded to the State of DevOps survey last year fell into the high performing group. So that was pretty big. The industry is getting better at this, which is terrifying if you're in the low performing group, because what's happening is the industry as a whole is getting better and better over time. That's consistently been the case. So what that means is the low performers are typically the people who aren't getting better. They're just going to get stuck there while the rest of the industry accelerates away from them. That, in my opinion, is the defining characteristic of high performing teams: they're always trying to get better. They're never satisfied with how things are. Process improvement is part of everyone's daily work. We make capacity available for process improvement work. We dedicate resources to it. We're always trying to get better. You don't hear "that's just the way we do things here" or "we've always done it like that". People are always trying to get better and the organisation dedicates resources to helping teams get better. We measured availability for the first time last year. That's a measure of the extent to which you're able to keep your service up and make promises about how much of the time your service will be up. And again, what we found is that elite performers were 3.5 times more likely to be doing well in terms of availability. So high performers, again, were not making trade-offs. They're able to achieve high rates of throughput, high rates of stability, and high rates of availability. They're doing well at all of those things. Obviously, if you're a low performer, you want to work out how to become a high performer. So the next piece was working out how to get better. You probably won't be able to read this at the back, but you can get it on the DevOps Research website. It's in the back of our most recent book, Accelerate, which I think is on sale out there somewhere if you're interested.
But this diagram basically shows the predictions that we've discovered in the course of our research. And broadly speaking, what you can see is, on the far right here, organisational performance. That's the ability to achieve high rates of productivity, market share, and so forth. And then non-commercial performance, those are the things I was talking about: high customer satisfaction, the ability to achieve mission goals, and so forth. So software delivery performance, the ability to deliver with speed and stability, predicts organisational performance and non-commercial performance. So getting better at software delivery predicts higher levels of both. And then, broadly speaking, there are three groups of things you can do to improve performance. Number one is a set of technical practices, which broadly speaking are known as continuous delivery today. So that's a whole bunch of things, including automation, using version control, doing trunk-based development, having a loosely coupled architecture, shifting left on security, a whole bunch of stuff that I'll talk about in a little bit more depth. Secondly, a bunch of management practices, which we normally talk about as lean management, so that's things like limiting work in process, visual management, and so forth. And then finally, a bunch of product development practices, the lean startup kind of practices, which means working in small batches, allowing teams to experiment with ideas for new features, gathering and actually implementing customer feedback, and making the flow of work visible. So broadly speaking, the way you get better is by implementing the practices. In order to do that, you need effective leadership. So on the far left of this diagram are effective leadership practices that we call transformational leadership. That means having a clear vision at the leadership level of what you want to achieve, making sure that people are intellectually stimulated in their jobs, communicating effectively in a way that inspires people, making sure that leadership supports people as they overcome obstacles, and giving people personal recognition. So leadership is what enables you to actually implement these capabilities, and these capabilities make you better at software delivery performance. So moving on to the technical practices. Last year, 2018, we actually measured a set of new practices, including monitoring and observability, building security in, making database changes in an automated way, and continuous testing, where developers and testers work together throughout the software delivery cycle rather than having testing as a separate phase after dev complete with a different team. All those things together we call continuous delivery. We find that it improves performance. We also find that it reduces team burnout. So it actually makes teams happier and more productive, and it reduces deployment pain as well. Actually, I shouldn't have said more productive. Productivity is something we're looking at this year, in fact, so I'll have the results for that later on this year. But it definitely makes teams less likely to be burnt out. It also improves culture, and I'll talk about culture at the end of this talk. I want to talk about a couple of these things. Firstly, I want to talk about architecture. We found that architecture was very important.
In the 2018 responses, it was a very strong predictor of the ability to achieve continuous delivery. People think a lot about Kubernetes and microservices and all these buzzwords. What we found is that it's not the technologies you're using that are important, it's the outcomes those technologies enable. What was important was being able to answer yes to these questions. Can my team make large-scale changes to the design of its system without the permission of someone outside the team, or without depending on other teams? Can my team complete its work without needing fine-grained communication and coordination with people outside the team? Can my team deploy and release its product or service on demand, independently of other services the product or service depends upon? Can my team do most of its testing on demand, without requiring an integrated test environment? In other words, can I run acceptance tests on my local workstation and end up with a good level of confidence that my software is releasable, without having to wait for some very complex integration testing environment in order to be able to do acceptance testing? And finally, what for me is the most important measure of the outcome we're trying to achieve with continuous delivery: can my team perform deployments during normal business hours with negligible downtime? So a quick show of hands. Who is still doing releases where the releases happen outside of normal business hours, in other words, evenings and weekends? Okay, that's about half of you. I take that as a sign of personal failure. Dave and I wrote this book in 2010 basically because we were sick of doing deployments at the weekends and we didn't want to do that anymore. So obviously we need to go back to the drawing board there; that hasn't quite worked out. But that's really what it's all about. You should be able to perform deployments at the push of a button during normal business hours. And if you can't do that, that's a sign that something is wrong. Something's wrong with your architecture. Something's wrong with your organisational culture. You should treat that as something that makes you sad and that you really need to fix. It shouldn't be treated as inevitable. You can come to me and Dave with pretty much any system that you're working on and we will be able to give you an example of someone in the same situation who found a way to fix that problem. These are problems that can be solved and you should absolutely work on them. So the important thing about this is you can do this with mainframes. Equally, you could build a microservices architecture on Kubernetes and not achieve these outcomes, and then you've wasted all your money. It's not the tools that are important, it's the outcomes that the tools enable. And tools are great. Kubernetes is awesome. I love messing around with serverless stuff. All these tools are great, but it doesn't matter unless you actually achieve the outcomes; that's what's important. We also wanted to look at cloud adoption. Who here is running services in production on public cloud? Not very many of you. Okay, all right, keep your hands up. If you've got your hands up, keep your hands up. All right, so quick test. If developers are not able to self-service the resources they need without having to raise a ticket on ServiceNow or send an email to someone, put your hands down.
So if you have to raise a ticket or email someone to get a virtual machine or some kind of networking configuration change, put your hands down; otherwise you can keep them up. Okay, how many people have we got left? One, two, three, four, five, six, seven, eight, nine, 10. All right, so that's got rid of most of you. Broad network access: you can access your cloud services across the public internet and intranet, from all different kinds of devices. If that's true, keep your hands up, otherwise put them down. All right, resource pooling, which means that you have multiple VMs to one physical server, so you have the illusion of infinite resources. If that's true and you never have to worry about being able to get more VMs if you need them, keep your hands up, okay? Rapid elasticity: you can scale up or scale down your service on demand, ideally without having to configure it, or if you do need to configure it, just by clicking on something. If that's true, keep your hands up. All right, and finally measured service: you only pay for what you use. All right, so how many hands do we still have up? One, two, three, four, five. All right, thanks very much. So this, by the way, is from NIST, the National Institute of Standards and Technology, the US agency that defines standards for all kinds of technology services. This is their definition of cloud. What we find is that only 22% of teams that say they're using cloud are actually using cloud, and it's very common for organisations to adopt the cloud but then put it behind ServiceNow. Then you don't have a cloud. What you have is some very expensive virtualization that you treat the same way that you would have treated physical infrastructure. You've got to actually change the way you do things, otherwise you're not making any effective use of this capability. What we find is that teams that could say yes to all five of these definitional characteristics of cloud were 23 times more likely to be in the elite performing group. So cloud is actually a huge force multiplier. It's a game changer, but only if you use it right. You've got to change your processes and your practices in order to make use of cloud. You can't just use the same processes you've always had, but with a cloud behind them instead of a data centre; it doesn't make a difference. So, yeah, only 22% of teams that say they're using cloud actually meet these definitional characteristics, but those that do are much more likely to be high performers. And you can do this in highly regulated environments: I have a talk at 2 o'clock this afternoon about implementing cloud in the US federal government, so come along to that, and there's a white paper on our website about how to do cloud in highly regulated environments. We also looked at monitoring and observability. What we found is that if you have comprehensive monitoring and observability in place, you're 1.3 times more likely to be an elite performer, and having an effective monitoring and observability solution in place positively contributes to your performance. These are the definitions that we use of monitoring and observability. Broadly speaking, monitoring is about having predefined checks, or monitors, in place. Observability is about having tooling in place that allows you to go and investigate your production systems in ways that you haven't predetermined: being able to go and gather data, correlate that data and analyse it in ways that you didn't originally think of.
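As a rough illustration of that distinction, and purely a sketch with made-up event data, field names and thresholds, a monitor is a check you defined up front, while observability means you can ask a question of the same telemetry that you never anticipated:

```python
# Illustrative sketch only: the event fields and thresholds are invented.
# Monitoring = predefined checks; observability = ad-hoc questions over rich telemetry.
from collections import Counter

# Structured events emitted by a hypothetical service
events = [
    {"route": "/checkout", "status": 500, "latency_ms": 900, "region": "ap-south-1"},
    {"route": "/checkout", "status": 200, "latency_ms": 120, "region": "ap-south-1"},
    {"route": "/search",   "status": 200, "latency_ms": 80,  "region": "us-east-1"},
    {"route": "/checkout", "status": 500, "latency_ms": 950, "region": "ap-south-1"},
]

# Monitoring: a check we defined in advance ("alert if error rate > 10%")
error_rate = sum(e["status"] >= 500 for e in events) / len(events)
if error_rate > 0.10:
    print(f"ALERT: error rate {error_rate:.0%} exceeds threshold")

# Observability: an ad-hoc question we didn't plan for, answered by slicing
# the same telemetry in a new way ("which route/region pairs are failing?")
failures_by_slice = Counter(
    (e["route"], e["region"]) for e in events if e["status"] >= 500
)
for (route, region), count in failures_by_slice.most_common():
    print(f"{count} failures on {route} in {region}")
```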
And what we found, which was interesting, is that the people who took the survey perceived monitoring and observability as being the same thing. Here's an example of the kind of weird things that surveys turn up. This was back in 2014, when we were looking at effective test practices. We asked people how strongly they agreed or disagreed with a bunch of statements about practices. We had some hypotheses about which ones would turn out to be true and which ones wouldn't. See if you can guess which ones ended up being true and which ones didn't. It turns out that the ones in red, in statistical terms, didn't load, which means that they didn't predict anything. And for the first two, you can kind of see why that would be the case. Having QA primarily creating and maintaining acceptance tests isn't very effective, because one of the points of doing TDD, writing your tests first and having developers write the tests, is that testing is actually an important input into the design of your software. The point of TDD is to create code that's testable. If you take code and try to retrofit tests onto it after it's written, what you'll find is that those tests are really expensive to maintain. And that's why having testers create and maintain acceptance tests tends to create expensive, brittle acceptance tests: there isn't that feedback loop that encourages developers to write testable code. Having tests primarily created and maintained by outsourced teams is the same thing, but even worse, because the feedback loop is even slower. We thought that developers creating on-demand test environments would be a good thing. Turns out it doesn't load. So that was a surprise to us. We don't know why that is. When you do surveys and gather survey data, it never tells you why something happens. It only tells you that it happens, and then you have to go and do qualitative research to find out why. So we don't know why that's the case, but it doesn't load. But we always find surprises like this every year. There's always something surprising that we find, and that's a good thing. If you're never surprised, you're probably not doing science. Last year we looked at continuous testing. We found that, in addition to the previous practices, the ones that weren't in red, these things predicted high performance: continuing to look at your tests and change them over time, to curate them basically, to keep their cost and complexity under control; having developers and testers work together throughout the delivery lifecycle, including on manual activities like exploratory testing and usability testing; and doing TDD, writing tests first before you write the code that makes the test pass. Who's doing TDD in this room? Writing the test first and then writing the code. OK, so that's about a quarter of you, which has been consistently the case when I've asked this question for the last 10 years; it's about 25%. And then finally, being able to get fast feedback from your tests in less than 10 minutes. That's really important as well. So those are the testing practices.
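Just to make the TDD rhythm concrete, here is a tiny, hypothetical example: the function and the business rule are invented purely for illustration. The point is the order of events, a failing test first, then just enough code to make it pass.

```python
# A tiny, invented illustration of the TDD rhythm (not from the talk's data).
import unittest

# Step 1: write the test first. Run it and watch it fail, because
# shipping_cost doesn't exist (or doesn't behave correctly) yet.
class ShippingCostTest(unittest.TestCase):
    def test_orders_over_threshold_ship_free(self):
        self.assertEqual(shipping_cost(500), 0)

    def test_small_orders_pay_flat_fee(self):
        self.assertEqual(shipping_cost(100), 50)

# Step 2: write just enough code to make the tests pass, then refactor
# with the tests as a safety net.
def shipping_cost(order_total):
    if order_total >= 500:
        return 0
    return 50

if __name__ == "__main__":
    unittest.main()  # fast feedback: a suite like this runs in seconds, well under 10 minutes
```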
I want to switch gears a bit and talk about management practices. We find that these three things on the left matter: having effective work in process limits that don't just limit work in process, but also require you to improve your process. If you have WIP limits, but you don't fix any of the constraints that the WIP limits reveal, then there's no point in having them. You've actually got to do process improvement off the back of them. It turned out that on its own that doesn't predict software delivery performance. So that was weird. Everyone knows that WIP limits are really good, so why don't they predict software delivery performance? What we found is that it only works if you do these other things as well. You've got to do these three things on the left in combination, otherwise it doesn't work. So you've got to have the WIP limits that drive process improvement. You also have to have the visual displays, like the CI build on a monitor, and other measures of productivity and quality, and those have got to be visible throughout the organisation. And you have to use feedback from your production monitoring tools to make business decisions. Those things all together, you have to do them together or they don't work, and together they predict high levels of performance. This is the lean product management stuff. There are a lot of things here that are kind of obvious, but it's nice to know they're true. That thing on the bottom left is the thing that I still don't see enough of. A lot of organisations say they're agile, but developers can't change the specifications. So you're getting your stories in the nice "as a... I want... so that..." format, but if developers start asking awkward questions about why the specifications are the way they are, they're just told to shut up and carry on developing. Who works in an organisation where the developers can't effectively change the specifications or the stories? Or maybe you've got a friend who works in an organisation like that. I mean, that's quite a lot of you. So that's not agile. If teams aren't able to change the requirements or try out new ideas, then it's not agile. It's still a big problem. So those things together all predict software delivery performance. And there's also a virtuous cycle: improving software delivery performance is what enables you to apply some of these practices. You see these arrows here are going in both directions. That's because there's a virtuous cycle here, where higher software delivery performance enables you to take an experimental approach to product development, which in turn enables better software delivery performance. And lean product management directly impacts those organisational outcomes that we're talking about on the right there. Finally, I want to talk about culture. What we find is that culture impacts IT performance and organisational performance. So culture matters. How do we measure culture? We use a model that was created by a sociologist called Ron Westrum, who was investigating safety outcomes in healthcare and aviation. These are domains where, when things go wrong, people die: high tempo, high consequence domains. And he came up with this model to measure culture. Based on where you are on these six different measures, you're either working in a pathological or power-oriented organisation, a bureaucratic or rule-oriented organisation, or a generative or performance-oriented organisation, or, as I like to call it, a mission-oriented organisation. So there are six different axes here. Firstly, the extent to which people cooperate.
Secondly, what do we do with people who bring us bad news? Do we shoot people who bring us bad news? Do we ignore people who bring us bad news? Or do we actually train people to bring us bad news, so that we can get it as soon as possible and act on it before it becomes catastrophic? So who's worked on a project which is green for the first 80% of the project and then suddenly, about two weeks before release, it goes red? Anyone had that happen to them? Right. So that's a classic example of a pathological organisation, where no one wants to bring management bad news, because they know what will happen: they'll either get shot, or they'll get ignored and told, well, we have to meet the date, so just work harder, because we all know how effective that is. Good management actually trains people to bring them bad news: thank you for bringing me that bad news, so that I can act on it rapidly before it becomes catastrophic. How do we deal with responsibilities? Do we avoid them? Are they defined narrowly so that we know who to fire when things go wrong? Or do we share risks because we know that we only succeed or fail as teams? Is bridging between different domains and parts of the organisation encouraged or discouraged? And then two things that are really two sides of the same coin. How do we deal with failure? Does it lead to firing people? Well, Dave was doing the deployment and the deployment failed and caused a catastrophic outage, so let's fire Dave. Well, guess what? Someone else is going to have that job now. And if the same things happen to that person that happened to Dave, they're going to have exactly the same outcome. So do we change anything by firing Dave? No, in fact, we probably made things worse, because now we don't have Dave to help us learn from the failure and improve the system. In complex systems, failure is inevitable. So you have to treat human error as the start of the investigation. Why did that human make the decisions they made? How could we give that person better information? How could we give them better systems that would catch failures before they became catastrophic problems? How do we improve the system? And then the flip side of that is novelty. How do we deal with novelty? In an organisation where failure is punished, no one will try anything new. Novelty will always be crushed. You need to build organisations where failure is a source of learning, and that's critical to enabling people to actually do new things. A great story about that comes from Etsy. So Etsy, back in the day, this is John Allspaw on the left and Ryn Daniels on the right, would give out a three-armed sweater at their annual engineering conference to the person who caused the biggest outage. If you went to Etsy and hit a 404, the 404 page had a picture of a sweater with three arms, and so they gave a three-armed sweater to the person who caused the biggest outage. One year it was Ryn Daniels, and Ryn has written a blog post outlining the outage and what happened. They talk about how, when the outage happened, they went on to the company chat, I don't think it was Slack, it was a chat room, and started chatting to people. And they say the immediate response from everyone around was to ask, what help do you need?
And they swarmed on the problem and fixed it, and then they did a post mortem where they worked out how to improve the system, and Ryn didn't get fired, Ryn got an award at the annual developer conference. We find this in all high performing organisations. Google actually performs disaster recovery testing exercises annually, where they build these whole scenarios that are executed over 48 hours, where they do things like disconnecting Mountain View from the rest of the internet or turning off data centres. These exercises have been planned for a long time by Kripa Krishnan. She says, for DiRT-style events to be successful, an organisation first needs to accept system and process failures as a means of learning. We designed tests that require engineers from several groups who might not normally work together to interact with each other. That way, should a real large-scale disaster ever strike, these people will already have strong working relationships. So in order to improve culture, what you do is, again, implement the practices. The diagram I showed at the beginning shows that the inputs to culture are these practices on the left. So the way to change culture is by implementing the practices, and the way to implement the practices is to have leadership that enables people to experiment and try new things. That's all I've got time for. I want to end by encouraging you to please take this year's State of DevOps survey. You can go to bit.ly-2019-sodr-survey or just take a picture of this QR code and it will take you there. Please take the survey. Please tell your friends to take the survey. That's what enables us to gather the data that enables me to come and tell you about these things, and tell you from a scientific standpoint what works and what doesn't. It takes about 25 minutes. I really appreciate your time. Thanks very much. Do we have time for questions? No, we're out of time, sorry, but I'll be around all day. Thank you very much. All right. Thanks, Jez.