 This talk starts with one of the worst days of my professional career. About five years ago, I was sitting at my desk and not feeling particularly well. I was a little bit sick and I'd actually decided it was time to surrender, admit defeat, go home and get some much needed rest. When all of a sudden I saw a bunch of Campfire messages firing off, and Google Chat windows started popping up. A product manager came rushing over and everybody was asking the same question: what's wrong with the site? I'm logged in as Aaron. Aaron was the co-founder of the company that I worked at, and we all definitely should not have been logged in as him. We looked and saw that we'd just done a deploy, so we quickly rolled back, and six minutes later everything was working the way it should again. But we were a pretty high volume site, so in the 10 or 12 minutes that that was live, there were hundreds of purchases made and incorrectly applied to Aaron's account. Dozens of people added credit cards to Aaron's account. It was not good. So I walked downstairs. I had a quick conversation with our chief legal counsel. I had a quick conversation with our security team. I threw up in a trash can, and then I went back upstairs and got to work on figuring out how to prevent this. So I think it's helpful to start with what actually happened. The feature that we were trying to roll out was integration with Passbook. This was right before the launch of iOS 6, and our Apple rep had strongly implied that if we were ready with Passbook support on day one, they would feature us on the homepage of the App Store. From prior experience we knew that this was worth tens of thousands of installs that we wouldn't have to pay for, and lots of new users. This was a big opportunity, but it also was a short turnaround, and it had come about when people were already working on other things. 
So implementing this feature fell to a junior iOS developer not six months removed from college. This was actually his first Rails feature. And it's helpful to have some understanding of what authentication looked like. This is not exactly how it worked, but a simplified version: in the application controller there was a method that would check your auth cookie, verify that it was signed properly, and you were logged in. The problem that our junior iOS developer had was that he needed an account that had purchases on it so that he could test the Passbook integration. In development, we would use a slimmed down version of the production database that filtered out everyone except for employees. He looked through and saw that Aaron had a lot of purchases, so it was a good test account. And he put this line in. Now I'm sure that there are people in the room who are saying, all right, that is just bad code. You shouldn't have done that. It was dangerous. And I would actually agree. Even in development, this has the risk of going to production, so you should never write code like this. But it's understandable how it would happen. And it solved the immediate problem he had, of being able to log in as somebody and have sufficient test data, pretty easily. I also want to say the team wasn't lazy. They weren't reckless. They weren't indifferent to quality. So you might think, well, we have tests. This would never happen to us. Yeah, so did we. We actually had good unit test coverage around the authentication system. And if you look at what was added in there, it was a find-by-email, which returns nil, and an or. So unless an account with that email address existed in the test database, it would just fall through and work like it always did. And that email address was not in our fixture data. So it just fell through and worked like it always did. We also had continuous integration. 
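To make that concrete, here is a hypothetical, heavily simplified reconstruction of the bug in plain Ruby. The names, the in-memory `USERS` list, and the email address are illustrative stand-ins, not the actual production code.

```ruby
# Illustrative stand-ins for the real User model and ActiveRecord lookups.
User = Struct.new(:id, :email)

USERS = [User.new(1, "dev@example.com")]

# Like ActiveRecord's find_by(email: ...), returns nil when no match exists.
def find_by_email(email)
  USERS.find { |u| u.email == email }
end

def find_by_id(id)
  USERS.find { |u| u.id == id }
end

# The normal auth path, with the dangerous debugging line left in.
# In the test database the hardcoded account doesn't exist, so the
# lookup returns nil and the || falls through to the real cookie user,
# which is why the unit tests still passed.
def current_user(cookie_user_id)
  find_by_email("aaron@example.com") || find_by_id(cookie_user_id)
end
```

In development and test the fallback behaves exactly like the old code, so nothing looks wrong. In production, where that account exists, every single request resolves to Aaron.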
The test suite was fully run on a machine that was not the developer's. It actually gave us a false sense of security in this case. All right, well, if you reviewed the code, if you had people doing code reviews, you would have noticed that that was there. We did. We had an extremely talented, very conscientious developer look over the code and do a code review. But this was the reviewer's first look at the code, it was on a tight deadline, and it ended up being a thousand-line diff spread across 14 work-in-progress commits, and that one line in a thousand-line diff was missed. It's going to happen. People make mistakes. All right, well, if you actually ran it, then you definitely would have seen that it was there. And if you did any sort of manual testing whatsoever, then you would have caught it. Well, the developer who reviewed it actually did run it. But again, Aaron was a good test account. He had lots of test data. And so, in a less dangerous way, that developer would often use Aaron when testing things. So he saw Aaron's name as he was testing and didn't think anything of it. So after we'd done all of the incident communication, after we'd spelunked through our logs and figured out where all of those hundreds of purchases were supposed to go and cleaned up all the data, I went home much later that night and was really struggling with how we could prevent this. And I remembered a New Yorker article I'd read a year or two before by this guy. His name is Atul Gawande, and he's a surgeon, and he had written about how surgeons handle complexity. And then he took that article and expanded it into an entire book, The Checklist Manifesto. So I bought that book and read it cover to cover that night. It's only 150 pages. It's not that long. And a lot of what I want to talk about today are the things that I learned from it. 
So let's start with another field that had to deal with increasing complexity and the limits of human memory and fallibility: aviation. This is the Martin B-10. In 1935, this was the state of the art in the American military arsenal. It was the first all-metal monoplane bomber that had ever been produced, and it completely revolutionized the design of large aircraft. But this was the early days of aviation, and things were developing quickly. So not a year after this plane was introduced, the Army Air Corps announced a competition for its successor. They wanted it to have a longer range, to be faster, and to be able to carry a larger load. This was the hands-down favorite: the Boeing Model 299. It was the first plane that ever had four engines, and the largest land plane ever produced in the United States. The Model 299 was head and shoulders above the other competitors for this contract. It had twice the range. It could carry twice the load. It flew 30% faster. The Army was so excited that after the first test flight, they entered into discussions with Boeing to purchase 65 before the competition had even been completed. So in that light, in October of 1935, two very senior test pilots, an Army major and a senior engineer from Boeing, got into the plane at Wright Field in Dayton, went down the runway, took off, flew up 300 feet in the air, turned sharply to the right, and crashed. Both test pilots were killed. Boeing could not complete the evaluation, and therefore legally could not win the contract. And upon investigation, what was realized was that the cause of the crash was that the pilots had failed to disengage the gust locks. Gust locks are what keep the control surfaces from moving around when you're sitting on the runway so that they don't get damaged. And to release them, all you need to do is flip one switch. It's a very simple task. Now, this step wasn't dropped because of a lack of expertise. 
Again, these were two of the most experienced test pilots in the world flying this plane. It wasn't dropped because of carelessness. If there's ever a time that you're going to be dialed in on the things that you need to do, it's when you're about to take off in the largest experimental aircraft that has ever been produced and your life is literally at risk. It was a simple step, but it was one of dozens of simple steps. This was the most complex plane that had ever been produced. This is the A-3. Just five years before the Model 299's fateful crash, this was the most advanced plane in the American arsenal. And there's a lot going on up here, but I can see how a trained expert could look at this cockpit, keep it in their head, and understand it. In contrast, this is the cockpit of the Model 299. What we're dealing with here is not a difference of degree, it's a difference of kind. The level of complexity involved in flying this plane is fundamentally different from the planes that came before it. And after the crash, there was a real concern that this plane was simply too difficult for people to fly. But the Army was still intrigued by the capabilities of the plane, by its range, by its capacity, by its speed, and they figured out a way, through a contracting loophole, to place an order for 13, which gave Boeing time to figure out how to fly it successfully. One thing they could have tried would be to reduce the complexity, to make it easier to fly. But given the state of technology at the time, all the controls were necessary. This was essential complexity, not accidental complexity. They couldn't remove it. Instead, what Boeing and the test pilots figured out was that they were running into the limits of human cognition. And they produced a checklist of all of the steps that needed to be done before common operations. So before you start the plane, these are the things that you need to do. 
When you're starting the engines, this is what you need to do. Before you take off, this is what you need to do. When you're getting ready to land, this is what you need to do. And armed with this checklist, this plane that was too complicated for two of the most expert pilots in the world to fly became manageable. Which is a good thing, because when World War II broke out, the B-17's long range ended up proving essential to the Allied campaign in Europe. Over 13,000 B-17s were produced, and they dropped 40% of the bombs the U.S. dropped in World War II. It's not a stretch to say that the B-17, and the capability to safely fly it, was instrumental in defeating Hitler. And since the B-17, checklists have been a key part of aviation safety culture. When a US Airways flight took off from LaGuardia in New York, flew into a flock of Canada geese that destroyed both engines, and managed to safely make an emergency landing in the Hudson River without losing a single passenger, the crew did that armed with checklists for what to do when you lose engines, how to safely dump fuel, and how to safely make a water landing and evacuate passengers. All right. So checklists were a huge improvement in aviation safety. What about another high stakes field? Medicine. Our friend Dr. Gawande is a doctor, so let's talk about medicine. This is a central line. A central line is a tube that gets inserted into a large vein, usually one leading to the heart, so that doctors can administer medicine and fluids directly into the bloodstream. It's an extremely common procedure. In U.S. ICUs alone, patients spend over 15 million days a year with central lines inserted. But it's also a leading cause of bloodstream infections, which are incredibly serious. Thousands of people a year die from these infections, and they cause billions of dollars of additional cost. And those infections are preventable. In 2001, a doctor in the ICU at Johns Hopkins decided to try to solve this problem. 
Generally, he wanted to improve the level of care in the ICU, and specifically he wanted to reduce the rate of central line infections. Like I said, these are preventable, and we know how to prevent them. So he created a simple checklist with just five things. Every time a central line is being inserted, doctors will first wash their hands with soap. They will clean the patient's skin with an antiseptic. They will put sterile drapes over the entire patient. They will wear a mask, hat, gown, and gloves. And they will put a sterile dressing over the insertion site once the line is in. Like I said, these are pretty simple things. And you would think that in a hospital like Johns Hopkins, one of the best in the world, you could be confident that they were done. But before rolling out the checklist, he first asked nurses in the ICU to spend a month observing when central lines were being inserted and to report on the results. And what they found was that at Hopkins, one of the best hospitals in the world, in the ICU where the most critical patients are being cared for, in over a third of patients one of these steps was skipped. So he got together with the hospital administration, and together they empowered nurses to stop a doctor if they saw them skipping one of these steps. They also asked nurses to check in with doctors every day about whether there were any central lines that could now be removed. In the year before this checklist was introduced, the ten-day line infection rate hovered around 11%. In the year after the checklist was introduced, there were zero infections. The rate was zero percent. The results were so good that they didn't entirely believe them, so they ended up monitoring for another 15 months. And in that entire time, there were only two infections. 
On an annual basis, in this one ICU, they calculated that they prevented 43 infections and eight deaths, and saved $2 million in costs. All right. So we've seen two different fields with pretty massive impact from introducing checklists. But I'm willing to wager that there's a fair number of people in the room who are skeptical that this would translate to software development. And I think one objection, which I somewhat agree with, is that the two examples I've given so far are largely about making sure that repetitive, rote tasks are completed. And we have a solution for that: we automate things. If, when we're deploying, we need to restart our Resque workers, we make that part of the deploy script so that we're not relying on somebody to remember to do it every time. But checklists can help with more complex problems than just making sure that simple things are done. So I want to talk about one more example from medicine, which is surgery. Now, I have a healthy pride in, and fear of, the complexity that we deal with in building things that work on the internet. If you think of all the systems that need to work, all the machines, all the networks, all the software programs, to be able to do something relatively simple, like place an order on an e-commerce site, it's sort of a miracle that anything ever works. But given that, I still have to acknowledge that there's nothing we do that is anywhere near as complex as cutting open a living, breathing human being, going inside of them, and fixing something. Surgery makes what we do look trivial, and it's incredibly varied. There are thousands of commonly performed surgical procedures, and every patient is different. Every surgical team is different. Tiny errors can have a massive impact: a scalpel half a centimeter to the left, an antibiotic administered five minutes too early or five minutes too late. 
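The "automate the rote steps" point can be sketched as a tiny deploy script. Everything here is hypothetical, including the commands and the pid file path; the idea is simply that the worker restart lives in the script instead of in anyone's memory.

```ruby
# Hypothetical deploy steps; the restart of the background workers is
# encoded here so it can never be forgotten on deploy day.
DEPLOY_STEPS = [
  "git fetch && git checkout origin/master",
  "bundle install",
  "bin/rails db:migrate",
  "kill -s QUIT $(cat tmp/pids/resque.pid)",   # stop the old Resque workers
  "QUEUE=* bundle exec rake resque:work &"     # start fresh ones on the new code
].freeze

# Injecting the runner makes the script testable without shelling out.
def deploy(runner = ->(cmd) { system(cmd) || abort("failed: #{cmd}") })
  DEPLOY_STEPS.each { |cmd| runner.call(cmd) }
end
```

In practice you would reach for a tool like Capistrano rather than hand-rolling this, but the principle is the same: if a step must happen every time, put it in code.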
One of hundreds of surgical sponges missed and left in the body cavity can lead to literal life and death consequences. So in 2006, the World Health Organization came to Dr. Gawande and asked him for help. They had found that the rate of surgery had skyrocketed across the world. There were over 230 million major surgical operations performed in 2003, but safety hadn't improved along with that growth. Now, we don't have perfect statistics, but the best estimates say that somewhere around 17% of those 230 million surgical operations had some sort of major complication. And Dr. Gawande was tasked with leading a working group to generate recommendations for an intervention that would improve the standard of surgical care globally. That is an incredibly difficult task. Again, thousands of different procedures being performed, and the conditions that they're performed in are wildly different. And what they eventually settled on was coming up with a general surgery checklist. Here's the checklist that they produced, and I'm in awe of this document. It's simple. It's just 19 steps. It fits on one piece of paper. It takes about two minutes to run through. But it had the potential to improve safety in all of those 230 million annual surgeries. And the first thing it does is create three pause points where key actions will be checked and important conversations will be prompted. Before anesthesia is administered, before the first incision, and before the patient leaves the operating room, the surgical team will come together and make sure that they've taken care of the simple stuff and had a conversation about the things that they need to talk about. If we look at it again, what this structure does is give highly competent professionals the space to do their job. 
But then it also pulls them back together to make sure that the simple things don't get missed, that the conversations that need to happen actually happen, and that they've thought through how they're going to deal with likely complications. So, the simple stuff. One of the steps here is to make sure that the antibiotic has been administered not more than 60 minutes before the first cut is made, but is in the bloodstream beforehand. That's been shown to have a huge impact on infection rates. Then there's the communication it works on improving. A simple thing: before the first incision is made, it makes sure that the entire surgical team has introduced themselves and that they know who is working together and what their roles are. That's not something that would always happen before. And then it helps with planning. One of the steps is that the surgeon reviews: what are the risks in this surgery? What are the possible complications that we can anticipate? By talking through those things in advance, if any of them come up, the team is going to be more likely to respond effectively. So like I said, this is a lot to be done on one page, in 19 simple steps, and the scope of the problem is massive. So would it work? After doing a few trial runs in a single operating room to iron out any issues, the WHO did a pilot program in eight hospitals around the world. They were varied: there were hospitals in the U.S., Canada, and the U.K., and hospitals in remote Tanzania, in the Philippines, in Jordan. And before introducing the checklist, they sent observers to monitor the standard of care, so that they could have better statistics and be able to measure what impact introducing the checklist would eventually have. They spent three months observing over 4,000 operations across these eight hospitals. And in those 4,000 operations, 400 people developed serious complications and 56 died. 
They then introduced the checklist and monitored for another three months. And in those three months, the rate of major complications was reduced by 36 percent and the rate of deaths was reduced by 46 percent. All from a single page of 19 steps that can be done in about two minutes. How can that work? Checklists work because they make sure that the simple but critical things aren't missed, and they make sure that the right conversations are happening, while also empowering experts to make decisions. It's not about reducing the job of the surgeon to ticking things off on a list. It's about making sure that the right people are talking and planning so that they can respond when things inevitably change. So what makes a good checklist? The first thing is to know your goal. Is it a task list, a communication list, or a combination of both? The aviation checklists that we looked at were primarily task checklists: make sure that these things happen. The surgery checklist was a bit of a hybrid. There were some tasks in there, like making sure that the antibiotic is administered at the right time, but there also was a communication aspect to it. Next, decide on the structure you want. There are two main forms. If you think of a rocket about to take off, with people calling out controls, guidance, check, and so on, that's reading a step and then doing the action: a read-do checklist. Or you can have a do-confirm checklist, which is what the surgery checklist is: let people go do their jobs, but then make sure that everything actually happened before it becomes too late. Specify who is going to do each step. Traditionally in an operating room, the doctor is God. Surgeons have a well-known God complex. But the doctor's hands are busy; they are surgically scrubbed in. So the responsibility for the checklist ended up being given to the circulating nurse. 
That's the nurse who isn't scrubbed in, which made sure that there was someone whose primary task was making sure that these steps were happening. Specify when to do each step. Think of the pause points where there is a natural opportunity to validate that the critical things have happened and that the conversations have happened. Don't try to be comprehensive. There are a hell of a lot more than 19 things that happen during a surgery, and if you tried to spell them all out, it would be too arduous. It would take too long. People wouldn't use it, and it wouldn't have any value. And iterate. You're not going to get it right the first time. You're going to need to adjust to your specific circumstances, so take the time to get that right. All right. So let's go back to a couple years ago, after I'd had this epiphany and seen these examples of how successful this could be in other fields. How can checklists apply to software development? I wanted to think through what the pause points are where we have the opportunity to institute something like this, and I thought that there were three natural ones: before submitting a pull request, when reviewing a pull request, and before deploying. So, before submitting a pull request. This is an individual checklist; you're working with yourself. I asked that each submitter on the team ask themselves the following questions. Have I actually looked at every line of the diff? Am I sure that everything that is here is intended to be here? Is there anything in this patch that's not related to the overall change? Am I conflating a refactoring with a feature change, and should I separate those out into two different units of work? Have I structured the commits to make the reviewer's job easy? Somebody's going to read this; what you're doing is communicating with somebody else. 
Again, 14 work-in-progress commits makes that a little hard to figure out. Have I actually run the code locally? You'd be shocked. Sometimes I've done it myself: you're making a really tiny change, and it's so obvious that it's going to work that you haven't actually gone through and run it. Before you submit a pull request, you owe it to the people you're asking to review it to have actually run and tested it yourself. If you have a formal QA team, is this something that merits formal QA, with somebody else taking a more thorough look at it? And does the pull request you're submitting explain what you're trying to accomplish and how to verify that the feature is working? Then, when a reviewer picks up that pull request, there are some questions that they should be asking themselves. Do I understand the goal of this change? If you don't understand what the pull request is trying to do, you have no chance of being able to review it effectively, so that's the first thing you should be asking. Have I looked at every line of the diff between the branch and master? Again, one line in a thousand-line diff caused an incredible amount of pain. If you're reviewing something and signing off on it, you probably want to make sure that you've actually looked at the entire thing. Have I run the code locally? Have you made sure that it actually works? Do I think this merits additional QA, whether or not it has already had it? Are there sufficient tests? And how will we know if this change works? How do we know whether it accomplishes the goal it set out to? And then, after the individual steps of the developer submitting the pull request and the developer reviewing it, they actually come together, and we now have a team working on something: it's time to deploy. So I asked that before deploying, the submitter and the reviewer have a quick conversation and ask: what are we worried about going wrong? 
Are there performance concerns that we're worried about? What are we going to be looking for? Is there anything different about this change in production versus in dev or staging? For instance, are you using a third-party service for the first time that needs different production credentials? Is now the right time to deploy the change? Is it the right time of day? If you're deploying something that's going to mess with your company's major purchase flow, you might not want to do that at the hour that you have the most purchases. Similarly, you don't want to deploy something when nobody's around if things go wrong, so you also don't want to push everything off-hours. Is it possible or desirable to roll this out to a subset of users? Can you push it out behind a feature flag? Can you roll it out as an A/B test? Is there another way that you should be pushing this out? And then, what specific steps will each of us take once it's deployed so that we can verify that it's working? You deploy, you see that things go into production. How do you know that the change is working the way you intended? And finally, if something goes wrong, what will we do? Is this a change that's safe to just immediately roll back, or is this one of those changes where you've changed database schemas and you can't just roll back to a previous version of the code, so you have to be extra careful? Now, I did not do a controlled study. I didn't roll this out with a valid randomized control. We didn't even monitor the exact number of issues that happened before and after. What I can tell you is that we absolutely caught things we wouldn't have by doing this. We found times where we were about to push out a new third-party service and we didn't have the production credentials in the credential management system yet, and that would have caused us a problem. 
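The feature-flag question above can be made concrete with a minimal percentage-rollout sketch. The flag name and helper are hypothetical, not from any particular library; the idea is that a stable hash of the user id assigns each user a bucket from 0 to 99, so a user consistently stays in or out of the rollout as you ramp the percentage up.

```ruby
require "zlib"

# Minimal sketch of a percentage-based feature flag (names are illustrative).
# Hashing "flag:user" keeps each user's bucket stable across requests,
# so raising rollout_percent only ever adds users, never flip-flops them.
def feature_enabled?(flag_name, user_id, rollout_percent)
  bucket = Zlib.crc32("#{flag_name}:#{user_id}") % 100
  bucket < rollout_percent
end
```

With something like this, a risky change such as the Passbook integration could have gone out at 5 percent first, with the team watching the verification steps they agreed on before deploy, and then ramped to 100 once it looked healthy.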
We also had issues that did get deployed to production, but that we were able to respond to and recover from much more quickly because we had talked about those risks before deploying. I'm obviously a convert. I am confident that checklists can be a great help in deploying higher quality software faster and with much less stress. But I want to leave you with one more thing, which is one of the things I love about checklists and this very lightweight idea of process: there's no sign-off, you're not actually filling out a checklist, there's no paperwork being created. Anyone in this room can begin to introduce this practice into their own development without having to ask for permission. If you're a developer working completely by yourself, you can still think: what are the things that I need to make sure I think through every time before I push something out? And you can start to get into the discipline of doing that. If you're an individual developer working within a larger team, you can do that, but you can also start to ask some of those questions of your teammates as you're getting ready to deploy things, or at other critical points, and by modeling the behavior make it become part of your culture. And if you lead a team, you can obviously introduce your team to some of these examples and some of the research around this, and help introduce the practice. So thank you very much for all of your time. I am Patrick. My personal blog, which I very occasionally write at, is Pragmatist. I am the director of engineering at Stitch Fix, where we are hiring, so please come talk to any of the friendly Stitch Fix people who are here for the limited time that we have. I'm on Twitter at KeeperPat, and our team writes interesting stuff at multithreaded.stitchfix.com. Thank you very much.