So, my name is Paul McMurray, and my goal this morning is to put you to sleep. I've been traveling the world these past 10 days. Over those 10 days, I've caught red-eyes, starting in Austin, one of the most awesome places in the world, in my opinion. Thank you very much. We've got at least one fan. I caught a red-eye to London. I caught a red-eye to South Africa, which is an amazing place, to speak at Africa DevOps Days, where the largest banks in Africa were banding together to learn how they could combine software development with infrastructure engineering, with all this great open source technology that we're talking about today. I caught another red-eye back from Africa to London, and then here to Berlin, to have an opportunity to share my experiences and my learnings about the effects of sleep deprivation on my organizations with all of you, and also to learn from each and every one of the experts that are here today. Now, if you picked up on something in the statement that I just made, there's almost a pride in staying up late at night. There's a pride in saying, hey, I caught this red-eye. I stayed up till 3 a.m. to fix this bug. So, I'll ask: how many of the people in this room, in the past year, have pulled an all-nighter, or, let's say, worked past 10 o'clock at night, to get code out? What about the last six months? Okay, what about three months? What about the last week? Who has been up in the last week? Well, what I'm here to tell you today is that this behavior, this heroic behavior that is encouraged in many of our teams, individuals, and organizations, could actually be destroying individuals, teams, their emotions, their health, and their productivity. So, there's an epidemic going on today. There's an epidemic of depression. There's an epidemic of people feeling disconnected from their communities, from their workplaces, from their families. This epidemic is real. In the DevOps culture, we've seen this weird mashing together of infrastructure engineering, systems, storage, networking.
On this side of the world, it has been a normal expectation to stay up until 2 a.m. on a Sunday night to do the work on the foundational elements that support our applications, to do the work when no one's using them. Now, what's interesting is that as modern software development techniques and these advances in open-source software have combined, and Agile has been applied to infrastructure to create DevOps, well, while we came up with these unique ideas of applying Agile to infrastructure, what we didn't do is leave the habit of that midnight cutover behind. We brought this habit forward and applied it to our day-to-day habits of software development. Now, this is good in some cases. You could say you're going to get people working more. But on the flip side, this has actually caused suicides in our industry. There's an individual, very prominent in our DevOps community, Carlo Flores. Last year, he experienced these challenges from overwork that created depression, that created a disconnection from himself, his community, and his friends. And tragically, his life was lost. So what I want to do right now is tell you a story about a challenge that happened in my own organization. Now, luckily, no lives were lost. But it did, however, affect our interpersonal communications. It did, however, affect our ability to deliver code on time. And it did, however, affect our ability to contribute to key open-source projects. So I'm going to tell you a little bit about the team. This team wasn't a junior team. This team is really, really awesome. Some of the finest individuals that I've had a chance to work with. They specialized in SDN abstraction layers. Basically, when you do network engineering now that everything's defined by software, you basically have to write middleware. If you use Ruby, does anyone use Ruby in this room? Okay.
When you connect to OpenStack from Ruby, you're actually connecting through libraries that this organization wholly and solely contributed to the foundation. If you've taken Red Hat's OpenStack courses, we actually co-founded the OpenStack training that the Certified OpenStack Administrator came from. We contributed to key projects: OpenStack, OpenDaylight, Open vSwitch, Puppet, Ansible, Libcloud. This is a high-performing team. This is not a junior team, yet we ran into these problems. So in the specific case that I'll talk about, we were writing some middleware connecting some SDN controllers, like OpenDaylight, NSX, Alcatel-Lucent, to different forwarding planes in both open cloud and closed enterprise virtualization platforms. We had released a version of our software to our partners, and it had worked, and worked very, very well. It worked so well, in fact, that we were asked to go ahead and expand its capabilities, to expand its features and functions. But like every partnership, you never get the same amount of time you had before. No, we were asked to accelerate our development and bring in the deadline. We were asked to add many more features in the same exact amount of time that we had originally committed to. Now, has anyone faced this challenge, where you were completely at 100% and then you were asked to do 120%? Exactly. It's very, very common, especially with individuals and teams that can do so much. Individuals that are sitting in this audience today. Now, what did we do? The first thing I did is I actually said no. Oddly enough, in business, no doesn't stand up on its own. You have bosses. So what did we do? We bulked up the organization. We brought in more contract developers. I personally don't believe in offshoring as a good strategic goal for a team. I believe that the communication between individuals is key to our ability to execute. So we bulked up. We brought in more developers.
We brought in more QA resources, more test-driven development resources, and we worked on spinning them up. We kind of hit that race and got ready to drink from that fire hose. Now, for a time, it worked. It worked so well. I was really impressed with all the individuals and the team, with the quality of the code, with our test coverage, with our ability to actually meet the goals. Now, one of the challenges, however, when you're working in this next generation of networking systems integration, when you think about building middleware or integrating a system, is that you're really dependent on many organizations. In this case, we were dependent on the software development organization on the controller side. We were dependent on improvements and updates within their APIs. We were dependent on the forwarding-plane teams outside of our organization to update their APIs. We're working as a community to move this forward. Now, what was interesting, and the challenge in this, is that it doesn't take much to cause basically a series of events that step on top of each other. And this happened to us. We started to see challenges in our test coverage. We started to see, with this acceleration in development, our test harnesses covering less and less of the features and functions, which for me is worrying. And what happened was things went horribly, horribly wrong. The worst thing that could possibly happen happened to us. A peer partner organization reported a critical bug to us that we could not reproduce in our test harnesses. Now, this was very, very frustrating. This was frustrating to the individuals who had given so much to make the software work. This was frustrating to me as a leader who was trying to communicate to many different organizations what the heck was going on. My partners in this were very, very unhappy. They had their own deadlines, commitments to their executive teams, commitments to their largest customers to see this come out.
And what was starting to happen, and what we started to notice, was interpersonal communications dropping, anger happening, frustration happening, finger-pointing happening, all the things that should not occur in a healthy, thriving team or community, things that I don't want to see. So what I did is I sat down, and it kind of came to a head one night. I was in Florida. I was taking a leadership course on using increased emotional intelligence and listening skills to create high-performance organizations. And I remember I was supposed to go to dinner, and I'm sitting in the back seat of a car on the phone with the senior executive of that partner organization, getting my butt chewed about why we couldn't fix this bug, getting just nailed to the floor. And luckily I decided that, you know what, screw it, I'm not happy anyways, I'm going to take a stand. And I decided to take action. It wasn't the action they wanted. The request was to work 24 hours a day, to continue working 24 hours a day to fix this bug. And we had been working 18 hours a day, seven days a week, all through the holidays. So I drew a line in the sand and put the team first. Something I hope each and every one of you would do in this exact position. I put in a 72-hour stand-down. I told everyone to go home, close their laptops, you are not allowed to log on. If I see anyone checking into our source code repositories, you will be put on leave. People wanted to work very hard. The reason they were working this hard was not because of fear; it was because they wanted to see this project succeed. And this happens in open-source communities. I've seen this happen in OpenDaylight. I've seen this happen in OpenStack. We push ourselves so hard because we care. The next thing that I did is I negotiated with our partnering organizations that between 11 p.m. and 7 a.m. this organization would not be checking in code, that our developers were off limits.
Enforcing through contracts that people needed to get their sleep. Now, when we had everyone go home and get some sleep, well, we decided that when we came back Monday we were going to try to address this with a fresh mind, because something wasn't happening: we were not thinking outside the box, something that happens when you have exhaustion. And as W. Edwards Deming, who is kind of a god in the DevOps movement, says: you know, in God we trust. All others, bring your data. So we started looking at our systems. First we looked at the Git check-ins. What is going on? Looked at Launchpad, looked at Jenkins, looked at all the data, looked at Slack. Now, for me as a leader in this organization, I was trying to figure out what the heck was going on. I could tell the core cause was that my best and brightest people were working so hard that their ability to think outside that box was being stopped. Finally, I found it in the emails. I had been CC'd on a couple of emails in the middle of the night, and I noticed a trend. I noticed that key members of the organization had been engaging at 3 and 4 in the morning. Now, normally I'm used to these key members kind of wanting to work late at night, and having to combat that, be like, hey, go to bed, right? But what had happened was that one of the peer organizations that we'd been integrating with, when they had expanded their development to hit their goals, had not done it in the same time zone where we'd been working before. They had chosen to offshore. They had chosen to work with their teams in India. Now, the result of this was a couple of members of our DevOps organization basically working all hours of the night to be able to communicate with these teams, because they wanted to do a great job. Now, look at this in the context of how you make an organization fly. In this organization, we had a DevOps organization, so DevOps and test-driven development bundled together, as well as SDN development.
Now, they worked in the same buildings. They worked in the same cities. They communicated in the same time zones. But the effect of key members of our DevOps team working later and later at night to do the integration with the partnering teams was that the opportunity for communication between our DevOps team and our Dev team had shrunk. Now, if any of you have actually done the math and put a value stream map together for your organizations, you know that any time you limit the communication between work centers, you limit your ability to achieve optimal flow. You limit your ability to make code. So let's talk a little bit about the results of our little experiment of going through the data. First things first, we went ahead and did the stand-down. So 72 hours later, needless to say, if you send people to sleep, they are happier. You send them home to their families. You allow them to reconnect with their kids, and you provide and encourage that emotional connection to their support system. Second thing: four days in, so this is about 24 hours after the stand-down, we had an idea. We were able to get the team together, rested, with a clean perspective, and we were able to basically pull in a reference VTEP, a virtual tunnel endpoint, an open source VTEP, and do an integration with the controller. And this is something the team had not physically been able to think about. Exhaustion breaks your brain. Right? We had this idea to expand our test harnesses, to pull in a reference controller. Now, three days after that, we completed the development work on the expansion of the test harnesses, and we found something that was extremely pleasing to me and to the members of the team, who had been feeling so defeated because for six weeks they could not get this working. We found the bug was actually in a peer controller. Oh yeah, it was not our fault, which is one of the best things you want to hear when something hasn't worked for a long time, right? Yeah.
I was extremely thrilled myself. There might have been some air turning blue and, like, hell yeah. So 14 days later, the team had normalized. People were working at their normal working rate and had gotten back to normal levels of communication. People were smiling, people were high-fiving, drinking probably a little too much coffee, but that comes with the territory. So let me share some of the lessons that I learned out of this. And the reason I do this is that it's really easy at these conferences to come up on stage and talk about the awesome things you did. You see this all the time: here's this great project, here's this great technology, here's how it's going to change the world. But if you look internally, you know, as part of our agile practices, we're supposed to have build-fail cakes and retrospectives. We're supposed to learn from our failures, right? And as a community, we have the opportunity to share our failures. So I'm being a little vulnerable with this and sharing my failures with you and the community in the hope that, well, y'all won't make the mistakes that me and my organization made. So what was the first lesson? One, that even if it's not your bug, mistakes cost money. This cost $300,000 out of my development budget. The second lesson is that it only takes a few overworked individuals, especially ones with DevOps skill sets, to cause this. DevOps skill sets are among the most constrained skill sets that we have in this industry today. Someone who is both an expert in infrastructure engineering and an expert in software development is very, very rare. And there isn't really a defined process to create them. It's almost witchcraft. The next lesson that we learned, and this was an important lesson: I had taken a communications course a while back about interpersonal communications. And one of the interesting lessons and findings in this is that, as humans, there's a reason we congregate at the water cooler. We are herd animals.
And for every one hard conversation that we have, for us to feel that we have a positive connection, a good relationship with our peers, we actually have to have seven conversations that are normal. They're: hey, good morning. How are you doing? How are your kids? How's the weather? What's your favorite sports team doing? These are really, really important to have. If we don't have these, if maybe we're working from home and only communicating with our teams about bugs and faults, then we start to form a negative feeling about how people think about us. Now, if you actually go through Carlo Flores's tweet stream from right before he died last year, you can see him going through this exact same process. This is something that we can fix. The next lesson that I had was that scaling an organization and balancing it out is extremely important. If you cannot properly scale up the supporting sides of your organization, say, the DevOps organization or the test-driven development experts, along with your developers, you may get yourself into a bind. This is something that is a big challenge right now. Our DevOps skill sets are incredibly depleted in this industry. It's a small pool that everyone's grabbing from. And yes, it drives salaries up, and that's great for the individual. But the flip side is that those individuals are incredibly overworked. The next lesson that I learned from this is that data is great. The data was there in the systems. The data was there, but what I didn't have was a process for checking on this, like in one-on-ones. It wasn't incorporated in my daily processes with my managers to ask, how many of your employees are working outside of work hours? And I also didn't have a dashboard, so that no matter what, this would be in view for me and the leadership team, and I could present it to the executive team.
So the challenge that I want to give to you today, and that hopefully you can bring back to your companies, your peers in the community, and your friends, even your children, right, is that I want you to join me in challenging this culture of the midnight developer, in changing the culture of the midnight cutover for your DevOps engineering and your systems engineering. I want you to help me in addressing the causes of burnout in this industry. So hopefully we don't have another Carlo Flores. We don't have the Japanese culture of working till death. Now, if we can join together in this, not only can we provide benefits for the individuals that are in our industry, but we can also accelerate the projects and the businesses that we work at. It's not only the best thing to do for us as people, but it actually makes us release code faster. It's better for our businesses and communities. So my name's Colin McNamara, and I hope that you'll join me in closing your laptop tonight and getting a good night's rest. Thank you.