So, if you're like me and you like to read the end of the book first, the transcript for this is available online, as are all the slides. That's the URL. Feel free to skip ahead; I will not be offended at all. This is a story about an ops team that learned how to be a dev team. Specifically, it's a story about my team. So I'm Frank. I'm onefrankguy on Twitter, on GitHub, and in real life. And this is my team. We all work at Rapid7. And when I think about my team and what it is that we do, a pattern starts to emerge. We build tools that let one person manage many products. And "manage" is kind of an important word there, because the products that I'm talking about are things like Cassandra, Elasticsearch, Chef, Jenkins. These are all products that we didn't create. We didn't build them. We're an ops team. We operate products. One of the products that we operate is called Cancaso. And Cancaso is a service for taking your Java application and delivering dynamic properties to it. So if you have a Jetty server that's got eight threads and you decide you need to scale it up and give it 16 threads, Cancaso gives you a web interface where you can go in and adjust that number. That property is then dynamically passed on to your Java application, and it automatically reconfigures itself. Cancaso has a nice RESTful API. It's backed by a MySQL database. It's a pretty neat little product. But about nine months ago, we ran into a problem, right? Cancaso was falling over. We were running too many Java services and it just couldn't keep up with the load. So we're like, what are we going to do? How are we going to fix this? And we realized we had a decision point. As a team, we could fix it the traditional ops way or we could take a new approach. So the traditional ops way is scale up, scale out, right? Add some more Cancaso servers, grow that database, make it a little bit bigger.
And that might buy us double the capacity, but what happens three months later when we have to double it again? Or six months beyond that, when now it's like we need eight times the capacity, right? There didn't seem to be an end in sight for that. And we realized that it was the architecture, that single point of failure, that was the problem. So we needed to level up. We needed to move from a team of single people managing many products to many people creating one product. That product is PropsD. PropsD is a microservice that runs adjacent to a Java application and provides the same interface as Cancaso, so it would be backwards compatible. It delivers properties from storage services like Amazon S3 and HashiCorp's Consul that are scalable, so there's not a single point of failure. But when we started this, we were like, we're an ops team. We don't know anything about development. We don't know how to develop together. Where do we begin? All right, let's have a meeting. Let's have a meeting and let's design this product. We'd never done products before, but we had done servers. And as an ops team, if I have nine servers and I need to update SSL on all of them, I can split that work across my team. I'll be like, you take three servers, I'll take three servers, she'll take another three servers, we'll get it done. I said, this pattern is great. This pattern is known as divide and conquer. We use it all the time in ops. We know how to do this to servers, let's do this to design. Let's take divide and conquer and apply it to design. So we did. We split the team up. People wrote individual pieces of the design. I wrote the design for the HTTP interface. Somebody else wrote the design for the storage layer. Somebody else wrote the design for the way property files are actually defined as JSON objects in Amazon S3. But when you do that, when you take that divide and conquer pattern and you apply it to design, you end up with this anti-pattern. It's called tangled ideas.
And what happened was we got into arguments. I was talking about an API as a RESTful API. Dave was talking about an API as a JavaScript function. We were using the same word, API, to talk about two different things. So we would have these meetings, design reviews. We would think everybody is on the same page. Everybody's got it. We know what we're building. And then we'd leave the meeting. We'd have little sidebar conversations and we'd realize, we don't have a clue, right? We're not on the same page. We're using different words to talk about the same things. We're using the same words to talk about different things. We don't know what we're talking about. We don't know what we're building. So it turns out in design, there is a better pattern. That pattern is the benevolent dictator. Instead of trying to design by committee, have one person do the design. Have a unified design, a cohesive design. We didn't do that. We had tangled ideas. And so eventually we got to the point where we were like, we just can't. We can't talk about this anymore. We're not making any progress. Words are really hard. Let's write code. Right? We all know how to write code. And maybe we can come to some consensus by writing code, by talking in code instead of in prose. So there's this other pattern that we see in ops. It's 3 a.m. and the server broke. And it pages somebody. The person that you want answering that page is somebody who can SSH into that server, read a Java stack trace, figure out that it's the network that is bottlenecked, diagnose that there's a Python script doing a health check that's running and bottlenecking the network, open Vim and make an edit there to fix that Python script, log a Jira ticket to edit the Chef cookbook later on and fix it the right way, and then go back to bed. That particular pattern is a lone wolf.
That is somebody who is totally capable of operating in isolation with limited information, making decisions and getting things done, solving problems. Everyone on my team is a lone wolf. Everyone on my team does this, excels at this, thrives in this kind of an environment. And when you take a bunch of lone wolves and you say, okay, collaboratively code together, you end up with a different pattern. You end up with an anti-pattern. That anti-pattern is called knowledge silos. And that was how we coded, because we didn't know any better. So there are whole chunks of the PropsD code base that one person knows really, really well. There are chunks of the PropsD code base that are written in an object-oriented style. There are chunks that are written in a functional style. Some parts of it have really great testing. Some parts of it have no testing. Some parts of it are documented where the documentation and the code match up. Some parts of it, the documentation and the code don't match up at all. Because that's what happens when you have tangled ideas and lone wolves writing code: you end up with a very messy code base. There's a better pattern for coding. If you have that benevolent dictator creating a unified design, you can apply divide and conquer to the act of coding. And if you do that, you end up with individual units, work that's broken apart, and when it comes back together, it's all going to work, because the original design was unified. We didn't have that. We have a messy code base, because lone wolves worked with tangled ideas. And even today, the code base is still messy. We're still having conversations about how it gets better. Eventually, we got the code to a working state. All the messy ideas came together and kind of gelled and it worked. PropsD went out into the wild. We replaced Cancaso. And everything was pretty good. We had a few bug fixes, right? People reported some problems. But we fixed those. We shipped a new release.
And as far as we were concerned, everything was operating really well. How many people have ever spotted a problem in production because you were tailing a log or you happened to look at a dashboard at exactly the right time? Yeah, I've done that too. We do this all the time in operations. It's called monitoring. We pretty much take it for granted, right? That somebody is going to be looking at stuff, or that we're going to have systems in place to notify people when things break. Monitoring for your release process is essentially what we did when we shipped PropsD. We said, yeah, there's a releases page on GitHub. If you want to be notified, GitHub has a notification system, opt into that. But monitoring only works if people are listening: somebody's actively watching that dashboard, or somebody's configured PagerDuty to notify them. If nobody is on call, nobody gets that alert. And so what happened was we wound up with this cone of silence, where we were shipping releases but our users didn't know. So we saw this a lot. People would report bugs. We would fix them, ship a release, and then they'd be like, hey, that thing that I reported two weeks ago, how is that going? We'd be like, yeah, we fixed that two weeks ago. Didn't you get the newest release? No. It turns out that for our customers, internal to Rapid7, the development team that we were supporting, their primary communication medium was email. It was a mailing list. Because we didn't know that, because we were operating in the ops world where monitoring was the communication medium, our customers never knew that we had fixed their problems. Our customers never knew that we had given them new features to make their lives easier. So four months ago, PropsD kind of settled down to the point of, all right, it's pretty stable. We're not getting a lot of new feature requests, not fixing a lot of bugs. We did a retrospective.
We said, if we want to continue this as a development team, what did we learn, right? How can we get better at this? Let's try again. Product 2.0 is called Tocandy. Tocandy is another microservice that lives alongside PropsD, and it essentially gives PropsD the ability to provide secret configuration. So you can have something like database credentials that are encrypted and have those delivered securely to individual servers. We made a decision with Tocandy, and we said we're going to make a conscious choice about the patterns that we use here. We're not going to fall into autopilot mode. We're not going to fall back on all those operational patterns, because they didn't work. We're going to pick new patterns, and we're going to pick them now, at the start. Because that's the thing about a pattern. Once you're in it, once you're in the middle of tangled ideas and you're having that argument, you can't get out. You're stuck until that pattern runs its course and goes all the way to the end. Patterns run on autopilot. The point when you can interrupt a pattern is at the very beginning. You can make that conscious choice to say, what pattern do I want to apply here? So for Tocandy, we said, John, you're going to be the benevolent dictator. You're going to do the design. One person does all the design. Dave and I got to be code monkeys. We got to execute on divide and conquer and split up the work. It worked out really, really well, because it allowed us to step in and step out as time allowed and collaborate and understand that, yeah, I can work on this in isolation. Dave can work on his piece in isolation. Because John did the design and John knows how it's all supposed to fit together in the end, it does fit together in the end. We also set up an internal mailing list to make sure that people would know when we shipped new software. We're still not quite sure what to do about external notifications.
Both of these products are open source, and we don't have good community outreach around them yet. So when we started this, we were an ops team that was learning to be a development team. What we realized is that we were really learning to collaborate and communicate as a team, which ultimately means we were learning to DevOps. Thank you.