I've got a lot of material to cover in the next 40 minutes or so, and I've posted notes and some other materials on my website that you can check out. It's got some overflow material that I couldn't fit into this talk, so it's pretty good. If you want to be internet friends with me, you can find me on Twitter. I'm also hanging out on Mastodon more, because I'm kind of a fan of decentralizing the web a little bit more, and my website is at the bottom there. I work right here in LA at a company called Carbon 5. If you've never heard of us, we're a digital consultancy, and we help companies small and large make digital products. We've got offices in LA, San Francisco, Seattle, Chattanooga, and New York, so if your company wants to work with us or if you would like to work with us as an employee, please come talk to me. All right, let's get started. It feels really exciting to work at a company that's growing. There's a certain energy things have when there are tons of new hires every week. It feels like there's always some new milestone that's been hit, but between the celebrations, you're starting to notice that things feel a little bit different when you're growing. You start having days or even weeks where it feels like your team didn't get anything done. You have resources that you only dreamed of before, but you're underperforming your scrappier past self. If there's something that makes it hard to get things done in a big team, then it stands to reason that there must be something that makes it easier to do things in a small team. I'd like to explore some of the characteristics of working on a smaller team to see if we can learn some things from that and maybe carry them over to working in a big team. When you're in a small team, it feels like you get more done. Maybe you're not doing more work, but you get more done. You are personally responsible for shipping major features of your product.
And on a small team, there aren't as many people writing code, so you have less of it. You're probably also familiar with a bigger chunk of that code base. And when you have a small code base on a small team, everyone has a better shared understanding of how it works, and that makes for quick and productive technical discussions. Everybody on the team is a jack of all trades. Your team isn't big enough yet to have specialists. There's no marketing department. There's probably not even a marketing person. But everyone picks up the slack and helps out with the marketing work. You don't have a PR person, but when you ship something new and exciting, you'll excitedly share it with the tech blogs. If you're browsing Reddit late one night, as you do, and somebody's complaining about your company because of a bug they ran into, you're going to message that person yourself to find out more about the bug. And then the next day, you'll come in, fix that bug, and probably push it up to production the same day. It feels really good to be empowered to just see a problem and take care of it. In a small team, you don't have meetings. You have conversations instead. And your code's architecture is simple. Your deploy procedure is git push heroku master. And if something is wrong, it's just as easy to roll it back. Or maybe you'll just figure out the fix quickly and push it out. Deploys are cheap and they're risk-free, so you do them all the time. When someone writes a line of code, it's not very long before it's actually in production and a customer is getting use out of that code. And that lets you learn very quickly from what you shipped. That tight feedback loop is very healthy. All right, no one's writing documentation, and that's okay. Your team is your documentation. If you need to figure out something that you're not sure of, you can just ask someone.
Writing docs isn't worth the trouble at this point, even if you wanted to. When you've got a generic problem that needs solving, you're going to pick an off-the-shelf solution and run with it. It's kind of crazy to think that things like CI and even source control used to be these luxuries that only the biggest companies could afford. And today they're commodities. And so with all these great tools, you don't write much generic code. Every line of code someone writes on your team is specific to your company's problem domain. And so it's bringing a lot of business value. There's no dedicated team providing customer support, so everyone pitches in. You know what your customers are struggling with. And then you're incentivized to fix it so you don't get interrupted anymore by customers calling you. And your customers notice how fast their issues get fixed, and it makes them feel cared for. It's a really virtuous cycle. You can do a lot as a small team. Being a small team gives you all these great structural advantages. But if you want to stay working as a small team, you have to really be smart about what you're committing to doing. And that can be really difficult. It's difficult to know that you can do something better and just say no to it anyway because you know that you need to focus on your core features. Small teams have less overhead per person, and so your per capita productivity is at its peak. Every extra hire that you make is one extra person that you have to keep in the loop on things. And with each extra person on the team, you're making it that much more difficult to maintain those small team dynamics. And that's why I recommend putting off those extra hires as much as you can. I'm a natural procrastinator like that too. Instead of trying to grow your team, invest in your team's productivity first. 
And when you invest in making your engineers more productive, it pays really great dividends, because engineers get to spend more time creating value for customers. So look for those inefficiencies, or those opportunities to make things work a little bit better, even if it's just a little bit, because every minute that you save adds up. So get more CI servers so that you're not waiting as long on builds. Pay for that faster internet. If there's an engineer who's been bugging you for a faster computer, buy it for them, because it only needs to save a few minutes a day to pay for itself. These are literally going to be the easiest kinds of problems you're ever going to solve, because you can just throw money at them and they'll go away. So take care of those right off the bat. And don't try to compensate for being a small team by working longer hours. I know that occasionally you're gonna need to hit a deadline and you'll have a late night, but if that becomes a habit, it's gonna hurt your productivity. I have looked far and wide, and I've never found a study on knowledge worker productivity that shows any long-term gains from working more than 40 hours a week. And even if you could get more done by working long hours, that's not a good way to live and work. It's not a good way to treat your team. So I refuse to buy into this culture of workaholism, and I don't wanna be involved in making the problem worse by creating a work environment where it's accepted. Eventually, if you keep these practices up, you're going to have picked all your low-hanging fruit, and you're gonna reach a point where the demands on your team exceed what your current team can handle. Now it's time to scale the team. And scaling a team starts with a recruiting process, and it sucks. The typical recruiting process is severely lacking, and if you've ever gotten an email from a recruiter, you know exactly what I'm talking about. I picked these from my inbox.
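To put rough numbers on the point that a faster computer only needs to save a few minutes a day to pay for itself, here's a back-of-the-envelope sketch. The figures in it ($3,000 for the machine, $100 an hour as an engineer's loaded cost, 10 minutes saved per day) are made-up assumptions for illustration, not numbers from the talk.

```python
def payback_workdays(upgrade_cost: int, hourly_cost: int, minutes_saved_per_day: int) -> int:
    """Workdays until an upgrade pays for itself.

    Inputs are hypothetical, illustrative numbers in whole dollars and minutes.
    Integer math avoids floating-point rounding surprises.
    """
    if minutes_saved_per_day <= 0:
        raise ValueError("the upgrade has to save some time to pay for itself")
    # Dollars saved per day = hourly_cost * minutes / 60, so
    # days = upgrade_cost / (hourly_cost * minutes / 60) = upgrade_cost * 60 / (hourly_cost * minutes).
    numerator = upgrade_cost * 60
    denominator = hourly_cost * minutes_saved_per_day
    return -(-numerator // denominator)  # ceiling division: a partial day still counts as a day


if __name__ == "__main__":
    # Assumed: a $3,000 machine, $100/hour loaded cost, 10 minutes saved per workday.
    print(payback_workdays(3000, 100, 10))  # 180
```

With those assumed numbers the machine pays for itself in 180 workdays, well under a year; if it saves 20 minutes a day instead, it's 90.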
The recruiting process is fundamentally broken. I'm not even gonna explain why. It's just a terrible experience for engineering candidates. But there's an upside to it, which is that if you can recruit better, then you can really differentiate yourself from other companies. And I say that if you wanna be good at exactly one thing in your job as a leader of engineers, let it be building a great team, because you can be completely mediocre at everything else in your job. But if you've got an eye for talented people and you can convince those people to come work with you, you're gonna do all right. So don't delegate the task of contacting potential candidates to a recruiter, because this is your team to build. If you wanna do it well, you need to be hands-on with it, because a recruiter can't sell the opportunity the way that you can. That doesn't mean that recruiters aren't valuable or that you should fire your recruiting team. They can be very useful. They can help you find potential candidates, and they can help with negotiations and other parts of the hiring process. But when it comes to initially reaching out to engineers and getting to know them, that should be you. Because if you're looking at your inbox and you've got a generic-looking form letter from a recruiter sitting there, and right next to it is a personally written email from an engineering manager who's genuinely looking to build out their team and is interested in what you're doing, which one are you gonna wanna respond to? Even if you're not looking for work, which one do you wanna respond to? So I say, go meet candidates for coffee, learn about their career ambitions, and figure out what they're looking to do next and what kinds of projects they like to work on. Find out what programming languages they're excited about.
A recruiter can do a pretty good job of building a network of connections, but when you go out and meet engineers and talk to them face to face, you can build relationships. And if you can get to a point where engineers wanna leave their current jobs because they wanna work with you, you're setting yourself up for long-term success throughout your entire career. It's also a good time to start thinking about how you interview people. When you're in a period of rapid growth, every hire that you make is really important, but each person that you hire is still just gonna be one person on your team. So each one has some incremental importance, but it's just one person. But when a candidate is trying to pick a company to work for, they're picking the only company they're going to be working for for a while. So it's a higher-stakes decision for them. And I think it's really important to keep that perspective when you're interviewing people. Help the candidate imagine themselves working at your company. When you bring them in for an on-site interview, everyone should get lunch together as a group, chat with each other, and joke about the stupid things that you joke about. At Carbon 5, we have a saying that cereal is a salad, and we've had deep discussions about that and what a salad means. Just stupid things like that. Spend a good amount of time pair programming with the candidate on some work that's representative of what their day-to-day work will look like. That's really helpful for you, and it's just as helpful for the candidate, so they know what they're getting themselves into. Think of the things that make working at your company great, and make sure that the candidate gets an opportunity to see those things when they visit you. The things that you're looking for in candidates are gonna need to evolve as your team grows as well.
As your team composition changes and as your technical stack changes, your tech questions are gonna have to evolve and I trust that you're gonna have a good handle on that. But as the team size grows, the non-technical parts of a job start becoming substantially more important. Writing good documentation isn't really that useful in a three-person team where everyone's an engineer but when there are lots of teams and when those teams are building APIs for each other, that does become an important skill. Other things like gathering your requirements from a non-technical stakeholder might not be a very common task when it's just a few of you and you're all engineers but when you start hiring non-technical employees who have oversight over other things, then that is an important skill. And you need to be watching for personality traits that might indicate that the person is gonna be a bad fit at your company. I'm not gonna enumerate all those different things that could be a problem. I have a few in the notes that I share at the end but it's just good to keep in mind what might be important to you and your team in terms of personality and demeanor. You might be inclined to think that you can forgive some bad personality traits as long as they're a talented programmer because they're doing good work but in a larger team, you can't rely on that because a big part of the job is actually interacting with those other people and if you're looking at someone and they create friction wherever they go when collaborating with people, that disqualifies them from doing the job even if they are a great programmer. Interviewing is in and of itself worthy of an entire talk so I could go on about it but I'm gonna have to stop. But just keep in mind that hiring people is one of your biggest opportunities as a lead in a company to actually shape its culture. 
It's very difficult to change the culture of the team that you already have, but if you're looking to shape your company's future, the interviewing process is how you do that. And also, don't forget about onboarding. I've definitely been relaxing at home on a Sunday afternoon, thinking about the week ahead, and then suddenly realized that the next morning there's gonna be a new hire waiting for me and I completely forgot to prepare. It's kind of unfortunate. A new hire's first week is a very important one, I think. It sets the tone for what they can expect from working at your company. If you do it badly, it might just leave a bad impression, but if you do it really badly, they might actually leave that first week and go accept another offer they got, because they don't wanna take the risk. So it's important to get this right. The upside is that onboarding is a process that you can standardize pretty well. You can find a lot of repeatable steps for new hires. I recommend that you figure out those steps right now and make a template in your to-do list app. Next time a candidate accepts an offer from you, put those items in there and set a reminder for the week before they start so that you don't forget. When they come in, make sure their desk is ready, complete with their computer and all the equipment that they requested. Have lots of company swag, and make sure that the company swag is in their size, because if it's not in their size, it sends a message that you don't feel like they should be part of your company, or that they're not included. If you have swag that you can give their family and friends, that's also great. You're gonna introduce the new hire to the team, but I also recommend that you set up a series of one-on-ones so the new hire can pair with people throughout your company and learn about its inner workings.
You might want to pair them with a site reliability engineer on your team so that they can talk to your new hire about deploys and production support and how you do that at this company. You might want to pair them with a designer so that the designer can tell them about the collaboration between designers and developers, and what developers can expect from designers for things to implement. Before you know it, you're gonna have a really massive team, and it's gonna be time to start changing your work habits to match. I really wish that I had some fancy, clever secrets to share with you. I've searched for years to find them and I haven't found one, so I'm sorry about that. You're just gonna have to focus on the rudiments, but I do have a few recommendations that I think will help you be productive as a big team. So let's work up an example. Let's say that we started an internet of things company, and this company is building Wi-Fi-connected cameras that go in your refrigerator so that you can see your refrigerator's contents from far away. We're gonna call it ChillFlix and Net. And obviously this is a great idea. It's gonna go gangbusters. You're gonna have the hottest VCs throwing money at you. Stores aren't gonna be able to keep this camera on their shelves. It's super popular. So you start a hiring spree, and before you know it, that really small, tight-knit team that you had is huge. At ChillFlix and Net, you've got a software stack that's got several components, and I'm gonna go over a few of them. First off, those devices need some firmware so they can do their thing. That firmware is gonna have to talk to a backend API, and those cameras are also gonna need some video servers to stream to. We're gonna need mobile apps. We're gonna need a web app, and those apps need a backend API to serve them. And then we're also gonna need an internal management app that our support team can use to help troubleshoot and that sort of thing.
And when you've suddenly gotten really big as a team, I think a really common first instinct is let's take each of those code bases and assemble a team around them. And that does have some nice advantages. One thing I really do like about this approach is that it gives each team ownership over a specific code base. And it might work really well for you but it also works off the assumption that for a particular thing that you wanna ship to your users, you can do it with just a single team. And that's often not how features work. If you've got a team that maintains your mobile app and you've got a different team that maintains that backend API that powers it, it's not gonna be very long before someone comes up with a new feature that can't be done without both of those teams doing some work and you're gonna end up in a situation where one team is reliant on the other getting some work done to ship a feature. And maybe you can get this to work fine too. Maybe you can think of exactly what new APIs the backend's going to expose and then in advance you can figure out exactly how that interface will need to work and then the API team can build that API first and it's gonna work perfectly and then afterwards the mobile team will just add the feature to the app using that API. And that's really tricky to pull off. It's just, it's very hard. The process of building things reveals the things you didn't understand about the problem in the beginning and you'll probably need to deviate from the plan. And it's also common for two people to read the exact same description of something and they might both think that they have the same understanding and they're thinking of the same thing only for it to turn out that they have completely different things in mind. 
And when you've got the people building these two different components, where one's dependent on another and they're on separate teams, this can really delay things getting shipped out, because you have to keep throwing things back over the wall when you find out things don't work the way they were supposed to. One alternative that I think can combat these issues is to build out cross-functional teams whose members can work on many different code bases. And then to decide what each team should be focused on, I recommend organizing these teams around your company's high-level business goals. So let's look at how that might work with ChillFlix and Net. One thing that we want to be doing is building out some new features for users. For instance, maybe we want a feature that allows for motion alerts, so that you can get a push notification on your phone when someone opens your fridge and keep a closer eye on it. As an IoT company, ChillFlix and Net has been processing a lot of returns for devices that were working fine, but the customer just had difficulty getting the device set up and connected to their Wi-Fi network. So they gave up and returned it to the store. So let's have a team that's focused on making these devices very easy to install, and they're gonna visit all parts of that install process, from the mobile app to the firmware, to how it connects to your network, and all those different things. At some point, this might be a solved problem, and the team will be able to disband and work on other things. But for now, it's a big enough problem for your company that it's worth having a dedicated team that's focused just on that problem. And finally, let's have one more team that works in close collaboration with the customer support team.
So when someone calls customer support and the support person gets stumped on the issue, they've got someone they can talk to for an assist. And because this team is helping out the customer support team, it's a natural fit for this team to also add new features to the internal support app. So when the customer support team needs new features or they're having problems with that app, they can get some help with that too. And by doing this, you're giving each team its own independent backlog of stories that are focused on these problems. And each team is empowered to solve these problems themselves in whatever way they see fit. We've essentially taken these teams and turned them into miniature startups within the startup. They each have a sense of mission because we've centered them around specific business goals. And as individual teams, they're small, so we can start to get back those small-team advantages that I was harping about earlier in the talk. Now, as we're getting bigger, we naturally need to talk about meetings. As you start to have more teams, you're gonna start having more meetings, and it's a good time to start thinking about how your meeting culture works. Meetings get a lot of hate, and I've been thinking a lot about all the different things that I dislike about them. But I think I've been able to boil it down to just two reasons that we hate meetings. I really do think I've nailed it, too. The first thing that people hate about meetings is that they yank you away from whatever it was you were doing. You could have been doing some really great work. You might have been in the zone. You were focused, you were solving this really complicated bug, and you had all this context in your mind. And then suddenly it's time for a meeting just because someone put it on your schedule, and you're getting yanked away from that. And now you need to go talk about something completely different.
And sometimes it's a mild annoyance, but sometimes it can be really aggravating. I've certainly had days where I got almost nothing done because a couple of meetings were scheduled at just the right time, when I was just about to get into my flow state. The other thing about meetings that I think makes people hate them so much is that for what they cost you in person-hours, they usually don't deliver in value. If you gave me four quality, uninterrupted hours of time to just do some work, I could get some pretty good work done. But if you gave me a one-hour meeting with four people in it, good luck getting anything close to that done. It's just hard. One company that seems to have a very healthy meeting culture is Amazon, of all places. I haven't worked for Amazon personally, but I've read a lot about their meeting culture, and I've talked to people I know who work for Amazon to check that this is actually something that they do. And they said yes, this is something they do. The first rule that they have for meetings is that they can't be too big. Their rule of thumb is that you should be able to feed everyone in the meeting with two pizzas. And in meetings, Amazon doesn't use slide presentations. Instead, somebody's gonna prepare a narratively structured memo that's usually about six pages long. You're not using bullet points or anything like that. This is actually six pages of written prose, like actual text. And Amazon refers to this type of memo as a white paper. A lot of work will get put into this white paper. You don't just put it together an hour before the meeting and run with it. You write it and you rewrite it, you edit it repeatedly, you share it with coworkers to get feedback from them, and eventually you're going to set it down for a couple of days and get away from it. Then a couple of days later, you'll look at it again and you'll make more edits.
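The person-hours comparison is just multiplication, but it helps to see the two cases side by side. This is a minimal sketch; the headcount used for the two-pizza check is an assumed number, since the rule of thumb doesn't specify one.

```python
from dataclasses import dataclass

# Assumed headcount standing in for the "two pizzas" rule of thumb; the rule gives no exact number.
TWO_PIZZA_CAP = 8


@dataclass
class Meeting:
    attendees: int
    hours: float

    def person_hours(self) -> float:
        # Everyone in the room spends the whole meeting there, so cost scales with headcount.
        return self.attendees * self.hours

    def fits_two_pizzas(self, cap: int = TWO_PIZZA_CAP) -> bool:
        return self.attendees <= cap


if __name__ == "__main__":
    focus_block = Meeting(attendees=1, hours=4)  # one person, four uninterrupted hours
    status_sync = Meeting(attendees=4, hours=1)  # four people, one hour
    print(focus_block.person_hours(), status_sync.person_hours())  # 4 4
    print(Meeting(attendees=12, hours=1).fits_two_pizzas())  # False
```

Both cost four person-hours; the difference the talk is pointing at is in what you get back for them.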
By the time you're done with the white paper, it's a very highly polished document. And then for the first 30 minutes of the meeting, everyone reads it together in silence. I think that's a really clever way to conduct a meeting, and I think it's very reflective of Amazon's culture. They started off as a bookstore, and it stands to reason that they value reading a lot. But having a rule that you need a white paper to have a meeting like this means that it's a lot of work to call a meeting. You can't just open up Google Calendar and grab a bunch of people in a room. You need to write out that memo and have your plan fully thought out, because it's pretty hard to write six pages of fully formed narrative text about an idea that isn't coherent. And if you want to book several person-hours of time for a meeting, I think it's good to have that burden. Another thing I really like about this format is that you're building the prep time into the meeting itself. I've certainly held a lot of meetings where everyone shows up and it's really obvious that no one has read any of the resources I shared in advance. So my agenda gets derailed, because instead of making the decision I wanted to make in the meeting, I'm explaining what was in the materials that they could have read already. I can get mad at people for that, but the truth is that people are just time-constrained, especially if you're inviting executives to your meeting. So instead of getting mad at people for being time-constrained, let's adjust the meeting format so that it's impossible to come in unprepared, because the preparation time is right up front. And I also really like that everyone's reading the materials together.
It means that everyone's getting prepared at the same time, and it means that once those 30 minutes are up, everyone is literally and figuratively on the same page, and that's a really good way to start having discussions. I think that good meeting culture begets good productivity culture overall in your company, so it's really worth thinking about how your company should do it. Let's walk through a scenario where this might come in handy. As I said before, when you're a really small company, you're pushing things to Heroku, but eventually you're gonna become a big enough team that Heroku may not meet your needs, and you might need something more complex. Getting new things out is no longer a single deploy. You might encounter a situation where you wanna do something that sounds really simple but actually takes a lot of different steps. So let's talk through a scenario that ChillFlix and Net might have to deal with. Let's suppose that right now their mobile apps use a really simple static auth token to make authenticated requests. The security team found out about this, and they don't like it very much. So you talk with them a little bit, and they say let's use OAuth 2 instead, because OAuth 2 has token rotation built in: access tokens are short-lived and get replaced using refresh tokens, so you don't have one token that's good for weeks on end. And as luck would have it, ChillFlix and Net actually has an OAuth server, because someone added it a couple of years ago when the company was really young and they needed to do a quick third-party API integration. But the catch is that it's handled by a Rails gem called Doorkeeper, which is mounted in the Rails app that happens to hold your company's marketing site. And that's just not gonna work anymore, because ChillFlix and Net has millions of mobile app users.
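To make the difference from a static auth token concrete, here's a minimal sketch of the rotation idea: short-lived access tokens plus single-use refresh tokens. It's a hypothetical in-memory illustration, not Doorkeeper's API or a complete OAuth 2 server, and the 15-minute lifetime is an assumed value. Note that OAuth 2 (RFC 6749) allows, but doesn't require, the server to issue a new refresh token on each refresh; this sketch always does.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

ACCESS_TOKEN_TTL = 15 * 60  # assumed lifetime in seconds; real servers make this configurable


class TokenIssuer:
    """Hypothetical in-memory token issuer illustrating access/refresh token rotation."""

    def __init__(self, clock: Callable[[], float] = time.time, ttl: float = ACCESS_TOKEN_TTL):
        self.clock = clock
        self.ttl = ttl
        self.access_tokens: Dict[str, Tuple[str, float]] = {}  # token -> (user, expires_at)
        self.refresh_tokens: Dict[str, str] = {}  # token -> user

    def issue(self, user: str) -> Tuple[str, str]:
        access = secrets.token_urlsafe(32)
        refresh = secrets.token_urlsafe(32)
        self.access_tokens[access] = (user, self.clock() + self.ttl)
        self.refresh_tokens[refresh] = user
        return access, refresh

    def authenticate(self, access: str) -> Optional[str]:
        entry = self.access_tokens.get(access)
        if entry is None:
            return None
        user, expires_at = entry
        if self.clock() >= expires_at:
            del self.access_tokens[access]  # expired tokens stop working on their own
            return None
        return user

    def refresh(self, refresh: str) -> Tuple[str, str]:
        # pop() makes each refresh token single-use: once rotated, the old one is dead.
        user = self.refresh_tokens.pop(refresh, None)
        if user is None:
            raise PermissionError("unknown or already-used refresh token")
        return self.issue(user)
```

Unlike a static token, a leaked access token here stops working within the assumed 15 minutes, and a leaked refresh token can only be used once before the legitimate client's next refresh fails loudly.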
So if we suddenly switch those mobile apps to OAuth, then that marketing site, which is supposed to be used for marketing-site things, is now serving 90% of its requests to hand out tokens to your mobile apps. And we don't want that. So we're gonna have to extract that Doorkeeper engine into a separate app. But that's not completely straightforward to do, because there are third parties currently using that OAuth system where it is right now. So we need to think of a good plan to make sure that we can move it over in small, incremental steps. That's gonna involve multiple teams and code bases, and it needs a coordinated plan. So let's make a white paper for it. The goal for a document like this is to create a common level of understanding that you can start a discussion from. The audience for this document could be basically anyone who's involved with pushing out these designs. So people who are familiar with the code bases involved would read it. You might have site reliability engineers involved, and any managers or executives who are interested in this might need to be able to read it. So as you're writing it, make sure that you start from a very high-level background. Don't make too many assumptions about what people already know, so that anyone can read through this document, and once they've read through it, as long as they're proficient in the art, they know what's going on. When you're running at scale like this, it's not enough just to describe the architecture you wanna end up with, because you need to think about the steps it's gonna actually take to get there. And for apps that serve a lot of traffic, you need to break that down into steps that won't disrupt any users. You can't just stop the whole thing, change everything, and then bring it back up, because that would take too long.
So you might actually find that for the final thing you want to build, there's not a very practical set of steps to get there from where you're currently at, and you might need to take a different approach. So here's the list of steps I came up with. Don't worry about reading the actual details of this. I'm gonna post a link to it at the end of the talk if you're interested in the details, but the specifics don't matter much for what I'm going to say. The important thing to note here is that we've been thinking about individual steps, where no step disrupts users until it actually needs to change something important. And we can do each of these things as a separate deploy. For instance, the first thing that we're going to deploy is a copy of the extracted standalone app as a new Rails app. We're going to point it at the existing database so that we can use it, create tokens, and verify that it works and shares state with the original. But if it doesn't work, that's okay, because all the third-party production traffic is still pointed at the original site. We haven't broken anything yet for anyone. The steps as they're written in this document are meant to give you an understanding of what's going on, but keep in mind that you don't need to write down every single deploy step and every command typed into the servers. Really, this is just about finding a good shared level of understanding, and it gives you a point to have a discussion from. Once you've decided that this is how you want to move forward, you can produce the final steps that you're going to use. I personally used an app called Dropbox Paper to build this doc. I like it because it lets you use Markdown syntax, which I'm a big fan of.
It's got a pretty clean design, it's easy to pass a link around to people, and they can collaborate on it and add comments in real time. Google Docs is also good. The specific tool you use to make the document isn't as important as just having the practice of making it. The migration I just described is loosely based on a real one that I performed at a company I used to work at. If you look at the changes in the document, there's nothing technically astonishing about them, and that was really the point. I wanted it to be very straightforward. The other really interesting thing about that approach is that it's not at all the architecture anybody wanted. No one wanted to build a brand new app that was going to serve a lot of traffic on Rails, because as we all know, Rails can't scale, right? It took a little while for me to get buy-in from the rest of the team, but I ultimately won them over by explaining that the steps we'd take reduce risk. They involve using things that we already know how to use. We know how to use Postgres, we know how to use Rails, we know how to make a database replica from an existing database, and we know how to promote that replica for a new application. These are all things our team is really familiar with, so there wasn't much risk that we would deploy something and then not know what to do. We also reduced risk by continuing to use the same Doorkeeper Rails engine in the new app. There's no new code, so that reduced the risk of ending up in a scenario where the new application doesn't quite behave the way the old application did, which would have been catastrophic because it would have broken the contract for existing users of that API.
When you're doing projects like this, especially with a very big team, it's really important to keep in mind that it's not just about what you want to end up with. It's very much about the steps, and if you don't put enough attention into those intermediate steps along the way, you may never ship the final product. That process for extracting the auth server into its own app looks high-stakes, because we're moving a production app into a brand new home, but the steps we took helped ensure that it wasn't a high-stakes operation. We took lots of steps to mitigate the risk, and that's one thing I really love about programming as a practice. You get the opportunity to make all your stupid mistakes in a safe place. You get to do it right at your desk, with no embarrassment. You can do as many practice run-throughs as you need outside of production, and only once you're sure you've got it right do you actually do it in prod. Other jobs lack that luxury. Surgeons are always editing code in production; that's just the nature of performing surgery. The teams that perform surgery take lots of steps to make sure the surgery itself goes smoothly, but at the end of the day, the surgeon is holding the knife to the patient's body and can't slip up. It's a lot of pressure. We get to design the way we work, to some extent. Let's not make our deploy procedures look like surgery. Let's instead put a lot of effort into making sure that everything we do is as stress-free as it possibly can be. Google publishes a really good handbook on site reliability engineering, and I think they know a thing or two about that discipline.
And the interesting thing when you read through it is that a lot of what they talk about sounds mundane, the kind of thing you wouldn't necessarily think of when you think about running at scale, but it turns out to be pretty important. One topic they dedicated an entire page to is sanitizing and validating configuration for your applications. They talk about validating it to make sure it's in the right format. They talk about making sure that a new configuration input isn't wildly different from the old one, because if it's very different, like if it's much smaller than the previous input, there's a good chance it's wrong. And if it's wrong, the app should keep running with the old values until someone intervenes. That's a lot of rigmarole for something like configuration, right? But something as simple as configuration is enough to take down a company as big as Google. Back in 2005, they had an incorrect DNS entry for Google sites for six minutes. In 2009, they had another whoopsie where the entire web accidentally got marked as malware, which, I mean, philosophically they might be right, but they probably didn't mean to do that. I really recommend reading through this guide, and I'll post a link to it at the end of the talk. It's really good for developing a healthy mindset about your systems. One other thing they talk about that I really like is the mentality of having an error budget. The way that works is you take your guaranteed uptime and figure out how much room that leaves you for errors. So if you guarantee 99.99% uptime for a particular application, and that application processes 50 million requests a month, then 0.01% of 50 million gives you a budget of 5,000 requests a month that can fail while you're still within your SLO.
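The error-budget arithmetic above fits in a few lines of Python. This is a minimal sketch, assuming the SLO is expressed as a success ratio (0.9999 for 99.99%) and that failures are counted per request; the function names are hypothetical and not taken from Google's SRE book.

```python
def error_budget(slo: float, requests_per_month: int) -> int:
    """Requests that may fail in a month while still meeting the SLO.

    slo is a success ratio, e.g. 0.9999 for "four nines" (an assumption
    of this sketch; real SLOs may be measured differently).
    """
    # Whatever fraction the SLO doesn't promise to succeed is the budget.
    allowed_failure_ratio = 1 - slo
    # round() absorbs floating-point noise like 4999.9999999...
    return round(requests_per_month * allowed_failure_ratio)


def remaining_budget(slo: float, requests_per_month: int, failed_so_far: int) -> int:
    """Budget left this month; zero or negative means it's already spent."""
    return error_budget(slo, requests_per_month) - failed_so_far


if __name__ == "__main__":
    print(error_budget(0.9999, 50_000_000))            # budget for the month
    print(remaining_budget(0.9999, 50_000_000, 1200))  # after 1,200 failures
```

Having the remaining budget as a number is what makes it useful as a mediation tool: a team can agree ahead of time on what to do when it hits zero, instead of arguing from gut feel on deploy day.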
You can spend that budget on maintenance or on pushing out new code that might not be as stable. Or, if you're having a bad month, you might have already spent it because you had real problems or downtime. When you get to be a big team and you start having SREs, you're gonna learn that there's a natural tension between software engineers and SREs. It's software engineers' prerogative to ship new things all the time, and it's SREs' prerogative to keep the system stable, and constantly changing stuff is not how you keep things stable. So they're at odds with one another all the time. I think something like an error budget is simple, but it becomes a really useful mediation tool, because instead of arguing from emotions and gut feelings, you can use data and numbers to make a sounder decision about whether it's smart to deploy this and how many servers to deploy it to. And when something really does go wrong, it's very important that you're able to find and address the root cause. I recommend the blameless postmortem as a tool for accomplishing this. A couple years ago, Amazon had an outage in S3, which was very surprising, because S3 had been around for many years and is usually a very stable system. I was reading through their postmortem document, and what they found was that a team member had been entering commands out of a playbook to take some servers out of rotation for maintenance. They made a typo, took way too many of the servers offline, and after a bunch of cascading failures it caused a full-blown outage. So you might read that and think, okay, the root cause is that an engineer made a typo. But that's not a root cause, because if that's your root cause, what do you do to mitigate it in the future?
Do you just tell the engineer, hey, make sure you don't make typos when you're deploying changes to S3? That doesn't work. People make typos sometimes. The real root cause here is that it was possible to take down S3 with a typo. And when you know that's your root cause, it gives you a path forward to adding real safeguards that make sure it doesn't happen again. In this case, Amazon added some extra validation to the inputs of those scripts to make sure you can't accidentally take too many servers out of rotation at once. Again, validating configuration inputs: kind of a big deal, it turns out. We're not surgeons, and there's no reason for our keystrokes to be directly connected to the patient's arteries, especially when the patient is as big as AWS. I really like blameless postmortems because they let you truly learn from a failure. They create a psychologically safe work environment where people don't live in fear of making a mistake and getting yelled at or fired. If everyone's in constant fear that they're going to break something, they're going to avoid shipping new stuff. And if your work environment isn't psychologically safe and you ask someone in a postmortem to explain their role in the failure, they're going to downplay it and omit important information out of fear of retribution. If your culture prevents you from getting the facts right in a postmortem, you're going to repeat your failures. It's not a coincidence that a happy team that feels safe in their jobs is also a more effective team. So most of the things I've been talking about for the past half hour or so are really mundane. And I know that being on a growing team can be very frustrating when you start struggling to do things you know are simple. But it's important to keep perspective. Remember, things aren't as simple to do when you've got orders of magnitude more people involved.
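The S3 safeguard and the configuration checks from Google's SRE book share the same idea: refuse an input that looks drastically different from the current state, and keep running the old values until a human looks at it. Here is a minimal Python sketch of that idea. The `Config` shape, the 10% shrink threshold, and the function names are all hypothetical; this is not Amazon's or Google's actual tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical config: just the set of servers currently taking traffic.
    servers_in_rotation: frozenset[str]


# Hypothetical threshold: reject any single change that removes more than
# 10% of the servers currently in rotation.
MAX_SHRINK_RATIO = 0.10


def validate(old: Config, new: Config) -> list[str]:
    """Return a list of problems with the proposed config (empty means OK)."""
    problems = []
    if not new.servers_in_rotation:
        problems.append("new config has no servers in rotation")
    removed = old.servers_in_rotation - new.servers_in_rotation
    total = len(old.servers_in_rotation)
    # A much smaller input than before is a strong hint that it's wrong.
    if total and len(removed) / total > MAX_SHRINK_RATIO:
        problems.append(f"would remove {len(removed)} of {total} servers at once")
    return problems


def apply_config(current: Config, proposed: Config) -> Config:
    """Return the config to run with: the proposed one only if it validates.

    On failure, keep the current config and report why, so a person can
    intervene instead of the system acting on a typo.
    """
    problems = validate(current, proposed)
    if problems:
        for problem in problems:
            print(f"rejected config change: {problem}")
        return current
    return proposed
```

With 100 servers in rotation, a typo that leaves only 10 is rejected and the old config stays in effect, while taking 5 out for maintenance goes through. The guard doesn't try to prevent typos; it makes a typo unable to take everything down at once.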
If you've been walking as a biped all your life and at some point you start steadily growing extra legs, you might wake up one day and realize that it's really difficult for you to walk now. That might be frustrating, because you know how to walk, you've been doing it for years, but it wouldn't be unreasonable to struggle to walk when you suddenly have 100 legs. So don't beat yourself up when the team starts getting bigger and you find yourself feeling ineffective. Remind yourself that you're a new team. When you add people, the team has a different structure than it did before, and you have to learn how to be this new team together. I think it's really beneficial to think systemically about how your team works. When I remarked earlier that it's really great when a small team can use Heroku as their deploy strategy, what that basically meant was that deploys were free: you could do one by typing a single command, with no mental overhead. And because a deploy was basically free, you did them all the time. That one simple thing, making deploys free, created an economic incentive to do them a lot, and it had a really deep effect on your company's culture and how you shipped things. Thinking about things like that in terms of economic incentives can really help you understand why your team is the way it is. When something goes wrong, do you demand to find out who's responsible and then lash out at them? Then don't be surprised when people aren't forthcoming with important information about a failure because they want to cover their asses. Do you love to obsess over data and KPIs? Eventually your team is going to start focusing exclusively on the numbers, and when that happens, don't be surprised when they start building things that only serve those KPIs instead of building a great product.
And if your team does that long enough, it can cascade into some pretty serious, expensive failures, mistakes worth billions of dollars. If you don't believe me, I've got a really good deal for you on a used diesel Volkswagen. You wouldn't run a mission-critical system with just one main database and then get upset at the database for failing and taking your app down with it. You design it with the assumption that failure is possible, so that when something does fail, the system degrades gracefully. Humans are fallible, and if you think of your team as a system made of fallible components, then in an almost ironic way, you're better off embracing the fact that your team is made of humans. It's impossible to scale without becoming more fault-tolerant, and that applies both to your tech and to the people on your team. Years ago, I was watching the documentary Objectified, and they interviewed Jony Ive, who was Apple's industrial design lead. He said something I found really fascinating: the industrial designers at Apple spend more time designing the fixtures and jigs used to manufacture their products than they spend designing the products themselves. That kind of blew my mind when I first heard it, because I had always assumed that when you're working at Apple scale, there are just people who deal with that stuff for you. I always figured those small things were beneath them, but they're not. Doing great things at scale involves doing a lot of really small things that scale really well. And that reflects what I've been talking about for the past half hour. I haven't talked about any exciting new distributed databases or blockchain technologies, and I've even managed not to talk about how great Elixir is. Instead, I've been raving about good meeting culture and nagging you to get coffee with your candidates instead of just pawning them off onto recruiters.
That technology part is tangential, because a company's success rarely hinges on technology alone. Twitter used to have the Fail Whale up constantly, but at the same time they were the talk of Silicon Valley; everyone loved them. Microsoft made billions making mediocre software in the 90s, while lots of other companies in the 90s made really cool stuff and faded into obscurity. If you found a lot of these things mundane and you'd rather focus on the tech, that's fine, but keep in mind that leading a big team isn't just about the tech; it's about a lot of these other things too. If that's boring to you, you might want to stick with small teams, because on a small team you can focus more on the tech, since there are fewer people to coordinate. But if you have a knack for execution, and getting humans to work at scale is an interesting challenge to you, I hope you give some thought to leading a bigger team, even if you don't consider yourself a heavyweight when it comes to technology. I know I'm not. As an industry, we spent a lot of years thinking that tech was the solution to everything, and we've seen the very real consequences of that. As we start to come around on that, the teams we build need to be more human-centric, and I hope you'll help build that future with me. So thank you so much. Thank you.