Still kind of waiting for the coffee to kick in. Can you guys hear me? Is my thing working? All right. Awesome. My name is Charity, and I want to talk about bootstrapping an ops team. Now, there is so much ground that you can cover when you're talking about team building: interviewing, hiring, training, culture fit. Like, these are all things that our industry is frankly really, really bad at. We could be here for hours having a group therapy session about this stuff. So the way I'm going to try and trim this content down to fit into what I've got, like 25 minutes left, is by targeting the message to startups who are just starting to think about what operations means. Do we need it? If we need it, do we need an ops? How do we get an ops? How do we not fuck it up? And to me, I'm old school. I use the word ops. And to me, that encompasses operations engineering, site reliability engineering, scalability, architecture and design, DBA, DevOps, anything related to developing and maintaining complex systems at scale, basically. I'm currently a production engineering manager at Facebook, where I work on Parse, which is a platform for building mobile apps. I have previously been called a sysadmin, a release engineer, a DBA, and a really terrible software engineer. Before the acquisition, I was the first SRE type at Parse. I built out their infra from the ground up. Before Parse, I was the first engineer at Shopkick. Before that, Linden Lab, blah, blah, blah. So I have a fair amount of experience bootstrapping teams and growing both the infrastructure and the human side, which means that this is something that I have done repeatedly very, very, very badly, and seen done really badly. I personally have fucked this up in so many ways that I can't even count. So I have a lot of thoughts and feelings on this subject. There are really three parts of the story that I want to cover. Do you need an ops? Are you ready for an ops?
If your answers are yes, then what qualities are you looking for? Like, what makes for a good startup ops hire, and how is that different from what makes for a really good big company hire? Because actually, those are two very different skill sets. And how do you interview and sort for these qualities? We'll start with the first question. Do you need an ops team? And in the era of DevOps, some people are going to be like, hmm, you know, isn't this an outdated question? I don't think so. But you shouldn't assume that the answer is yes, right? Like, if you're a small team of software engineers, a lot of people who think that they need ops teams don't. They just need someone to really care deeply about their infrastructure and make reasonably good decisions. But this does not have to be a dedicated role for most startups. And your software engineers should never feel like they're off the hook for operational excellence. So there's this particular anti-pattern that we see startups do over and over, where it's a tiny little startup, like in their mom's basement. They figure out how to use AWS. They figure out how to deploy code. Like, this is not rocket science. But at some point, if they take off and they start experiencing growth, what happens is they reach a point where all they're doing is the infrastructure work. And they get very unhappy, right? Because this is not their core skill set. It's not what brings them joy. It's not what they decided to wake up and do every day. They're spending all their time firefighting and hacking around problems that they've created for themselves. And they start to feel burned out and frustrated. And this is when they're like, we need an ops team. And maybe they're right, but maybe not. Like, not to be totally heretical here, because I am an operations engineer, but let's be clear. Operations engineering at scale is a very specialized skill set. Like, it is not software engineering lite.
It is not getting someone to get paged in the middle of the night so you don't have to, or to do all the shit work of getting Jenkins to run and getting code to deploy and spinning up instances automatically, right? You're not hiring someone to take over all the crap you don't want to do. You're not hiring someone to get all the developers off the hook. If you're hiring an ops team, you're looking to hire people who know things that you don't about running reliable systems at scale. So do you really need an ops team, or do your software engineers need to get better at ops? Like, Instagram had zero ops people when they got acquired by Facebook for $1 billion. They had zero ops people. They had, I think, eight software engineers, but those software engineers were disciplined at one thing: choosing very boring technology. There's a blog post out there about this. I encourage you to look it up. Nginx, stuff that's been around for 10, 15 years, that's what they were able to build on. This is an impressively rigorous operational mindset, because what operations engineers are good at is bringing order to chaos and spending their innovation tokens wisely. So you need an ops team if you have hard operational problems, right? If this is a specialized skill set, you only need it if you need these special skills. You're not going to attract top-tier operational talent unless you're actually offering real, hard problems of reliability or scale. Let's talk a little bit about what it means to have hard operational problems, right? Like, one category is extreme reliability demands. Like, if you're in the financial sector, right? I remember talking to Square a few years ago when they were legitimately still a startup, and they were like, you know, if we drop a few API requests on the ground, we postmortem that. Like, we cannot afford to drop anything, because they're dealing with people's money, right? Stripe is the same way.
If they have an outage where, you know, they're down for a minute, they tweet about it, they postmortem it. This is super important. They don't have huge problems of scale. Like, their QPS is not what any of us would think of as huge, but it's a really hard operational problem to guarantee that much availability. Now, not everyone has these extreme reliability demands. Like, most startups fail, and it's usually not because they were down for a minute a week, right? Another category of hard problems is just your rate of growth. If you are 5X-ing or 10X-ing, even 3X-ing year over year over year, you effectively need to build a new version of your infrastructure every 10X, right? Google has this philosophy where they never design anything to scale more than 10X of what it currently is, because they just assume that you will make too many wrong assumptions about what your needs will be at that point. And in my experience, that is absolutely true. So if you can only see to 10X where you are now, and you're hitting that every year, like, that's a terrifying and exciting set of problems, right? The third category of problem is extreme security. This is pretty self-evident. A fourth category that I actually think is related to a lot of these, but somewhat different, is solving some operational problem for the entire internet. Now, this is something that Parse is trying to do, right? We're trying to build a mobile backend as a service to solve backend infrastructure for all mobile apps. I think we're allowed to say we have half a million mobile apps hosted on Parse. So we are doing operations for half a million mobile apps. That's a hard operational problem. And other startups that fall into that category are like Datadog, you know, they're trying to solve monitoring for the internet. PagerDuty is trying to solve alerting for the internet.
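A rough back-of-the-envelope sketch of two numbers from this stretch of the talk: what availability "down for a minute a week" works out to, and how quickly a given yearly growth rate uses up 10X of headroom. The function names, the 365-day year, and the assumption of steady compounding growth are illustrative choices, not anything stated in the talk.

```python
import math

MINUTES_PER_WEEK = 7 * 24 * 60      # 10,080 minutes
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 minutes (assumes a 365-day year)


def availability(downtime_minutes, period_minutes):
    """Fraction of the period during which the service was up."""
    return 1 - downtime_minutes / period_minutes


def allowed_downtime_minutes(nines, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget for an availability target with the given number of nines.

    For example, nines=5 means a 99.999% target.
    """
    return period_minutes * 10 ** -nines


def years_to_outgrow(growth_per_year, headroom=10.0):
    """Years until load reaches `headroom` times today's load.

    Assumes load multiplies by `growth_per_year` every year.
    """
    return math.log(headroom) / math.log(growth_per_year)


if __name__ == "__main__":
    print(f"1 minute down per week: {availability(1, MINUTES_PER_WEEK):.4%} available")
    print(f"Five nines allows {allowed_downtime_minutes(5):.2f} minutes down per year")
    for rate in (3, 5, 10):
        print(f"{rate}X per year uses up 10X headroom in {years_to_outgrow(rate):.1f} years")
```

Under these assumptions, one minute of downtime a week is roughly 99.99% availability, five nines leaves a bit over five minutes of downtime per year, and even 3X yearly growth reaches the 10X redesign point in about two years.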
Anytime that you're trying to solve a problem naively for everyone, it is exponentially more difficult than solving it for one use case. So let's say you do have hard operational problems and you've decided that you need an ops team. Congratulations. You must be doing a lot of things right. You must have customers, revenue, funding, products, interesting problems. Go team. So now you're faced with the next problem. How do you actually start recruiting and hiring your ops team? How do you recruit and hire the engineers, and how do you try to build a team? So the points that I wanna cover in this section are, you know, what qualities make for a really great startup operations hire in particular. How is that different from big company hires? I also wanna talk a little bit about the special problem of your first hire, like the foundational hire for a team. That applies to operations or to any new team that you're starting. So let's say you're a founder, you've just decided you need ops. Great start. Now in your mind, you're running over the list of skills that you think are absolutely critical for the engineer who's going to fill this position. And maybe it looks a little like this. You want them to come in the door and be an expert at everything: rockstar programmer, expert in SQL, NoSQL, networking skills, Chef, Puppet, Ansible, service registry. Like, you have in your mind what this perfect person looks like, and it probably involves a laundry list. This is a bad thing to do. And while we're at it, while we're making our wish list as a startup founder, you also probably want someone who will, you know, work for $50,000 a year, 0.01% of the stock pool, free snacks. Hey, we have free snacks, right? You want a unicorn. And of course you do. Everybody does. Unicorns are amazing, but they don't exist. Like, you don't get one. Get that out of the way. You don't get a unicorn. You don't get this perfect person.
You get an engineer, hopefully, with very specific strengths and weaknesses, and specific experiences and skills that may or may not be relevant to what you most need. What this means is that, instead of thinking about what your perfect person looks like, you need to think about what is non-negotiable. What strengths, not what skills, what strengths will make or break your startup? Hire for those, you know. If you need someone who cares about that fifth nine, you do not want someone like me, because I do not give a fuck about the fifth nine. I don't like that problem set. Some people do. And honestly, one of the key things that really great operations engineers are good at is learning whatever they need to be successful at that given time, on the fly. Most of us are like, we learn it when we need it and then we forget it. We drink whiskey, we forget it, we reuse those brain cells. Anyway, I will say, in my experience, a very strong predictor of success for ops engineers at startups is something that I've heard referred to as a T-shaped engineer, which means that they're more or less broadly literate, you know, they can talk across the stack, and they've done a deep dive into at least one area. When Parse was interviewing me, I had literally never used any of their core technologies. I had never used AWS, Ruby on Rails, Mongo, Cassandra, Redis, Chef. Like, I couldn't answer any specific questions about any of their technologies. But I could speak broadly to the kinds of scaling problems that they were having and the challenges of solving problems naively on a platform, right? And I've been through the cycle of scaling services under duress many times, and that's what they cared about. I would say that they did a very good job interviewing me. Now, in my experience, there are a few qualities that really great operations engineers do share. Like, they're allergic to doing the same thing more than once or twice, right?
Great ops engineers honestly don't have to be superstar programmers, but they do have to be capable enough at writing code to do what they know needs to be done to make a service resilient and self-healing, right? Really great operations engineers feel personally offended when their systems break. They don't want to write a runbook about how to solve a problem when it breaks, you know? That's like the most irritating thing about workplaces that do that, where they're like, well, if you get a page about this at 3 a.m., here's the 50-page document that you follow to fix it. I'm like, what the fuck just happened? This is not what we do. This is not how we build a good system. I think that great engineers, and this is not specific to ops engineers, but really great engineers have strong opinions on a wide range of technical topics, but they're generally not dogmatic about them, right? There's nothing worse than a religious zealot. Like, I prefer Chef to Puppet. I can argue at great length, very passionately, about why I prefer Chef to Puppet, but here's the story. I know one company, who shall remain nameless, who had a perfectly functional Puppet infrastructure. Then they hired a great Chef developer, and he convinced them all to move to Chef. Three years later, they're still running both on the same systems. That is ridiculous. They're still paying down the cost of that technical debt years later, even though that Chef developer has long since moved on. So all great operations engineers strive really hard to simplify and reuse solutions. Complexity exacts a staggering tax on your humans, and the really good operations engineers are always conscious of that tax. All the best ops engineers are amazing communicators, like, full stop. They're not assholes, even in a crisis. Maybe a little bit. Maybe a little bit when they need to be, but it's like a tool that you want to wield carefully. It's a powerful tool. You don't bring it out unless you have to, right? And empathy.
Empathy is what the DevOps movement is about. It's not about hiring DevOps engineers. I really hate that. It's about navigating this constant tension between the need of the organization and of developers to ship things, get things out the door, build the products, build features, launch things, and the need of the people who are responsible for the infrastructure to keep everything stable and up and not burn their humans out. This is why operations engineers value process. It's not because anyone likes checklists or stand-ups. It's because process is what keeps us from making the same mistakes over and over and over and over again. Things that do not predict great ops engineers. Like, what's not on this list, right? Being really good at whiteboarding code. A team of software engineers trying to hire their first operations engineer will often make this mistake. They ask them junior SWE interview questions, and they're like, hmm, I'm just gonna assume that this person is kind of like a mildly stupid software engineer and ask them my low-bar software engineering questions on the whiteboard. Not a great strategy. Yes, you should ask coding questions, but like, pair program, and don't get upset when people Google things, because that's what software engineers do all day, right? I also wanna talk about the big company pedigree issue. Venture capitalists love to fund teams that have ex-Googlers and ex-Facebookers and ex-whatever the big sexy company is. And in general, everyone just seems to assume that having worked at a big company is some kind of predictor of quality, and it is, for a very particular definition of quality. If you have worked and succeeded at a big company, it means that you are very good at working and succeeding at a big company.
Often this means you are really good at taking a problem that someone else has defined. It gets sliced down from team to team to team to team, and then it gets to you, and they're like, here, do this, you know, and you do it and you execute and everybody's like, ah, you're amazing. That is not the skill set that is most valuable at a startup, right? In general, you need engineers who are better with chaos, who are capable of looking at the whole picture and then deciding what small component to work on that day. Well, that is a very particular muscle to flex. I know some engineers who are amazing in both contexts, both at a big company and at a startup, but they are few and far between. Most of our skill sets really gravitate towards one or the other. I know this amazing SRE at Google who's like, what's a LAMP stack? I can't, I can't even. So it's not to say that they can't learn, right? They totally can, but they're starting at a remedial level, and this will take some time to work out. And there's also a kind of learned helplessness that often sets in when you work at a large company for a very long time. Yes. That will kill your startup, right? If they can't overcome that. So, kind of to sum up, the qualities that make you highly effective at a big company are not necessarily the same as those that make you highly effective at small companies. Ops at a startup, like, we like to talk about automating ourselves out of a job, about being proactive and not reactive, but every job at a startup is a reactive role, basically. In a big company that's doing it right, that's not true. So, we've talked about what makes good startup engineers, what you're selecting for. Let's talk a little bit about the interviewing and hiring process. First, let me just state the obvious. This is hard, right? Interviewing and hiring is really hard, and the entire tech industry is terrible at this.
We are so much better at looking for reasons to say no to people than at looking for potential, like reasons to say yes. 10 minutes left, all right. That said, I do have a few suggestions. If you're coming into this as a blank slate, like as a founder or a software engineer, and you don't have these skill sets that you're trying to hire for, first acknowledge that, right? Ask questions of your friends and do the job yourself, you know? Spend a little bit of time, hook yourself up to the pager, figure out what ops at your company feels like and what you really need from it. Even if you do it really, really, really badly, you'll be more literate about the problem set. And I need to give props to Ben Horowitz for this part. If you haven't read The Hard Thing About Hard Things, the book, it's amazing. You should read it. Don't hire for lack of weaknesses. Like I said, look for the strengths that are going to make your startup a success and try to tease those out. If you're hiring for lack of weaknesses, you'll miss out on some really amazing candidates who could be amazing for you. If you're hiring the founding team member, the very first one, technical leadership is a very specific strength that you should hire for. You should look for someone who has encountered the category of problems that you're trying to solve, again, if possible, whether that's extreme scaling, extreme reliability, or security. And then when you're hiring subsequent ops people for your team, look for the weaknesses of your existing team and try to fill them, right? Because every team and every person has weaknesses. Like, I am terrible at monitoring and graphing and alerting. I hate it. I can do it. I will do it if I have to, but I hate it. So the first engineer that I recruited to come work with me was my friend Ben, who's amazing at it. Like, we work together really well.
And this is a thing, like hiring for strengths instead of lack of weaknesses, this is a thing that big companies are terrible at, because they have tens of thousands of people banging down their doors every year trying to get hired, so they can afford to select one out of every 2,000 people that apply. I think this is greatly to their detriment, because they end up selecting for a certain type of engineer, but it means that there are amazing engineers out there who won't get hired by Google and Facebook, and we get them. Interview questions. I know, I'm running out of time. Good questions are very leading and broad and have many correct solutions that will let the candidate demonstrate their wealth of expertise and their context and their background. If you're giving them a coding test, I swear to God, I've seen people do this: I've seen people lock the candidate in a room with a laptop and no internet and ask them to write a JavaScript DOM thing, for ops people. And they're like, oh, we can't hire anyone. I'm like, no shit, you can't hire anyone. Some sample questions that I really like are like, hey, our site seems to be slowing down. Our 50th percentile latency has doubled over the last two days. We have no idea why. Where should we start? Lots of answers to that. You want someone who's cool under pressure, so ask them about the worst disasters that they've encountered. If nothing else, this will be fun, right? And like, don't underestimate how amazing it is to work with someone who can keep a sense of humor in the middle of the night when the site is down. That shit is gold. Ask culture questions. Ask how they felt about their former jobs. Everybody can have one or two bad experiences, that's normal. But if their history is a litany of this place sucked, that place sucked, these guys are assholes, uh, what's the common thread there? This is probably not your hire.
If someone complains about a past job, ask how they tried to change it. If they didn't try to change it, this is probably not your hire. Reid Hoffman said this a while ago. He said, if you told me to pick one, references from people who have worked with this engineer before or an interview, I would pick the references every day of the week. And I completely agree. So, all right, I'm hurrying. You think they're amazing. You hired someone. What do you do next? Do you just turn over the pager, breathe a sigh of relief, and get on with whatever you think your business is? No, you do not, right? Your software engineers still have to care about the health and reliability of their services. They should still be on call. Operations engineers are a really powerful resource to help you build better systems. They are not your crutch. This means shared pager rotations. This means including your operations team in the development of every product from the very first design discussions. If ops doesn't want to launch something, it shouldn't launch. How to spot bad ops engineers: too much complexity, not wanting to share responsibility. If your ops team is doing any of these things, course correct if you can, and if not, they're not your people. Here's how to lose good people. Give them all the responsibility for keeping your site up and none of the power, no veto power over launching stupid features that can't scale, right? Assume that they will handle all of the minutiae of building infrastructure, when your software engineers still have to understand and do these things. Have a blameful culture. And frankly, you will lose good ops people if you don't have interesting problems. They should leave. This is a really strong argument for not hiring too far in advance of actually having those interesting problems. I'm basically at the end here, but I want to make just a pitch for treating your people really, really well.
And I don't mean snacks and free lunches and off-sites and gyms and stuff. I mean really caring about them as people, caring about their career trajectory, having these conversations with them, encouraging a healthy work-life balance, telling people to go home, stepping in for each other when they've had a couple of sleepless nights. I worked at this one startup, who shall also remain nameless, where anytime there was a release or a launch, everyone was in the office until 4 a.m. And the founders would be like, oh, you guys are amazing. Oh, you're such badasses. This is so great. And guess what? Every time there was a launch, everyone ended up in the office again, because the patterns that you call out and celebrate are the ones that get repeated. So pay attention to what you're praising people for, and don't praise them for harmful things. Heroism is not what we're put on earth to do in the context of tech startups. Short-term heroics are fine, but this is not the goal. The goal is to not need heroics, to not need to burn ourselves out. And paying attention to your culture and your people is not a luxury. This is not something that distracts from your core mission as a startup. It is critical to your success in this industry. Most startups fail, right? None of us are going to be at the job we're at now forever. But if people love working with you and want to work with you again and again, that can be the difference between success and failure, either at this startup or the next. Because it's like a superpower, right? Having people who will vouch for you to their friends, and vice versa, that is really powerful. Like, for everyone on my team, I believe that being here now is the best thing for them and for their careers. And when that stops being true, I will tell them. I will be honest.
Like, if it comes to a conflict between what's best for the company and what's best for the people, I believe that in the long term, investing in the people is always the winning strategy, for ops teams and every kind of team. That's basically all I have. Some credit to the people who have really influenced my thinking.