So, yep, I'll be talking about developer productivity engineering. My name is Pan Thomakos, and you can find me on the internet as panthomakos. I work at a company called Strava. We're a GPS-based training site and social network for athletes, and we have about 50 engineers. I've been there for about eight years, but more recently I've been working in productivity engineering, which effectively means my job is to make other people as productive as possible. And even though I'm technically a team of one, a lot of other people at Strava spend some portion of their time on productivity-related tasks. So the takeaways from today can apply to any engineering organization, or even to you and your personal projects.

So what does it mean to be productive? For me, productivity is inherently tied to happiness, and that really means how engaged my mind is on a particular task. The more time I get to focus on challenging and interesting problems, and the less time I spend on repetitive and mindless tasks, the more productive I am. Often that means using automation to get rid of the repetitive, mindless stuff. Productivity engineering isn't just about automation, but automation is a big part of it, and it's what I want to talk about today, because building automation costs engineering time and effort, and it's not always obvious how to prioritize it, or even make the case that it's important.

You may have found yourself battling with your automation, or you have so much stuff to automate that you don't even know where to begin, or you're just working heads-down on your next feature and don't have time to think about automation at all. And it's okay to decide that automation isn't important for you right now, but it's a little disconcerting to feel like you can't make that decision in a strategic way. So, at Strava,
I've developed a framework that helps you think in a more systematic way about automation, and about when it might be right to actually spend time and effort to automate something. It's called DPE, or developer productivity engineering, named after site reliability engineering, which was developed at Google. Google uses site reliability engineering to apply engineering practices to the problems of site reliability and site operations; developer productivity engineering uses engineering practices to solve the problems of developer productivity. It can be broken down into three steps: identify, measure, and prioritize. I'm going to tell you about each of these.

So let's start with identifying productivity bottlenecks. Sandi Metz said that duplication is far cheaper than the wrong abstraction. When we're writing code, that basically means we can't just go around deduplicating stuff because we've done it twice, because doing so introduces wrong abstractions, and wrong abstractions have a high long-term cost: they're difficult to change and difficult to maintain. The same concept applies to productivity engineering. Just because you've done something twice does not mean it's worth automating. We don't want to end up in a state where the work to maintain our automation costs more than just doing the thing in the first place.

So there's a more effective heuristic we've found, and it's called toil. Site reliability engineering has this exact concept, and I think it applies very well to developer productivity engineering. Toil usually means hard or menial labor, but that's not a very rigorous definition. For our purposes, toil is going to mean that a task satisfies a set of six criteria, and I'm going to go through each of them step by step.

The first is that the task needs to be manual.
And this might seem obvious, but if a machine is already doing it, our threshold for automating it, or at least for improving that automation, should be a lot higher.

The second criterion is that the task needs to be repetitive. It really needs to be ongoing: we need to be doing it once or twice a week, a month, or a quarter to make it worth investing in.

The task also needs to be automatable. We at least need to be able to envision, and have the budget for, putting some software engineering effort towards the task. If we don't, then it's probably not worth automating.

The task needs to be tactical, not strategic. That means it happens in response to something that is actually measurable, like CPU load, site load, or a QA suite passing.

The task should not provide enduring value. Basically, that means it should not produce a permanent improvement. If I can do the task again for a similar or identical result, then there's no permanent improvement from it, and it's probably a good candidate for automation.

Finally, it helps to know whether the task is going to scale linearly with growth, or even faster than that. As we add more engineers, add more commits, or decide we want to deploy or release our mobile applications more often, the task is going to get harder or crappier to manage.

Those are the six criteria. Let me give you a simple example from my own work. At Strava, we have a Ruby CLI that we run on our machines to deploy the website and API, and we run it twice a day. The deploy script does all the nitty-gritty of changing the bytes on all the EC2 servers and restarting them, but developers still need to be present to make sure that everything's going okay and that we don't need to roll back.

So the task is definitely manual.
We have to type the commands into our keyboards, and we have to pull certain metrics or look at graphs to make sure that nothing went wrong. It's repetitive: we run it twice a day. It's automatable, at least to some extent: we could write a cron task that kicks off the deployment, and we could probably build a service that pulls metrics and pushes notifications to us when something actually does go wrong, rather than expecting developers to pull that information on their own. It's tactical, because it happens in reaction to the QA suite passing and in reaction to time passing; in other words, it happens twice a day. There's no enduring value to the deployment. This might seem strange, but all the product work and development that went into creating the code is what created the enduring value; changing bits and bytes on a server does not. And it scales at least linearly: as we hire more people (and we are hiring, by the way), we're going to be adding more commits, and our deployments are going to get larger, or we're going to have to do them more frequently to keep up.

So it's a simple example, but that's also the point. Almost anyone in your organization should feel comfortable assessing toil. The most effective way I've found to do this is basically to talk to people, whether through retrospectives or weekly and bi-weekly one-on-ones; that's when toil actually comes up.

So let's also talk a little bit about measuring productivity. I'm going to start with the obvious, which is that you should not track your time. Tracking time sucks, and it's a horrible way to actually measure productivity.
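The six criteria and the deploy walkthrough above can be sketched as a simple checklist. This is an illustrative sketch, not Strava's actual tooling: the `Task` struct, criterion names, and the deploy example's answers are all assumptions drawn from the talk.

```ruby
# Hypothetical checklist for the six toil criteria described above.
TOIL_CRITERIA = %i[
  manual
  repetitive
  automatable
  tactical
  no_enduring_value
  scales_with_growth
].freeze

Task = Struct.new(:name, *TOIL_CRITERIA, keyword_init: true) do
  # A task counts as toil when it satisfies all six criteria.
  def toil?
    TOIL_CRITERIA.all? { |criterion| self[criterion] }
  end
end

deploy = Task.new(
  name: "website deploy",
  manual: true,             # developers run the CLI and watch graphs
  repetitive: true,         # it runs twice a day
  automatable: true,        # cron kickoff, metric-driven notifications
  tactical: true,           # reacts to the QA suite passing / time passing
  no_enduring_value: true,  # re-running it yields the same result
  scales_with_growth: true  # more engineers => more commits => bigger deploys
)

puts deploy.toil? # => true
```

Treating the criteria as a strict all-six conjunction is one reasonable reading of the talk; you could just as easily score tasks on how many criteria they satisfy and rank them.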
One of the main reasons is that productive work is inherently highly varied, and it's very difficult to correlate a long-term productivity gain with the initial activity that might have created it. So we've taken a more indirect approach: we focus on the bad parts instead of the productive work. We focus on the things that are negatively impacting productivity and try to reduce those.

The first thing we do is measure toil, and the best way to do that is to send surveys and talk to people. Basically, ask people to estimate how much time they're spending on horrible, manual, toilsome tasks. The other thing we do is instrument all of our existing automation, because if someone complains that they're waiting on the deploy script, it pays to know how long the deploy script actually takes to run. It's pretty simple, but maybe a little non-intuitive. I think it's kind of liberating to feel like you can actually measure productivity in some way and make a dent in it; you just need to reframe the problem a little bit.

So let's talk about prioritization, because you've probably now gotten to the point in your organization, maybe two months down the road, where you've identified a bunch of toil, maybe you've measured it, and you have so much toil that you just don't have time to automate it all. Picking what you want to work on next is quite simple: it's a matter of calculating four different costs.

The first one is the toil cost. It's very important to frame this in terms of some recurring measurement, like the number of hours we spend on the task every week or every month, because the task is ongoing. You also want to calculate the implementation cost: how long is it going to take you to actually automate a solution?
And the more nebulous this is, the higher the toil cost should be before you think about spending time on it. Third, software isn't free: you're going to have to spend some time maintaining it, and you need to subtract this maintenance cost directly from the toil cost. So if you're going to spend an hour a week maintaining the automation, the toil process itself should be costing you at least an hour a week to begin with. Finally, the onboarding cost has to factor in, and I like this one the best: if you get a new hire at your company, how long would it take that person to own this process? Because you don't want any one person to be the only one responsible for a particular process.

We've been using developer productivity engineering at Strava successfully. As an example, near the beginning of this year, we decided that our mobile release process was quite toilsome. We calculated that we were spending approximately 20 developer hours per week on it, and after a lot of automation effort, by the end of this year, once you subtract the implementation and maintenance costs, we'll be saving upwards of 17 developer hours per week by automating the process away.

So I think you can use this successfully in your own lives as well. Thanks. I work at Strava; we are hiring. You can find my slides at this link and find me on Twitter, GitHub, et cetera. Thank you.
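The four costs above lend themselves to a back-of-the-envelope model: recurring maintenance subtracts from recurring toil savings, while implementation and onboarding are one-time costs you amortize. The sketch below is illustrative only; the talk gives the ~20 toil hours and ~17 net hours saved for the mobile release, but the implementation, maintenance, and onboarding figures here are assumed, not Strava's actual numbers.

```ruby
# Hypothetical model for the four costs described above.
AutomationCandidate = Struct.new(
  :toil_hours_per_week,        # recurring cost of doing the task by hand
  :implementation_hours,       # one-time cost to build the automation
  :maintenance_hours_per_week, # recurring cost of keeping it working
  :onboarding_hours,           # one-time cost for a new hire to own it
  keyword_init: true
) do
  # Recurring hours saved once the automation exists.
  def net_weekly_savings
    toil_hours_per_week - maintenance_hours_per_week
  end

  # Weeks until the one-time costs pay for themselves.
  def payback_weeks
    return Float::INFINITY unless net_weekly_savings.positive?
    (implementation_hours + onboarding_hours) / net_weekly_savings.to_f
  end
end

# Loosely modeled on the mobile-release example: ~20 toil hours/week
# before, ~17 net hours/week saved after. One-time costs are assumed.
release = AutomationCandidate.new(
  toil_hours_per_week: 20,
  implementation_hours: 120,   # assumed
  maintenance_hours_per_week: 3,
  onboarding_hours: 16         # assumed
)

puts release.net_weekly_savings # => 17
puts release.payback_weeks      # => 8.0
```

With several candidates in hand, sorting them by payback period (shortest first) gives a simple, strategic answer to "what do we automate next?"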