Welcome everyone to this session: You Build It, You Run It Sounds Great, But It Won't Work Here, by Steve Smith. We are glad to have Steve Smith with us today. Thank you.

Hi everyone, I'm Steve and I'm here to talk about "you build it, you run it sounds great, but it won't work here because of complicated things we don't know how to solve, but could solve if we made them a priority". So I'm Steve, I'm the head of the scale service at Equal Experts. Equal Experts is a global network of expert technology consultants. I help organisations to deliver at pace with many teams and many services. I've listed some of my scale experiences here to give you an idea of what scale looks like to me, the most recent being going from one team and one microservice to 40 teams and 120 microservices in about two and a half years. They were a very exciting two and a half years. And I've written a couple of books. They're on Leanpub and on Amazon, listed there. Please buy them, my children are hungry.

Okay, why are we here? We're here at Agile India because we want to learn how our organisations can survive and thrive in a world that's getting faster and messier every day. You need to build some digital services to deliver customer outcomes faster than ever before. Those services need weekly deploys or more, probably daily deploys; two nines of availability or more; a time to repair of minutes, not hours; and a learning culture in which teams constantly generate insights and implement improvements. And all of those need to be accomplished together, for the long term. That's your new baseline for success.

And there's a hard problem here. You've had a central operations team forever. You've got self-hosted COTS applications and custom integrations you've built yourself. They might be on-premises, they might be in the cloud, but in either scenario you've got a central operations team looking after them. They're deployed monthly. Availability hovers around two nines if you're lucky, so that would be 99.0%. Time to repair is an hour or more. That's fine for your foundational systems, the things you rely upon that don't change very often, but it just won't cut it for digital services. If you achieve this new baseline and deliver your customer outcomes sooner, your organisation will succeed in the marketplace and it will be like bunnies and meadows. If you don't achieve this baseline, you're going to lose out to your competition and the bunnies will be, I don't know, very sad, I suppose.

All right, what do you need? You need you build it, you run it. It's an operating model in which product teams build, deploy, operate and support their own digital services. At least that's how Beth and Tim describe it. It's underpinned by team empowerment, zero handoffs and clear incentives. A product manager is incentivised to prioritise operational features alongside product features from one backlog, because they're accountable for reliability, not an operations manager. Product team engineers are incentivised to continually learn and build reliability into their own services, because they're on call themselves for the things they build, not an application support team. You can think of you build it, you run it as insurance for your customer outcomes. I've co-authored a whole playbook at Equal Experts about you build it, you run it with Beth. It's a deep dive into the topic and it's available for free at this URL here. You build it, you run it has an interesting history.
In 2006, Amazon's CTO Werner Vogels coined the term in an interview when he described how Amazon teams worked at the time. In the early 2010s the DevOps cargo cult talked about it. People said "DevOps team, we need a DevOps team", when what they meant was a product team doing you build it, you run it. And in the late 2010s the SRE cargo cult talked about it. People said "SRE team, we need an SRE team", when what they meant was a bunch of sysadmins rebadged as SREs with a small pay bump, still doing deployments for you, rather than a product team doing you build it, you run it. I mean, really, you build it, you run it is an imperfect name for an important thing, but it's the de facto name. A couple of years ago a colleague said to me, "Did you realise that you build it, you run it would be the hill you die on, Steve?" And yeah, it's probably one of my big hills, right after trunk-based development being so important, and also people naming their teams after the things they're actually building. Last year somebody told me about a data platform team that didn't call themselves the data platform team. They named themselves after a boy band, and I was so distraught I threw up in my mouth a little bit.

All right, how do you know when to do you build it, you run it? This operating model selector from our playbook can tell you. On the y-axis you have financial exposure on failure. There are relative levels from low up to very high, and they're mapped onto availability targets, so a high financial exposure is near the top and it's mapped onto three nines of availability. Along the x-axis you have product feature demand. There are relative levels again, from low on the left to very high on the right, and they're mapped onto deployment targets, so very high product feature demand is on the far right and it maps onto daily deployments or more. And you'll see the more financial exposure you have, and the more feature demand you have, the more you need to move from yellow to blue and adopt you build it, you run it. Use you build it, you run it for anything that genuinely needs four nines of availability. I don't see that very often; it's a huge investment in reliability engineering. Use you build it, you run it for digital services which need higher throughput and reliability. And still use your operations teams for foundational systems which need lower throughput. You build it, you run it doesn't mean getting rid of your operations team. I get really cross when people talk that way. You build it, you run it is a hybrid operating model where product teams and your operations team work on what they are best suited to work on. There's advice on this in our playbook, along with how a product manager can estimate the financial exposure and feature demand variables you need to use this operating model selector.
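Here's a minimal Python sketch of the decision rule that selector encodes. The enum names, thresholds and the exact availability and deployment mappings are illustrative assumptions on my part, not the precise values from the playbook.

```python
# A minimal sketch of the operating model selector described above.
# Level names, thresholds and target mappings are illustrative assumptions.
from enum import IntEnum

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4

# Relative financial exposure on failure, mapped onto availability targets.
AVAILABILITY_TARGET = {
    Level.LOW: "99.0%",         # two nines
    Level.MEDIUM: "99.5%",      # two and a half nines
    Level.HIGH: "99.9%",        # three nines
    Level.VERY_HIGH: "99.99%",  # four nines - a huge reliability investment
}

# Relative product feature demand, mapped onto deployment targets.
DEPLOYMENT_TARGET = {
    Level.LOW: "monthly",
    Level.MEDIUM: "fortnightly",
    Level.HIGH: "weekly",
    Level.VERY_HIGH: "daily or more",
}

def select_operating_model(financial_exposure: Level, feature_demand: Level) -> str:
    """Pick an operating model for one service from its two variables."""
    if financial_exposure >= Level.HIGH or feature_demand >= Level.HIGH:
        return "you build it, you run it (product team on call)"
    return "central operations team (foundational system)"

if __name__ == "__main__":
    # A digital service with medium exposure but high feature demand.
    print(select_operating_model(Level.MEDIUM, Level.HIGH))
    print(AVAILABILITY_TARGET[Level.MEDIUM], "/", DEPLOYMENT_TARGET[Level.HIGH])
```

The point of running every service through the same two variables is that the choice between a product team on call and a central operations team becomes a repeatable decision rather than a matter of opinion.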
Okay, what do deployments look like? For digital services, you've got product teams deploying and launching their own digital services. They report into change management afterwards via an automated audit trail. This gives you a higher deployment throughput. There are faster change approvals and deployments, a focus on customer outcomes over outputs, lower knowledge synchronisation costs, and no deployment scheduling conflicts. For foundational systems, it's the same as before. You've still got delivery teams building and testing changes, then they head over to change management for change approvals, and then they head over to an application support team for live deployments.

What does incident response look like? For digital services, you've got product teams monitoring and supporting their own services, which means L1 on call during working hours and out of hours. This gives you greater reliability and a powerful learning culture. There's faster alert acknowledgement, faster incident resolution, more informative telemetry refined from live traffic, and powerful incentives for developers when there is no one between them and an alert for a customer. They're encouraged to create an adaptive architecture that's able to gracefully degrade on failure. There are also higher quality insights from post-incident reviews, broader knowledge dissemination, and faster lead times to implementing post-incident actions. For foundational systems, it's the same as before. You've got an operations bridge team monitoring foundational systems and doing L1 on call. They hand over to the application support team for L2 incident response when necessary, and delivery teams are sometimes sucked into L3 best efforts when things are really going wrong. You might have a single application support team doing L1 monitoring and L2 response without an ops bridge team. That's totally fine; it's all the same. Whether you've got digital services or foundational systems, whether you've got product teams on call or an operations team on call, you've got your operational enabler teams like your DBAs, your incident managers, your network admins, and your help desk. I like these teams. They're really important.

So, I've spoken with many organisations in a bunch of different countries about switching from a central operations team to you build it, you run it. Here are some of the big concerns that I've heard from development managers, operations managers, heads of IT and more. And usually, confusingly, these concerns are aired right after the person reassures me how much they like the idea of developers on call in principle. We've got: developers won't want to do it; it won't scale to lots of teams; nobody would be accountable; there'd be no incident management, Steve; developers will be firefighters; and my personal favourite, we can't hire a DBA for every team. Nobody is asking you to do that.

Okay, concern number one of six: developers won't want to do it. They're developers, they just want to code in a corner, Steve. I've not spoken to them, but I know they won't do this. That sounds to me like you haven't given your developers what they need. First of all, do they understand the mission and how vital it is? Have you explained to them that without making these changes to the operating model, your organisation will fail to meet its goals and lose out to competitors? Next, have you actually asked them if they'd like to go on call, and listened to their answers? I don't mean just hearing their answers. I mean hearing them: talking to people, clarifying concerns, getting a deep understanding of those concerns. I'm willing to bet it's not just money that they're worried about. And have you told people with concerns that you're going to put it right? Whatever you need to do, commit to doing it, and give maximum effort and maximum transparency. There'll be tricky organisational problems to solve here, and sunlight is the best disinfectant. That's giving your developers what they need. There's an increasing number of public surveys out there about on-call developers, which is great.
The 2022 Atlassian survey of 2,000 developers in four different countries, including India if I remember rightly, showed 59% of developers surveyed were on call, and there was a strong correlation between you build it, you run it and job satisfaction, despite more context switching. Surveys often mention similar adoption concerns from developers, and the number one concern is not money, it's the impact on personal life. There's a good quote on this in the 2022 incident.io survey: "not being able to live life as usual, no drinking, no long bike rides, having to carry a computer everywhere". I've been on call before in the UK. I remember stressing about 4G reception in cinemas, worrying if I would be called while driving on a motorway, changing weekend plans so I could be near home and near my broadband connection.

So what do you need to do? You need to acknowledge the impact on personal lives. Share data on incident frequency, duration, and crucially how often incidents occur out of hours versus in hours. Reassure your people there won't be a call-out every night for the rest of their lives that they have to handle. They've probably not done on call before, so they might not have got over that fear yet. You need to create time and space for on-call onboarding and training. Give your people confidence they'll go on call with all of the organisational knowledge they need. For example, make sure your front-end developers are comfortable with runbook instructions to resolve a back-end problem, and vice versa. Empower your people to prioritise failure design alongside product features. Let them protect their digital services from brittle downstream dependencies with a cache, back-pressure queuing, circuit breakers, whatever they think is best. And compensate them for the impact on their personal lives. Pay them to be on call, not just for call-outs. Pay them more on weekends than weekday evenings. On call is a social sacrifice, and if you don't understand that, somebody else will. If you don't pay your people, somebody else will. The 2019 on-call community survey showed a lot of variation in payment models, so do your research and figure out a payment model that works for your own organisational context. There is no one right answer there.

Concern number two of six: you build it, you run it won't scale to lots of teams. We'll have 20 teams, 20 people on call. How could they ever be cheaper than one operations analyst, Steve? I can offshore that person to a cupboard on the island of St Helena, the British island Napoleon was exiled to. It's a really cheap island, Steve. That fictional ranting, although the island of St Helena is real, sounds to me a bit like you haven't figured out how to balance your financial exposure and on-call costs. Have you thought about your operating model as insurance for customer outcomes? Recognise that different insurance policies offer different levels of protection and different premiums, and the more valuable the contents of your home, the higher the premium for your house insurance. In other words, come to terms with the run cost of you build it, you run it being a bit higher than a central operations team; if you measure opportunity costs and revenue protection as well, you'll see it's worth it. Have you mapped out all your teams and all your services, and estimated how much money per minute flows through each service? The best way to get started on this is to dig up business cases and revenue estimates.
Use those numbers at first just to get yourself going, and then gradually refine that data over time with the financial impact of different live incidents. And have you let go of the fanciful idea that everything everywhere must always be on 24/7? You'll have a few services that are critical, with a lot of money flowing through them all the time. You'll have a lot of services that aren't so critical but are important; they have a lot of money flowing through them in the daytime and less at night. And you'll have some services that aren't critical or important at all; they don't have much money flowing through them in the daytime and next to nothing at night. You can plan your way around this. If you're doing you build it, you run it and you've got 20 teams and 20 people on call, you're doing it wrong. Please contact me, I want to help you fix it. There are plenty of ways to optimise your run costs.

All right, balancing financial exposure and on-call costs. Here's one way to do it, and there are other ways. This is a graph that shows financial exposure on failure on the y-axis, and as with the operating model selector we're mapping different relative levels of exposure, low to high, to different availability targets. On the x-axis we're simply splitting between working hours on the left and on call out of hours on the right. This is a home improvement retailer I visited a while ago. Let's pretend that it was in South Africa, because I like South Africa; I can eat seafood there and not get sick like I do in the UK. This fictional retailer, which is actually a real company somewhere, has a search team in a search domain, a store operations team in a store domain, and teams for outdoors, painting and furniture, all in a customer journeys domain. The furniture team alone has two services because, I don't know, South African furniture is more complicated than in other countries. It's okay, trust me, I'm a convincing expert in South African retail for the next 60 seconds at least.

All teams here are on call during working hours for their own services. This incentivises developers to design for failure and implement operational features such as business metrics for monitoring dashboards. Out of hours is where it changes from you build it, you run it as you might imagine it. Out of hours, it varies by financial exposure. The store operations service has low exposure and two nines of availability, which means no developers on call, nor any operations team. This still pushes developers to invest in operational features, so they don't have to deal with overnight incidents when they start work the following morning. The customer journeys services all have a medium exposure and two and a half nines of availability, so the outdoors, painting and furniture teams together rotate one developer from their three teams on call for their combined four services. Using product domains as affinity groupings encourages a focus on outcomes and a lower cognitive load. And at the top, the search service has a high exposure and three nines of availability, so a developer is on call from that team to ensure the fastest possible time to restore. So: five teams, six services, everybody on call during the daytime, but at night only two people on call for all of those services and all of those teams. This model isn't perfect, but it can work, and work well. In fact it's in use at that company right now, and they'll be doing this on call tonight.
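Here's a minimal Python sketch of that out-of-hours arrangement, using the fictional retailer's domains. The service names, exposure labels and grouping rule are my own illustrative assumptions, not the retailer's real configuration; the point is simply that low-exposure services get no overnight cover and medium-exposure services share one rota per product domain.

```python
# A minimal sketch of the out-of-hours on-call plan described above.
# Names, exposure labels and the grouping rule are illustrative assumptions.
SERVICES = [
    {"service": "search",           "domain": "search",            "exposure": "high"},
    {"service": "store-operations", "domain": "store operations",  "exposure": "low"},
    {"service": "outdoors",         "domain": "customer journeys", "exposure": "medium"},
    {"service": "painting",         "domain": "customer journeys", "exposure": "medium"},
    {"service": "furniture-a",      "domain": "customer journeys", "exposure": "medium"},
    {"service": "furniture-b",      "domain": "customer journeys", "exposure": "medium"},
]

def out_of_hours_rotas(services):
    """Group services into overnight rotas: low-exposure services get no
    cover, and the rest share one rota per product domain, so only a
    couple of people are on call at night."""
    rotas = {}
    for s in services:
        if s["exposure"] == "low":
            continue  # two nines: incidents wait until the next working morning
        rotas.setdefault(s["domain"], []).append(s["service"])
    return rotas

if __name__ == "__main__":
    for domain, covered in out_of_hours_rotas(SERVICES).items():
        print(f"{domain}: 1 developer on call overnight for {covered}")
    # Six services, but only two overnight rotas - far fewer than one person per team.
```

The design choice here is the affinity grouping: rotating one developer across a whole product domain keeps overnight cover cheap and cognitive load manageable, while the highest-exposure service still gets a dedicated responder.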
All right, concern number three of six: nobody would be accountable. It would be the wild west. We must have one operations manager to rule them all, one throat to choke, one person on the hook for a load of stuff they can't possibly understand. That sounds to me a bit like you haven't tried trusting your people to do the right thing. Have you explained to your leadership, your managers and your teams why shifting accountabilities is tied to your overall mission? Help people to understand that without making teams accountable for customer outcomes, you're not going to succeed. Have you informed your senior leadership that the key benefits here would be product managers mapping customer outcomes onto operational features, not just product features? Give some examples, like monitoring dashboards, visualisation of on-call budgets, and developers being given time to re-architect their services for a smaller blast radius on failure. And have you described to your teams how their new accountabilities and responsibilities will work day to day? Ask your developers to lend their technical know-how to your product managers in selecting availability targets, deployment targets and tracking on-call costs, so individuals don't feel overwhelmed.

Okay, trust your people to do the right thing by splitting accountabilities between different people and devolving responsibility down to their teams. Somewhere in your organisation there's probably a RACI model; that stands for responsible, accountable, consulted and informed. Here's an example RACI model I'd expect to see for you build it, you run it governance, and this is a great structure for talking about on-call developers to your operations folks and your senior leadership, because it's a structure they'll be very familiar with, I suspect. Let's assume that in your organisation you've got a head of operations, a head of product and a head of delivery. This will be a nice comeback for when someone complains to me afterwards that this talk was too simplistic and the examples were too contrived. For digital services, the person paying for the product team is accountable for reliability and run cost as well as the feature set. If it's a new proposition, it's funded by your head of product, so they're accountable. If it's a technology upgrade, it's funded by your head of delivery, so they're accountable. Product teams are responsible for reliability either way, and their incentives are strong. Product managers and developers alike will consider risk tolerance and engineering effort when choosing availability targets, because they won't want to spend more than necessary on the on-call budget; it's their budget now. And for foundational systems, it's all the same as before. Your operations manager is still accountable for reliability and their application support team is still responsible. There's very little change there, very little disruption. Now, full disclosure, it's tough to transfer accountabilities like this, and it's tough to split a line item in one OPEX budget into line items in two different CAPEX budgets. It's tough, and it's the right thing to do. Just because something is hard doesn't mean we shouldn't try.

All right, concern number four of six: there'd be no incident management. Developers would abandon incidents and go back to sleep if they couldn't fix them themselves. We'd have enormous chain reactions of incidents that nobody would know about. We'd bankrupt ourselves while we slept peacefully in our own beds, possibly dreaming about bunnies. When I hear this kind of concern, I think you haven't made incident management self-service yet.
Let's assume you've got at least one incident manager in your organisation, or at least one person who understands how incident management is supposed to work for internal compliance. Incident managers are super useful. They've got lots of organisational knowledge, communication pathways and stakeholder management skills that come in handy if you're dealing with a major incident and substantial revenue loss. Have you connected your incident managers with your developers? Make sure they know each other by name, they appreciate their codependency, and they have clear expectations of one another. You want your developers to know they can call an incident manager for help when an incident is incurring a significant financial loss and/or the blast radius goes beyond their own digital services. You want your incident managers to trust your developers to phone them when things get serious, and you want them to wade in and help out when that happens. Have you mapped out your incident management process, as-is and as-intended? Identify all the manual and semi-automated activities you currently have, all the outdated spreadsheets currently in use, and all the time-consuming handoffs currently happening between different teams, and plan to eliminate all of it. Buy a SaaS incident response platform if necessary. PagerDuty is very good, so is VictorOps. They have bi-directional sync with incident tracking systems like ServiceNow. Incident managers and auditors just love those automated ticketing updates. And have you run some chaos days? You don't need to go all Netflix and create some state-of-the-art, fully automated magic monkey in a box that simulates the loss of an entire AWS region. You can use a test environment. You can rely on human expertise, not a 100% automated solution. You can have your most experienced developers act as chaos agents, thinking creatively and simulating unusual failure scenarios. You will learn a lot from this. Use it as a foundation for everyone to gradually build up their confidence in self-service incident management.

All right, automating self-service incident management. Here's one way to do it. There are others, you know the drill. I spent some time with a broadband telco in Luxembourg once. It wasn't really in Luxembourg, but I'll say Luxembourg because I went there once and gatecrashed a party that I definitely wasn't welcome at. At this telco, they used to have alerts directed to an ops bridge team, an L1 operations analyst in an offshore centre. The operations analyst received the alert, looked at an outdated spreadsheet of people on call, and contacted an incident manager back in Luxembourg to coordinate incident response. And then they kept phoning operations folk back in Luxembourg until eventually someone answered and confirmed they would take on incident resolution. During each incident, the telco endured 5, 10, 15 minutes of revenue loss right there. I worked with the telco to implement you build it, you run it, and preserving the incident manager's role of coordination and communication was a key emphasis for me. Here's what self-service incident management looked like afterwards. An automated alert on the left is routed into PagerDuty, and PagerDuty is automatically configured to create a ticket in ServiceNow; that's the telco's ticketing system, their system of record. It's configured to create an incident channel in Slack, with the incident ID in the channel name. That's for discoverability.
So people can easily find incident response as it unfolds and silently learn. And it's configured to connect the alert to a service, to a team, to an on-call schedule and an escalation policy, and to phone the current person on call in the schedule. The on-call developer responds to the alert on their phone; the average acknowledgement time is around 2 minutes. Both the incident response platform and the ticketing system are magically updated, as happens with further updates during the incident. But there's a special step here. There's an on-call schedule in PagerDuty for the incident manager, and if the on-call developer needs their help, they just have to add that team to the incident. The incident manager is immediately woken up to a wealth of information in the ticketing system about the current situation and about the on-call developer who's looking for their assistance. This isn't hard to do, once you've got incident managers and developers on the same page and secured some budget for a SaaS incident response platform.

All right, concern five of six: developers would be firefighters. Developers would spend all of their time on BAU unplanned work, Steve. They'd be configuring things, provisioning things, fixing things all the time instead of creating things. Our ability to churn out new features to customers would grind to a halt, and they'd leave us for our competitors. The ones with TV adverts, the ones with catchy jingles. When I hear this concern about BAU unplanned work, it sounds like you're frightened of it, but you're not measuring it or eliminating it. I hear a lot of talk about BAU, and it's usually a synonym for unplanned maintenance work, so it includes things like code review feedback, upgrading infrastructure capacity, fixing defects, patching security flaws, improving telemetry, implementing support tickets, repairing broken builds, resolving live incidents and fixing intermittent tests. Have you learned to manage unplanned work just like planned work? Insist that teams track BAU work in your ticketing system and visualise it on your board just like planned work, for any work item that lasts longer than half a day. Tracking and prioritising BAU work items alongside planned product features from a single backlog makes it much easier for a product manager to understand where team time is being spent and to make informed decisions about what the team should work on next. Have you visualised the per-team rework rate? Make sure that BAU tickets in your ticketing system are labelled as BAU, so they can be queried alongside feature tickets. Calculate the percentage of time spent each week on unplanned work tickets, and that's your rework rate. Thanks to the Accelerate book by Dr. Nicole Forsgren, rework rate has become the de facto industry measure for unplanned maintenance work, and it's also a proxy measure for technical quality. Track it once per week over many months and you'll be able to spot trends in it. Teams will better understand how it impacts their ability to deliver planned features on time. They'll gradually become attuned to that all-important culture of continuous improvement: finding a small problem and fixing it permanently before it becomes a big problem with only a temporary band-aid on it.
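As a small aside on that rework rate number, here's a minimal Python sketch of the weekly calculation. It assumes each ticket export row carries a week, a BAU-or-feature label and the hours spent; those field names and the CSV format are illustrative assumptions, not any particular ticketing system's schema.

```python
# A minimal sketch of the weekly rework rate calculation described above.
# The CSV layout (week, label, hours_spent) is an illustrative assumption.
import csv
from collections import defaultdict

def weekly_rework_rate(ticket_rows):
    """Return {week: percentage of recorded time spent on BAU tickets}."""
    bau_hours = defaultdict(float)
    total_hours = defaultdict(float)
    for row in ticket_rows:
        week = row["week"]               # e.g. "2024-W07"
        hours = float(row["hours_spent"])
        total_hours[week] += hours
        if row["label"].upper() == "BAU":
            bau_hours[week] += hours
    return {
        week: round(100.0 * bau_hours[week] / total, 1)
        for week, total in total_hours.items() if total > 0
    }

if __name__ == "__main__":
    # Hypothetical export from a ticketing system, one row per ticket.
    with open("tickets.csv", newline="") as f:   # columns: week,label,hours_spent
        rates = weekly_rework_rate(csv.DictReader(f))
    for week in sorted(rates):
        print(f"{week}: {rates[week]}% rework")
```

Plotted week by week over a few months, that one percentage is usually enough to show a team whether their unplanned work is trending up or down.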
Have you built paved roads? And I don't mean a platform. Everyone's got a platform; even when they don't have a platform, they tell me they've got a platform, and I don't like to say "you think you've got a platform, but you don't have a platform", because I don't want to hurt their feelings. I mean paved roads as Netflix calls them, golden paths as Spotify calls them: fully automated user journeys to boost developer experience. Self-service, just push a button and your service is fully equipped with build jobs, monitoring dashboards, production alerts, on-call schedules. Paved roads are BAU killers, and I know that for sure because Equal Experts is really good at building them. I don't think an organisation can deliver at scale, with many teams and many services, without paved roads, because otherwise there are indeed a lot of BAU fires to fight.

Measuring and eliminating BAU work. Here's an example from a commercial television network I visited a while ago. Let's say it was in the USA, because I got in trouble once in a New York bagel shop for demanding a British-sized bagel, and it was still enormous and I couldn't eat all of it, and then I was late for dinner with Dave Farley and he laughed at me. The data here is over a two-year period. We've got a legacy network platform in yellow. It's COTS, self-hosted on-premises, so I call it a foundational system. It had a delivery team making changes to it and a central operations team babysitting it. And we've also got a new scheduler service in blue. It's a cloud-hosted digital service and it's owned by a product team doing you build it, you run it.

Let's start with the left graph, which shows deployment interval. That's the number of days between live releases over a period of time. The scheduler service here in blue averaged daily deployments for the whole two-year period, which is mighty impressive. Deploys of the legacy network platform varied from fortnightly to monthly to once every six weeks. There's no predictability, and you'll see there are some gaps in deployments; of course, those were change freezes. I love a good change freeze. Right, let's go to the right graph now. That's time to restore data: the time it takes to resolve an incident and for availability to resume for customers. The scheduler service in blue had few incidents. One took an hour to resolve, the others hovered at 30 minutes or less. The legacy network platform in yellow had more incidents, they lasted much longer, and they caused more revenue loss. Some incidents were over three hours long, which is downright scary. Now let's look at the middle graph with the rework rate data. That's the percentage of time developers handled unplanned work. You'll see that for the legacy network platform in yellow it was very high. Sometimes 75% of their time was spent dealing with defects, infrastructure capacity, misconfigurations, intermittent tests and so on. For the scheduler service in blue it was really low, around 10%, and there's a strong, obvious correlation between their more frequent deploys, their fewer and shorter incidents, and their low rework rate. What's interesting here is what the scheduler team didn't have. They didn't have configuration issues or infrastructure problems, because they built the scheduler service themselves using cloud-based self-service paved roads built for them by another team.
They didn't spend time implementing code review feedback, because they practised pair programming and code review happened continuously. And they didn't have a high defect rate, because they used the Three Amigos technique and their tester was embedded in the development of each task. The delivery team working on the legacy network platform couldn't have done all of these things; after all, they weren't running what they built. But they could have done some of them. What really set these teams apart was motivation, because the scheduler team did you build it, you run it. They were committed to measuring and eliminating BAU unplanned work because they felt the pain of it, and they were empowered to do something about it.

All right, concern six of six. We're nearly there, and it's my favourite: we can't hire a DBA for every team. DBAs are so expensive and so hard to find, Steve. We can't embed them in teams, they'll have nobody to talk SQL with. We can't let developers touch SQL; the last time a developer ran a query, the database crashed. We've only got three DBAs and they've been here for years. I don't know their names, but I call one of them Katie. I hope that's her real name. This sounds to me like you haven't made repeatable specialist tasks self-service yet. This concern isn't just about DBAs, it's about any small central team of specialists acting as operational enablers. Right back at the start of the talk I described how you build it, you run it works in incident response, and I said there were a number of operational enabler teams, like DBAs, network admins, infrastructure analysts, operability engineers; there are others too.

Have you understood that hiring many DBAs for many teams is not you build it, you run it? Think of it as cross-functional product teams building and running their own digital services, but nowhere does it say you must jam a DBA into every team, or force a DevOps engineer into every team, even though nobody can explain what DevOps is and it's been 15 years. That's a logical extreme that's unnecessary. Have you entirely rejected the idea of embedding specialists in teams? As in, discard the idea of hiring loads more DBAs and putting them into teams. Look, I get it, okay? Developers don't have DBA skills. You don't want them debugging a live database with millions of rows of user data. And at the same time, a central DBA team will see their workload increase as you add more teams. They'll have more priority clashes, slower progress, and burnout; all are possibilities. But there's a consistent scarcity of affordable DBAs in the marketplace, so DBAs will be stretched across many teams, their workloads will see-saw between nothing and everything, they won't feel part of a tribe, and they'll become lonely. Have you automated repeatable specialist tasks? Map out the work your DBAs actually do, automate away the repeatable tasks, push as much onto your cloud provider as possible, and free up your DBAs to concentrate on one-off expertise requests. Follow the general continuous delivery principle of automating the tasks that machines are good at, so humans can concentrate on the tasks that they're good at.

Okay, how to make repeatable specialist tasks self-service, removing repetitive manual or semi-automated tasks from your DBAs. Here's an example mapping of the specialist tasks I'd expect a DBA team to work on. You can split the tasks into repeatable low value, repeatable high value, and finally ad hoc high value. The majority of the work here does not need a DBA. It just needs a machine, and the permissions a DBA has that should not be shared with
developers. At the top, repeatable low value tasks: just offload these to your cloud provider, don't overthink it. The more toil you can push into a cloud managed service, the better. For example, you might have an on-premises Postgres database with one or more DBAs acting as babysitters. Lift and shift into AWS Aurora and let AWS do all the babysitting for you. If someone mutters about the migration cost, show them the TCO of hiring more DBAs at current market rates and retaining them, plus your ongoing server costs. All right, in the middle, repeatable high value tasks: turn these into self-service automated deployment pipelines for product teams to use themselves. The easy examples here are CRUD, like creating a database schema. DBAs don't need to do that themselves, but developers shouldn't have the permissions to do it manually either. Ensure your developers lend their expertise to your DBAs to create the deployment pipelines you need. Encourage your DBAs to treat it as an opportunity to learn more about outside-in usage of their databases, encourage your developers to take the time to learn something more about databases, and make sure there's some decent monitoring on the pipelines. Ad hoc high value tasks: now we're talking. Make sure developers know they can reach out to DBAs for complicated database tasks that aren't repeatable and are well beyond developer expertise. Debugging a live database problem is a good one; that's where DBAs really shine. Now, you'll notice that this kind of classification isn't tied to DBAs at all. You can apply it to network admins or infrastructure analysts as well if you want. For example, your network admins should be lifting and shifting any on-premises DNS, networking and firewall management into AWS Route 53 or its equivalents in a different cloud provider. Now, full disclosure, these changes are hard and they can take time. More importantly, your DBAs might freak out about not doing these tasks anymore, and that's important. Sit them down and explain that their job isn't to do typey-typey. They're not paid to code, they're paid to think, to solve higher-order problems, and these kinds of changes are about empowering them to help the entire organisation benefit from their expertise. This is how you solve the specialists-at-scale problem.

All right, in my experience these concerns usually boil down to "we haven't done this before", and that sounds like "we don't know how to get started". That's totally okay. Change is hard. That's why companies like Equal Experts exist. Share the mission: tell the organisation that you need to try a different operating model. Pick a pilot team: find a team that's well suited to trying you build it, you run it, and don't try it with all teams at once. And change the mindset: make sure your people know it's okay to fail, it's okay to make mistakes, and it's important to learn as they go along, as long as they document their findings and share their progress. And above all else, keep going.

So, we've got a question I'll come to in a moment. Takeaways for when you're back at work: give your developers what they need, balance financial exposure and on-call costs, trust your people to do the right thing, make incident management self-service, measure and eliminate BAU work, make repeatable specialist tasks self-service, and remember that you build it, you run it is great and it can work here. So thank you very much for coming, and thanks to Agile India for having me. I'm going to quickly go to questions as we're nearly out of time. So, thank you for your question: won't this introduce a lot of
context switching, which impacts developer productivity? Teams following Scrum plan their work during sprint planning, and any alterations will cause churn. Yes, that's a very good question, and I hope I touched on it when I talked about measuring and eliminating BAU work. If your developers aren't doing you build it, you run it, then that BAU work is still happening. It's happening elsewhere, in your operations team, and that's still costing your organisation a lot of time and money. But the problem with an operations team is that they can't eliminate that BAU work themselves. When a development team is on call, they have to tackle BAU work, because it's in their interest to do so, because they're the ones on call, and they have all the tools and all the permissions they need to actually make those changes. In the example of Scrum, I would certainly hope that in sprint retrospectives the team would be measuring the amount of unplanned work they're taking on each sprint, and then they would factor that into their sprint planning. They'd understand that this much time a week is being spent on unplanned work, which leaves this much time for planned work, and these are the stories we'll take on in the next sprint. And of course that amount of unplanned work should be coming down over the weeks. Okay, I hope that answers the question. If it doesn't, please do contact me. On Twitter I'm steve smith underscore tech, and I'm on LinkedIn as steve smith tech. If there are any other questions I'm very happy to take them now. Otherwise, thank you very much for having me. That was Steve for the session. Thank you, everybody. Bye everyone, bye.