Thank you very much. Thanks also to all of you for being here instead of being over at Matty's, even though he's, like, DevOps famous. Thank you so much for being here. Life after DevOps: I'd like to talk to you about sustaining engineering as a discipline and how it can solve a few problems that you may have in your organizations. You can find me at Tim at Bonsey.net, and the slides are up on SlideShare. I have three main points that I'd like to cover, and then there's time set aside for Q&A to answer any questions you might have. And if we get through all that, I've got some bonus content where I'll go into the issues I still have and the problems I still haven't worked out here. So I'd like to talk to you about the software lifecycle, basic stuff. I'm sure you've all seen something like this. For any given product, you plan, design, build, test, gather data, repeat. This is probably not surprising to anyone here. I personally don't like to use the arrows here because I know sometimes you go backwards, and many times you're doing these steps concurrently, but this is very familiar to most of you, I'm sure. There's also this software lifecycle that I'd like to refer to, where you have a piece of software that starts off in concept, you prototype it, you do your alpha and beta testing, and then you do your release, at which point it's live to customers. There's still some feature growth that happens, but once you get past your alpha and beta releases, you now have a live piece of software that is in what I consider the sustaining part of its lifecycle. It's going to go through a supported phase. It'll eventually be deprecated and retired. Any piece of software that goes live is going to have those steps that need to be covered. I break this up into two parts: the traditional development part, up through your release, and then your sustaining part, where once it's live to the customer, it has to remain live.
There are some complications that arise when real-world situations come into play. I've got a few of those. Names may have been changed to protect the innocent. Facts may be just fabricated, but here are three common types of problems you might see. You've got a team where two members build the bulk of a service, then leave for different companies, leaving it in the hands of their squadmates. What do you do? Who's responsible for that service? You've got a problem on your hands. You've got another team where two members build a thing, then move on to new squads to meet new corporate priorities. There's a restructure, a reorg, a lateral move, however it happens. There's a third teammate who was there but was on leave for part of it. Who's responsible for that service at that point? You're in a bind. You've got a problem. Then you've got your friends in the Butterbeer Squad. They've built four different products. As with most teams, they've seen some membership changes: some join, some leave. There's only one person left with working knowledge of the two legacy products because cross-training was deprioritized. We always wanted to get to it. It was something that we knew was important, but it just never bumped up. It always got pushed to some point in the future. So now she's got a sweet job offer, and what's going to happen to those legacy products? So these are the problems that arise due to real-world circumstances, and I'm going to suggest sustaining engineering is something that we can use to solve them. So here's how sustaining engineering could look. I'm going to talk about sustaining engineering as a discipline and also as a team. There's the DevOps mentality that I've heard a lot of times where you build it, you own it, it's yours forever. I think that that's all nice and good, but I'm proposing something slightly different. DevOps is a methodology, a way we do things. It's a culture about collaboration.
It's not necessarily about a team, and we don't necessarily need to maintain the connection between the team that built it and the team that sustains it. So when you have a service that needs to exist unchanged for the foreseeable future, there's work there that needs to be done. There's a distinct class of work that I call sustaining engineering, and DevOps does not preclude that being done by a different team. The transfer of ownership from one group to another is certainly advantageous under certain circumstances, such as the ones that I laid out on the last slide. So I'm going to talk about sustaining engineering as a discipline and sustaining engineering as a team, and keep the two things separate. I'll be referring to the sustaining engineering team, the group of people, as the Purple Cross. So when you see that on a slide or you hear me say it, Purple Cross is going to refer to the group of people and not the discipline as a whole. So our Purple Cross team: I'm going to start by describing them as a persistent group of engineers. They have a catalog of supported services. A little bit of humility here. These aren't my ideas entirely. I didn't invent this stuff. I happen to do it and have some strong opinions about it, but we all kind of stand on the shoulders of those that came before us. I'm actively trying to make my team better and draw out thoughts from you folks, so I'm happy to hear feedback from you later during the question part. I don't have all the answers, but I will tell you what I do know. Sustaining engineering is something I first heard about 10 years ago from Microsoft. It's also something that's been used for a long time in the defense industry: when they have a piece of hardware that is long-lived and needs to be maintained, they call that sustaining engineering. I didn't bother to source either one of those claims, so there's a giant grain of salt for you. Staffing is super important to me.
I think anytime you're designing a team, you need to understand what your staffing looks like. So I'm going to talk about the type of people that we want to staff on our Purple Cross team. We don't talk about staffing enough as an industry, and I think it's super important that we understand what the challenges are, what our definitions are, and how we really see these teams coming together as actual people who are more than just the sum of their parts. So there are two groups of people that I see as the ideal staff for our Purple Cross team. The first are our junior engineers. These are people who don't have a ton of experience and still need to gain some exposure to different things. They're going to want to see something different every day. They're going to need to build some expertise, see what they love, and find what they may want to specialize in. So sustaining engineering is a great place for your junior engineers to get that exposure to different concepts. You'll be working on a networking problem one minute, then you're going to jump to DNS, and after that you're going to have to figure out why this thing in the Linux kernel is doing this. So all in one day, you're going to have a broad exposure to topics, and that's great for junior engineers. The next type of staff that we need are your hardened badasses, people who have seen some shit and know how to handle it. It's the end of the world, but don't worry, we're going to make it through this. I've seen this before. So you also need your experienced engineers on the team, people who are going to be able to operate well in a fast-paced environment and who are not going to be dragged underwater by the critical things that they support. So junior engineers, experienced engineers: I'm kind of saying there's a gap in the middle, and that's okay. We don't necessarily need all of the engineers on this team to stay on this team for the entirety of their careers.
The junior engineers will grow, they'll get some experience in things that they might enjoy, and they may go off to other teams. Just understanding that that's what your staffing looks like is going to make you more successful at making sure that your team, your sustaining Purple Cross team, is itself sustainable. You're going to have a hiring and staffing plan, you're going to be able to get the right people in, and you'll be able to grow the right people out. And maybe bring them back in as your hardened badasses. So the Purple Cross team: a persistent team of engineers, staffed with a mix of junior and experienced. So this is who they are. Now what is it they do? When people hear sustaining engineering, this is the thing I hear: oh yeah, you guys fix bugs, right? Yes, absolutely, sustaining engineering fixes bugs. That's what everyone knows, but there's also a lot more that sustaining engineering is about. In particular, the things that are important and most valuable are the long-term architectural changes. For any long-lived piece of software, any service, any product, there are going to be things that you're going to need to tweak, and not necessarily because someone made a bad decision early on in the process. Someone may have decided, okay, this database column is going to grow, and at some point in seven years it's going to overflow, and that'll be a problem we'll have to handle down the road. That's not necessarily a bad decision in the moment. Saying we're going to handle that in seven years is a perfectly reasonable decision. There are also things like your backups. For an active product that you're releasing to multiple times a day, your backup strategy is going to be much more aggressive. Your disaster recovery strategy is going to be different than it would be for a product that is going to sit with no intentional changes for a long period of time.
Changing up how you handle your backups and how you handle your disaster recovery is absolutely something that needs to adjust once a product moves to a sustaining model. You also have things that crop up due to the nature of some systems changing. Patches come in on the underlying architecture when your software isn't changing. All of a sudden you're getting deployment time creep. That's a problem that comes up that you just have to fix. Also, instance types are going to get cheaper. You're going to need to make sure that your stuff is going to work on the new EC2 instance types. You're going to want to move to the more cost-efficient stuff when it comes out. These are all things that need to be handled besides anything actually breaking. This is all stuff that you need engineers to be able to understand and fix. Also, you're going to want to eventually deprecate and retire some of these services. Not everything you build is going to live forever. For some things you're going to say, okay, this validation service we needed at one point, but the things that relied on it have moved to something else, so we now need to retire this. This stuff can die and go away, but you need an engineer to make that change, because if you just have someone hit a button and not all of your dependencies were in line, something breaks. You need engineers in charge of this stuff. So our sustaining engineering team: this is who they are, this is what they do. Resolving incidents, fixing bugs, long-term architectural health. I'd like to use a metaphor here, the wet basement. Sometimes when you have a problem with a service, stuff goes bad, all of a sudden there's water in the basement, and people are going to grab buckets. You've got to get that water out of there. But we actually have engineers with a true engineering discipline to handle these problems, because sometimes just using buckets isn't efficient. You're going to need to install a pump.
And once you get all that water out of the basement, you're going to need to figure out where all that water comes from. How do we keep it from happening again? We need to move from a mindset of just bailing water to installing pumps and figuring out where the water came from. Do we have aging plumbing that needs to be addressed? Do we need to install a drain in the hill behind the house so that we keep this water from coming in? These are all engineering decisions that need to be made when incidents happen. So it's not just about, okay, all the water's out, let's move on to the next problem. You need engineers to understand these problems and make the long-term architectural changes in order to keep these things from happening. Topic three: once we've decided sustaining engineering is something that we want to do, how do we enable it? And once we've done that, what does this enable us as the business to do? I'd like to start by talking about transitioning a service. The easiest thing I can say about this is that it is not throwing something over the wall. Like everything else in DevOps, this is about a conversation. This is different teams working toward the same end goals. This is not something where you're going to say, yep, we're all done with this. This is yours now. Have fun. Great. Thank you very much. When we're transitioning a service, there's going to be some concurrency going on and some back-and-forth about, is this healthy enough? Do we have our documentation, et cetera? It's also not a second swing at the backlog. We have moved from the feature growth phase to the supported and sustaining phase. It's not about, okay, we're done with this, and it's now going to the bush leagues to put in some of these last things. That's not what sustaining is about. It's about making sure these things last, and making the architectural changes that keep them lasting. What transitioning may be is an alignment on business fundamentals.
Let's talk about what it is this thing does, so any decisions that we make are aligned to the end goal. We're not going to sustain a piece of technology for the sake of that technology lasting forever. That's not really useful to the business. If we know that the business needs an address validation service in order to get to these outcomes, and the sustaining team's job is to make sure that address validation service stays up so those goals get met, then their job is to make sure those goals get met one way or another. Sometimes you can even combine services, or deprecate one in favor of another, and say, hey, this other team actually built one of these. Let's switch everything over to using that, and that's the better decision. So we want to focus on what the actual business outcomes are. Sometimes this is also a tough conversation. You will have some things that need to be transitioned that are not in a great state. You may run into one of these: yeah, we have this system. You need to take it. No one knows a thing about it. We don't have the passwords. Good luck. Here you go. These can be tough conversations. The solution there is that sometimes you inherit the code but not necessarily the architecture. You get it up and running elsewhere and make sure that you can swap all the dependents over. For the services that we like to transition, there are a couple of different patterns that I'd like to talk about. Lighthouses: this is an analogy I like to use where you've got a product that's super critical. It needs to run every night. If it doesn't run, it's pretty bad. But luckily it's very well automated. It's a simple system, so there's not a lot that can go wrong. If anyone here is a lighthouse expert, just tell me later how I'm getting it wrong. But these things are critical and they're simple. This is a great service to transition. This is the kind of thing that you hand to an engineering team. Here's your documentation. You're off to the races. You're good to go.
The other analogy, kind of on the other end of the spectrum, is dogs. Everyone loves dogs. They're not necessarily business critical, but they make people happy. They deliver joy to the customers and employees that use them. They're sometimes finicky, but overall they're stuff that people like having around. They need daily care and feeding, which is not necessarily a problem. Now, something we don't want to get handed off to us is puppies. Puppies are services that don't have that business value baked in yet, and it's more like, hey, we just built this thing. We think it's kind of fun. The housebreaking should be in the next patch. Don't worry about it. The spaying fix is scheduled. So let's get those puppies matured into dogs before we go ahead and move them off to sustaining. Thank you very much. You might not think that dogs, with their daily care and feeding, would be great to transition, but the reason we do that is it scales really well. What you need to put in place, automation-wise and process-wise, to go from supporting zero dogs to one dog, yeah, that can be a bit of an adjustment. Going from one dog to two dogs, almost nothing. You're going to be able to apply those same procedures, that same automation, and you can scale that up very well to a dozen or more dogs without really batting an eye. So those services are going to scale really well. So once you have your Purple Cross team in place, what that is going to enable the business to do is start building a culture of sustaining. What I mean by that is that the people who are building these technologies, and who are going to want to transition the products to the Purple Cross, are going to put together documentation in a better way, because they know that documentation is going to be read by someone else.
It's not just, oh yeah, I need to know about the ham sandwich thing, because I know what that means. You're going to write it for someone else to read. Also with the documentation, things need to be written down about why those decisions were made, not just how they were made. The whole code-as-documentation thing doesn't really work when you're going to hand it off to someone else, because you don't always bake in all of your underlying assumptions. I can read through the code and know how this thing works. I can trace through and understand it, but I don't understand why you made this decision here. It may look like a bug to me, and I fix that, and all of a sudden there's some business goal that's not being achieved because I didn't understand that. That can be fixed through a comprehensive test suite, if you've got a good enough test suite that actually handles all of your business outcome cases as well as all of your unit tests and integration tests. But the documentation really needs to be built around a culture of sustaining, where we understand why the decisions were made and what business outcomes we're trying to achieve. So something that also arises from this is that we get to a series of defaults. It's not necessarily governance, but there's an understanding that we're going to build this thing and we're going to hand it off to other people. Say the engineers that are building this thing know, all right, we've got to put some monitoring platform on this. It doesn't really matter to us which monitoring platform. It's an arbitrary decision. If we know that the sustaining team prefers Sumo Logic and New Relic, and we don't have a good reason to use something different, we're going to plug in Sumo Logic and New Relic and we're good to go. We don't waste time making that decision. We don't put in something else arbitrary that we then have to teach a new team how to use, so that your Purple Cross team ends up having to support six different monitoring platforms.
We can coalesce around a set of defaults without it really needing to be mandated top-down or some sort of governance board deciding, here's how we're going to do everything. It's just about creating a set of defaults to answer those questions. If I have to make an arbitrary decision, instead of going to Google for 15 minutes, I'll go to one of the guys in the Purple Cross and say, what are you guys going to prefer we use for our monitoring here? And it's easy enough to have that conversation. The holy grail here is to take that conversation, those discussions around what you prefer to use, make sustainers an actual stakeholder, and push those discussions left. If we can talk to the developers who are building things that they're going to want to transition, then we're going to have time to actually say, all right, you want to use a new monitoring platform because there are these things involved. We're going to have time to train up on it. We're going to know the intricacies of it, so when it comes time to transition, we'll be ready for it. We'll also be able to provide our input and say, okay, you want to do it this way. In AWS, we prefer to do it this way. If you don't have a good reason otherwise, let's make sure we're standardizing so that all of our automation just plugs right in. So ideally, we would like to have those conversations as early as possible, but every stakeholder says that. So again, this can be a little bit difficult. DevOps really wants to include everyone as early in the discussions as possible, but I understand sometimes that's not possible. But please, if you're going to adopt sustaining, try to have those conversations as early as possible. Now I've come to the point where I've got time for some Q&A. So if you have questions, I'd be happy to field them for you. While someone comes out here, maybe with a microphone, to see if there are questions, these are the takeaways that I'd like you to focus on.
If there's engineering involved, there will be sustaining engineering. Any service that goes live to customers is going to have those tasks that just need to happen, those long-term architectural changes. Either the team that built the thing is going to do it, or you're going to pass it off to another team. But that's engineering that has to happen. And involving sustaining as a stakeholder as early as possible is going to allow your business to function more efficiently. So what questions do you guys have?
Have you had any luck trying to integrate, say, remote teams as a sustaining engineering platform?
It feels like a softball. We actually, at Vistra Print, our sustaining engineering team is split from Barcelona already. So that has worked well, but we also have different development squads that are positioned around the world, so it's just part of our culture already. If it weren't part of the culture, it might be a little bit different. So it may be harder if that's your use case, but for us it's a natural fit and it certainly works.
How are you? It obviously makes sense. In my experience as an engineer, you need sustaining engineering, it's obviously 100% a business need, but how do you make it attractive? That's one of the things that I would ask about: it's there, it's a given, everybody knows, but how do you make it attractive? Because if I was told, oh, we're going to put you on the sustaining engineering team, I would almost be like, oh, what am I going to learn that's new here? I guess I was going to try and look at it as a really positive thing.
Okay, so you're saying attractive for people to work on the team?
Yeah, so if you need to attract that talent, like you're saying, you want to build that team, you obviously want to attract some good people, so how do you make that attractive?
Yeah, if I can go back to the staffing plan, your junior engineers are going to have exposure to a lot of different things.
They aren't going to go on to a team where they're going to be heads down, working on debugging Java all day every day. They're going to be exposed to a lot of different things and a lot of what's out there. So it's a good place to get your feet wet and get some exposure to a lot of different things that actually happen in production systems. From the experienced engineer side of things, we're looking for the type of people who are looking for a fast-paced environment, who are looking for something different every day. It's not a fit for every experienced engineer, for sure, but there are some tough architectural challenges that need to be solved on aging systems, and those can attract some people. So it's a great place for probably most junior engineers. It's a great place for some experienced engineers who are looking specifically for something different. There are people who want to be able to jump around: work on this, make this change, jump over to this thing, make this change, jump over to this other thing, make this change. So it's kind of a personality thing that we look for in the experienced engineers.
So the question was, what's the difference between this mechanism and a traditional operations team? In this case, the sustaining engineering team, your Purple Cross, is actually in charge of the code underlying the service. There's no one to... You don't say, okay, we found a defect, and hand it back to someone to fix. You say, we found a defect, we're going to break this thing open, we're going to fix it, we're going to make it better, we're going to change the entire platform underneath it. We don't need to ask anyone's permission because we own it. Does that make sense? Great. I think we're at time. Thank you very much for coming. If anyone else wants to provide feedback, there's a four o'clock open space that was unclaimed, so I'll be talking about sustaining there for anyone who wants to talk about it more. Thank you very much.