So, first off, thanks everyone for coming. As Matt said, my name is Kevin Harris, and I'm going to talk you through how the company I work for, Enova, took more of an evolutionary approach to DevOps.

First, a little background about myself. I'm currently a systems engineering team lead at Enova. I've been doing systems work for over 10 years now, long enough to have been doing it back when they called us sysadmins. I've been building platforms and tools that help get applications and technologies out the door so we can actually solve business needs. I've done this for universities, startups, and enterprises, so I've been through every kind of DevOps transformation: from non-existent, to rapid changes back and forth, to more deliberate, directed changes.

A little bit about Enova: we're a publicly traded online lending company. All that goes to say we're highly regulated and have lots of compliance requirements to deal with. We have about 1,200 employees, 300 of whom are technology and analytics engineers, and we offer six distinct products. We host both in AWS and in our own data centers; yes, we still have our own data centers. Our products are primarily written in Ruby and Go, and the company is about 15 years old. Why am I telling you this? All of it feeds into why Enova took a slower, more deliberate, evolutionary approach to our DevOps transformation, which is still ongoing.

So what do I mean by an evolutionary approach to DevOps? We all know change is going to happen at any company. Sometimes that change comes fast, and sometimes it's slow-paced, frustrating, and hair-pulling when you can't iterate at the rate you want to.
So what we really had to ask was this: as we embraced more DevOps practices, technologies, and principles, how could we evolve them into our company and make it faster at delivering products to our customers? We wanted to make it easier to launch new technologies, new services, and new ideas. At the same time, we wanted to improve the happiness of both the engineers building those things and the engineers on my team who support them, help get them out the door, and keep the infrastructure stable and running. So it's a combination of getting things out the door fast, but also stably and in a scalable fashion.

At the same time, we wanted to minimize risk to our customers and to our developers and engineers, stay compliant so we don't run into regulatory issues, and limit the disruption to our engineers' day-to-day work. Some things have hard deadlines: we have to meet this compliance regulation by this date, and if we don't, we can't do business. So we had to balance that risk and that disruption as we made these changes.

That's why we took a more methodical, targeted approach instead of the big-bang approach, where a C-level executive sends a memo from the top down saying, "Hey, we're doing this today, and if you don't agree, the door's over there." Instead we said: let's survey our landscape, see what's out there, and figure out how we can fix it.

Of course, along this journey we failed lots of times. I'm not going to tell you our journey was perfect; no one's journey is. The key thing about those failures is learning from them and understanding what we can do better. So I'm here today to share five key lessons we learned through
failures on our journey, so hopefully you don't run into the same ones.

Lesson one revolves around how we figured out who was going to spearhead the first wave of our DevOps transformation: what kind of teams, and who was going to do the work. Of course, we started where a lot of people start and then usually move away from. We had the DevOps team. They were going to come in and do the DevOps; they were going to DevOps all the things. That didn't work out so well. It ended up being a catch-all for engineers who didn't fit on other teams, scopes, or projects. Other teams would say, "Oh, that's not a feature, that's just something you need to do to get on ops or make our deployments easier. That falls to the DevOps team; the DevOps team can do that." So we quickly saw that team's scope creep larger and larger, and its headcount grow larger and larger, until it was too large for a single manager and no one could keep up with what everyone else on the team was working on.

So we reevaluated how we could split this up to serve the actual demands and achieve the change we wanted. Instead of a DevOps team, we started building platform teams. By platform teams, I mean each team had a defined scope: delivering a product, with the functionality its internal or external customers needed. How did we decide which platforms we needed? We aligned them by internal product. I don't mean an AWS team. I mean a team that supports the platform our application services run on, or a CI/CD team that builds out the platform that takes your code from an idea to production, or an observability platform team that builds out the toolbox for how we do metrics and all the other observability tools that
you need so you know what's working in prod. By having those products driven by our internal customers and users, around the services they need, we could build out core platform teams to start solving those real problems.

How did we handle scale and headcount creep on those teams? Not by mandate, but through the team's actual efficiency, we naturally landed on the two-pizza team size: about seven or eight engineers per team, small enough that at a team lunch everyone could sit at one table and have a conversation with each other. We found through natural workloads, and from the engineers themselves, that when teams got too large, things started falling through the cracks and it was hard to keep up with what everyone was doing. Smaller teams let us know what each other were doing and stay on top of the work coming in.

So to recap: we solved the problem of who was going to do the work by replacing one catch-all DevOps team with specific platform teams that have defined criteria for what they work on and are tightly scoped in both responsibility and headcount, so they can keep those platforms running.

The next lesson: now that we had dedicated teams doing the work, how do people know what they're doing? That's a key problem lots of engineers run into. How do your customers, your engineers, and your leadership know what you're actually doing? If you don't have a C-level executive coming down with a mandate saying "we're doing this," your teams need to be the ones getting the C-levels and the engineers to say, "Yeah, I want to follow you down that path; that path seems really exciting." How do you do that? Those teams already had an internal vision of what they
needed to do. They had talked to the engineers; we were much smaller when we started, so we had an idea of the problem space we needed to solve. The problem was that the vision stayed internal to those teams. What we really needed was an external vision for each team. By external vision, I mean taking what's in your head and making it available to everyone else: a new engineer joining your team, a new engineer joining your company, or, even better, the CTO, whom you might need to convince that something isn't getting traction and is really important, and show them why.

So how did we turn that internal vision into an external one? First, we set up team charters. Teams wrote down, in public documentation, the scope of the team, how the team communicates, how to go about working or interacting with the team, and, more importantly, what the team doesn't do. That gave us something to point to: if you need help with deployments, observability, or some other platform, here is the specific platform team for that. It got us out of the single catch-all team where everyone says, "Well, you're the DevOps team, so I'm just going to come to you and get my DevOps." Instead we could say, "We're glad to help, but you need to talk to these people over here."

Next, as we built out the projects meant to solve these problems, people needed to know what those projects were, what we were looking to deliver, and, more importantly, what results they could expect. To do that, we needed public project plans, project manifestos, project documents, whatever you want to call them: a written, public form that says we're going to solve this problem, here's how we're planning to do it, here's what we're
expecting you to do to be able to use this, and here's what the end result looks like. Ideally you write these as early in the process as possible, while you're talking to your users about their problems, so the document is collaborative. You don't want a situation where no one knows what you're working on, or, even worse, where you build something and your users disagree with how it solves the problem.

Finally, those of you who use AWS might have noticed that some service teams have started publishing their roadmaps on GitHub. This is great; it lets you know what's coming in the services you use. We do the same thing internally: our teams have roadmaps that span from a year and a half out, down to quarters, and then down to real-time backlogs. These roadmaps aren't rigid commitments to do a specific thing a year and a half from now, because our customers could change, our landscape could change, and how we do business could change. You need that flexibility. Long-term visions broken into smaller, more real-time chunks let you commit to projects you're going to deliver, so your customers know what to expect, while still giving your upper management and C-levels the high-level, big-picture direction they need to get on board with you.

So to recap: once we had teams doing the work, people needed to know what work they were looking to do and how. We solved that by having each team publish an external vision: what they're currently doing, what they're looking to do, and, more importantly, the scope of their team.

Next up, lesson three: once we had those project plans and roadmaps, how do
we fill them out? Early on, as we started filling them out, we'd say, "Hey, there's a problem. Kubernetes could solve that, or AWS could solve that, or the latest shiny new tech could solve every problem we have." So we started chasing shiny new tech: "Yeah, that's going to solve all our problems. Oh wait, we have to do X, Y, and Z before we can even use it. Well, that might not work. What about this over here? Oh, we'd have to do that too." We were jumping around chasing shiny new tech, when what we needed to focus on was using the right new tech.

What do I mean by the right new tech? I mean the tech that actually solves the problems those platform teams are looking to solve. Lots of problems can still be solved by Bash in Jenkins. It might not be the best solution in general, but it might be the best solution for you. Not every problem needs the most complicated Kubernetes infrastructure out there; what matters is knowing the right tech for your situation.

So how do you know what the right tech is? First off, context rules everything around me. Without contextual information, you don't know what the right tool is. What's that contextual information? Maybe it's how much engineering time you can dedicate to a project. Maybe it's how much work it will take to get there. Do we have the necessary skills, or can we get them, to deliver the project we're aiming for? Does it even solve the problems we're looking to solve, or is it just a shiny new thing that lets us write cool blog posts and give conference talks? With that contextual information, you can determine whether something is just shiny new tech or the right shiny new tech. And with that
context, you need to make sure you're choosing things that actually help solve your problems rather than just giving you hype and street cred in the tech scene. We made exactly that mistake: we chased hyped, shiny new projects and ended up not solving the problems we really needed to solve. So we had to reevaluate and focus on which problems we were actually solving and whether each tool actually solved them.

At the same time, the right new tech might be the tech you're using today, or it might not. Don't fear the sunk cost of your existing technology if your contextual information tells you it's better to move to something else. Change is going to happen, like I said, so make sure you're changing at the appropriate time with the appropriate tools. Don't avoid change just because you have a tool that kind of solves the problem.

So to recap: context is king. Without context about how your company works, the problem space you work in, and how your engineers solve problems, you can't pick the right technology. If you choose based on blog posts, conference talks, or whatever is getting the most traction on Twitter or the most GitHub stars, you'll be constantly switching technologies and jumping around, and that usually isn't fun for anyone.

The fourth lesson: now that we had an idea of the right technology to ship to our engineers, how do we actually get it into their hands so they can use it to solve those problems? We started out the way a lot of people traditionally ship software: we went away and did our thing. We talked to the engineers early on, gathered our requirements, and thought we knew what systems they
needed fixed. We went off and built it and came back to them, maybe a couple of sprints later, maybe months later, or even worse, a year later, and said, "We built you this Kubernetes cluster." They said, "Oh, well, we don't need containers anymore; we have bigger problems to worry about." So you've spent a year building something that doesn't solve a problem. Or maybe you come back and say, "We fixed that bug where users were getting bad Unicode in their usernames," and they say, "Oh, we just sanitized it in the UI and fixed it on the front end. It was a two-point story for us to get that out the door." By going away, building, and coming back, we ended up solving yesterday's problems instead of today's, and causing ourselves a lot of extra work. When we came back with what we'd built, the engineers basically said, "Cool, you did some work," and shrugged: maybe they'd use it in the future. What they were really asking was this: what value are you actually giving us as a team, and are you doing anything that makes my day better and lets me achieve my goals faster?

So how did we make sure we were solving today's problems instead of chasing yesterday's? First, we realized we always needed to be shipping. Always be shipping: delivering value to those teams means building an iterative product, an iterative platform, something we can easily add features to without breaking everything else. That means getting out of the mindset of getting requirements, going to our engineers, working on it for six months, and coming back. Instead it might mean sketching out what we're trying to solve, going off to build a prototype, then coming back to the users and saying, hey,
here's this prototype, give me some feedback. Let me know what's working. How do the inputs look? Is this still what you think you need? Do the results match up with what you need? Keep iterating on those products and stay in constant communication with your customers and users about the problems you're solving. I know my team, as infrastructure engineers, has a tendency to think we know all the problems and can solve all of the company's problems. Unfortunately, we don't always know all the problems. We need to talk to the software engineers, the analysts, and the analytics folks who are trying to solve the business's problems, ask them how we can actually make their jobs easier today, and work with them to make sure that's the case.

We also needed to get out of the mindset of chasing perfection. As was mentioned earlier, code is never going to be perfect; there are always going to be issues. It's more important to solve 80, 85, or 90 percent of the problems your engineers currently have than to hold off for extra weeks or months, or forever, trying to get it 100 percent perfect. It's much better to iterate and ship something practical that solves today's problem than to focus on something perfect while everyone sits there waiting.

As we figured out how to solve these problems, we distilled this into a very simple runbook. First, find your excited users. They might be excited or they might be angry, but those are your users: people who have a problem you can fix and need to fix, and who are available and ready to work with you. Give them a solution, whether a product, a service, a tool, or whatever, that solves an initial part of that problem. Listen to them complain about it, because
it's not going to be perfect and they will have lots of complaints. Fix those complaints. Repeat as necessary. Keep running through that loop, and that's how you actually solve today's problems instead of chasing yesterday's.

So to recap: once we had teams, knew what to work on, and knew what technology was available, we had to make sure we were solving today's problems by delivering an iterative product and constantly talking to our customers. Without that constant communication and constant delivery, we would not have been able to keep pace with today's problems; we would have been constantly fighting fires while delivering products that solved yesterday's problems.

The fifth and final lesson for today: how do we get the system we built to scale? We had it working for these internal platform teams, but as I said earlier, we wanted to improve the happiness not just of the engineers shipping new technology and services, but also of the engineers on those platform teams. You don't want to flood them with toil, where they're constantly taking request tickets or acting as human ChatOps. So how do we scale this so engineers can actually serve themselves?

We started off the way most people do: we wrote a bunch of Bash. We took a bunch of random Bash scripts, some of which SSH'd into boxes and things like that, wrapped a nice shiny new Bash layer on top, and said, "Here's your one interface to build everything you need." It would work maybe once out of every 10 times, or once out of five if you were lucky. We were basically trying to shove square pegs into round holes. How could this be better? We can't actually
give this to people and say, "Yeah, your life's going to be so much easier, run this tool," when it works less often than you'd expect. So how did we go about it? We had to cut square holes so the square pegs would fit, and we did that by building common interfaces. What do I mean by common interfaces? That Bash might turn into actual API calls to the devices it's interacting with. Or maybe we build out Terraform modules and define things in Terraform, which is a common interface out there. Or, even better, instead of someone running a Bash script on their laptop, maybe it's an API service that exposes interfaces for a chatbot, for curl, or for whatever tool they want to use to interact with it. Building out these common interfaces made it easier for engineers to self-serve, and made it easier for us to build out error handling and better integrate the various pieces.

As we built out those common interfaces, we were also able to implement, under the hood, everyone's least favorite thing: standards and practices. We implemented naming standards and tagging standards. Wondering what your database should be called? That's standardized. You don't really pick; the tool decides based on the information you pass in. The internals of the system, which you don't need to and shouldn't have to care about, are provided for you. So we abstracted away questions like "What do I call my database user?" or "What port do I run this on?" by building those common interfaces. You pass in maybe four attributes, and based on those four attributes we can infer the rest of the system. We were also able to build out better practices around how we
actually build out tools and what observability tools you get out of the box, because we could add features under the hood without engineers having to change their interface. Engineers using the platform just think, "Whenever I need a new service, I make a call or fill out this JSON file or YAML file." Under the hood, they might get all the observability tools we offer for a new service, with its information prepopulated, and any configs they need set up automatically.

The other key thing we had to do was documentation. Not documentation after the fact, where you throw up a blurb saying your tool has a funny name no one's going to understand, or a picture to go along with it, but documentation of why it works, how it works, the problem it's trying to solve, how to get help when it doesn't work, and, more importantly, how to contribute. The biggest thing we needed in our documentation was dev setup and contribution instructions, because that made engineers feel like, "Oh, I need a new feature for this, and it tells me how to set up a dev environment and add the feature I need." Instead of people coming to us and us saying, "Oh yeah, we'll take that and add this feature," we started getting pull requests: people adding their own functionality. It became a community effort to grow those systems, and that's how we were able to scale them out.

We also needed to reduce the number of handoffs and transactions involved in getting something out and deployed. We wanted to remove as much human interaction as possible, so we built automated checks and automated compliance checks. If you had a pull request whose checks were green and whose unit tests passed, we could basically do a code review, accept it, and say this
looks good, it's out the door, instead of someone going, "All right, code review looks good. Now let me kick off this staging test. Now let me wait for it to run. Now I need to manually run this linter," plus all the other manual steps. Make those automated checks on the PR, so someone can look at it and say, "Oh, it's green. Now I just need to review the functionality and what this is actually doing," rather than "You didn't put a newline there" or "Your braces don't line up." Humans don't want to do that; computers are great at it.

The other thing we did, to help people understand what these platforms provide and get them interested in working with them more, was set up office hours. Teams sit in our public kitchen or book a meeting room and make one or two representatives available for anyone to walk in off the street and ask questions. The idea is that it's a customer-driven interaction. A customer could come in and say, "I'm experiencing this problem today; can you help me fix it? I need to get this fixed." Or a customer could come in and say, "Hey, that new thing you added was really slick. How did you go about adding that?" and get talked through it. We hold these weekly, not daily; weekly works out pretty well for us.

We also set up what we call solutions office hours, with our architect, our principal software engineers, a handful of people from our observability and technical operations teams, and myself, sitting there as consultants. Someone with a new idea that could really help solve a problem can come early on, run it by this group of internal consultants, and get
feedback like, "Hey, that's a good idea; have you thought about this?" or "That's a really great idea; could we also use it for this?" By running those office hours, we were able to grow that community interaction.

So to recap: we scaled by building standard self-service interfaces and reaching out to the community through documentation and open office hours. That got us out of the day-to-day toil on a lot of these platforms and let engineers solve their own problems.

So, to recap the five key lessons Enova learned on this journey. We started by making sure we built the right teams to solve these problems, focusing on teams that deliver a consistent and thorough platform to our customers. We made sure each team's vision was publicly documented, publicly shared, and, more importantly, known throughout the company. We focused on context, because without context you don't know what's going on. We made sure we were constantly delivering product and value to our customers, because without that we were chasing yesterday's problems instead of solving today's. And we scaled, making this a process anyone can take part in, by having correct documentation and self-service interfaces and by reducing the number of humans in the process.

Thank you; that's all the slides I have. I'm Kevin Harris, and if you have any questions, I'll be outside.
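The common-interface idea from lesson five can be sketched in a few lines: engineers pass in a handful of attributes, and the platform infers names, tags, and ports. This is a minimal, hypothetical sketch. The four attributes (`team`, `app`, `env`, `kind`), the naming pattern, the tag keys, and the default ports are illustrative assumptions, not Enova's actual standards or tooling.

```python
from dataclasses import dataclass

# Hypothetical conventions for illustration only; not Enova's real standards.
DEFAULT_PORTS = {"postgres": 5432, "redis": 6379, "http": 8080}
VALID_ENVS = {"dev", "staging", "prod"}


@dataclass(frozen=True)
class ServiceRequest:
    """The only four inputs an engineer supplies (hypothetical attributes)."""
    team: str
    app: str
    env: str
    kind: str  # e.g. "postgres", "redis", "http"


def infer_config(req: ServiceRequest) -> dict:
    """Derive names, tags, and ports from the four inputs so every service
    follows the same standard without the engineer choosing them."""
    if req.env not in VALID_ENVS:
        raise ValueError(f"unknown env {req.env!r}; expected one of {sorted(VALID_ENVS)}")
    if req.kind not in DEFAULT_PORTS:
        raise ValueError(f"unsupported kind {req.kind!r}")

    # Naming standard: team-app-env-kind, all lowercase.
    base = f"{req.team}-{req.app}-{req.env}".lower()
    config = {
        "name": f"{base}-{req.kind}",
        "port": DEFAULT_PORTS[req.kind],
        # Tagging standard applied to every resource.
        "tags": {
            "team": req.team.lower(),
            "app": req.app.lower(),
            "env": req.env,
            "managed-by": "platform",
        },
    }
    # Database-backed services also get a conventionally named user.
    if req.kind == "postgres":
        config["db_user"] = f"{req.app}_{req.env}".lower().replace("-", "_")
    return config


if __name__ == "__main__":
    print(infer_config(ServiceRequest("payments", "loan-api", "prod", "postgres")))
```

Because every name and tag comes out of one function, a platform team can change or extend a standard under the hood, for example by adding a new tag, without engineers changing the four values they pass in.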