I'm honored to introduce Bob Weiss, General Manager of Kubernetes at one of our sponsors, Amazon Web Services. Thank you, Amazon, for sponsoring this commit. Bob's session is titled GitOps-Based Ops Modernization. What is GitOps and how is it changing ops even more than DevOps has? Bob has been deeply involved with Kubernetes since before its 1.0 release. He's also been a governing board member of the CNCF. So if there's anyone who has great insights on where DevOps, Kubernetes, and software delivery in general are heading, it's Bob. Let's check it out. And don't forget that you can ask questions in the chat during the talk.
Hello, I'm Bob Weiss, the General Manager for Kubernetes at AWS. And I'm going to be talking today about a topic that I have a lot of personal passion for, and one that I think is becoming increasingly important to our customers, and that is GitOps. I've titled the presentation Evolving DevOps to GitOps, because I view GitOps as a proper subset of DevOps, a more opinionated way to think about the DevOps approach. But as always, I think it's important to think about the big why. Why do we care about DevOps? Why do we care about GitOps? Why do customers care? And I think these are the reasons that I hear from customers all the time. Velocity, the overall velocity of building systems, building software, delivering value, is critical to competitiveness, and this isn't just true in IT. This has become true across all industries. And we've also been seeing in a number of industries huge velocity gaps between the high-performance orgs and the average-performance orgs. Now, I'm not just saying big gaps between the high and the low. That's certainly true, but there are gaps even between the high and the average. And then combined with that is a natural desire to be more efficient: constrain sprawl, use fewer, better tools, have everybody use fewer, better things. And then there's an HR issue.
The teams need to modernize their approach because it helps employees be happier. Happy employees help recruit their friends, and a good work environment is good for everybody. So there's considerable pressure to evolve. This isn't a recent pressure. This has been going on for a long time. And this has, of course, been leading to a lot of thinking. Some of it is more recent, like GitOps; some of it has been in process a long, long time. I'm talking about the question of how you make more small changes as opposed to fewer big changes. And this is, of course, the discussion that's been going on for years and years about waterfall versus incremental kinds of approaches. DevOps is strongly in favor of a small, incremental change approach. GitOps has a more opinionated way to think about the specific approach for how to do that. But overall here, the point is that even if you start with the greatest team and the best intentions, if you're in a mode where you do lots of handcrafted things and it takes too long and you're not using enough automation, you go slow. And then when you have a problem, the reaction is to add more time and add more process, which means the next thing you have to do gets even bigger, which increases the chance of a failure just based on size. Whereas if you can make progress towards a higher-velocity approach with smaller changes, you go faster. And it isn't just about small changes; it's about how you make small changes and use automation, get help from the robots to make that happen. So GitOps is a way to potentially help move from spiraling slower and slower to spiraling faster and faster. But that said, fully automated deployments are really, really hard. I don't want to minimize that, because as important as it is to leap that chasm from, let's say, average velocity to high velocity, it's a big investment, it's a big change.
And hopefully some of the things I say here today can help on that journey. So why is that so hard? Working with big, complicated systems and trying to coordinate all the changes between those systems is really hard. When you have systems that have lots of ordered steps, and that number of steps gets really large and the steps get really dependent on each other, things get complicated: dependency tracking. A lot of the success of immutable-style approaches, and containers generally, is based on improving the dependency tracking situation. And then you have packaging at various levels of the system. How do you deal with versions? Different versions of different things: let's say you have lots of microservices, each one of which is versioning differently, and you have even more complication there. And then what do you do when things go wrong? You have to have an approach for rollbacks when a particular deployment isn't working. So these are all the complications. And where we've been moving as an industry to approach these things has been to take some pretty bold steps. A shift from imperative approaches to declarative approaches is certainly one of them. Moving from automated mutation to immutability is another big one. One of the approaches that teams took to automation was to have systems that reach out into production and change things around a lot. And one of the things that has worked really well is taking a more immutable approach. With a container, you don't go and upgrade it in production. You throw it away and you replace it with a new one, a new one that's been tested. So immutability approaches have been really critical. And then on these larger, more distributed systems, you can't depend on a single state. The system has too many states; it's too complicated. And we have to absorb and live with that complication in a good way.
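To make the "throw it away and replace it" idea concrete, here is a minimal sketch in Python. The `Release` and `Environment` names are hypothetical and are not from the talk or from any real tool; the point is only that a deploy creates a new immutable release rather than patching the live one, and that a rollback simply points back at the previous release.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Release:
    """An immutable, already-tested artifact (hypothetical stand-in for a container image)."""
    service: str
    image_tag: str


class Environment:
    """Tracks which release is live and keeps the history needed for rollback."""

    def __init__(self, initial: Release) -> None:
        self._history = [initial]

    @property
    def live(self) -> Release:
        return self._history[-1]

    def deploy(self, new_tag: str) -> Release:
        # Never patch the running release; build a new one and swap it in.
        new_release = replace(self.live, image_tag=new_tag)
        self._history.append(new_release)
        return new_release

    def rollback(self) -> Release:
        # Rolling back is just pointing at the previous immutable release.
        if len(self._history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._history.pop()
        return self.live
```

Because `Release` is frozen, nothing can reach in and change a live release in place; the only way to change what is running is to deploy a different release.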
And the approach that has been working really well is to view reconciliation and convergence on an ongoing basis as a critical technique. So to be a little bit more specific: moving from manual processes to automation has been going on for a while. That's really what's going on at the DevOps level. But in a GitOps and containers world, things are a bit more prescriptive and there are some better opportunities here to improve. So, replacing package managers and batch scripts and so forth with containers. Actually depending on configuration as code, having all of the configuration be in a source control system. Again, best practice in the DevOps world, and I'll get to this in a bit: GitOps has an even more opinionated view of that. And then including a security posture in what you do here is critical. One of the big realizations for me over the past number of years has been that a focus on CI is really important, but lots of teams have, I'll say, solved the CI problem. CI is well understood, lots of teams do it. It really helps the development teams go fast. It's a critical DevOps practice, but continuous deployment is actually quite hard. And so as we move from a world where the dev teams are going really fast and the operations teams are maybe not going as fast, I think this shift from focusing on CI to focusing on CD is really important. Okay, so now to the main topic here. DevOps is transitioning to GitOps for a lot of folks, especially in the container world. First of all, you can't really say GitOps without a tip of the hat to the Weaveworks folks who invented this term. And now let's move on to what the principles of GitOps are. Systems are described entirely declaratively. The desired state is versioned in a source control system. Approved changes can be automatically applied. Note that the shift here is a bit towards humans doing their work up front and the robots doing the work after that.
And then the use of systems like Kubernetes that have great support for this kind of model: built-in support for building agents and watches and things that work really well in a GitOps world. So just to emphasize here: declaration, versioning, automation, and then support from the underlying systems. Okay, so let's dig into each one of these a little bit more. Declarative is important because it's much easier to reason about at scale, and it forces simplification in a disciplined way. There are always things that you kind of want to think about imperatively, but ensuring that the teams are required to think through a declarative approach often helps simplify things. A really important point here: it defines intent. So this is the question of how, in this new world, we manage robots. How do we cross the human-robot interface? And the approach here that's been working is to define intent and let the robots do the work. And then a notion of constant convergence of the system is really critical, especially in these large distributed systems. A declarative approach helps make a clear convergence point that the robots are always working towards. Everything in source control. I think to DevOps practitioners this would be non-controversial, maybe even outside of that. This is a good software engineering practice. Really critical here, though: you want a human-readable source of truth. You want the humans to be stating the intent. And one of the ways you manage the human interfaces here is that this is a great place to have multiple eyes looking at a system. So when you want to have multiple humans looking at a thing, using the well-understood source control and code review kinds of mechanisms works really well. And then in this kind of workflow, deployments are typically triggered by merging the PR. And then the robots get to work. We love our robots. They aren't yet our overlords, but we still need to be nice to them. I would say this is probably the biggest difference.
If I wanted to poke at that difference between the really high-performance org and the average org, I would say this is it. Most organizations are really, really worried about humans declaring a thing and then automation taking over and taking care of the rest of it. And this is where you can see opportunities for a spiral to slowness, where you have lots and lots of process and checks and other things that humans are doing in the middle of a process rather than doing that up front. At Amazon, we are terrified if our orgs are deploying software directly to production and not using automation. So I think this is a place where we're practicing what we preach. Again though, this is really hard. This is a big investment, but the payoff that you get for making this investment is great. All right, so why Kubernetes? Kubernetes was really built as a robot. One way to think of it is as a robot for ops teams. And it supports, and has from the early days, a declarative kind of approach. The notions of convergence are built into the system. This is why the controllers and operators pattern is so popular. Immutability: it's a container management system. It has support for things like deployments, which helps here. It has these convergence properties, you know, organic healing properties of: oh, a thing failed, I need to converge it back to the desired state. So a lot of the basic things that you need in order to start taking a GitOps approach are there. So just to poke at this a little bit, I've talked about this a few times, just to make sure we have the common context here. Continuous convergence, drift management. You'll hear the word drift often in GitOps conversations. The idea here is you should have this desired state, and that's really that immutable thing that's checked into your source control system. You have your desired state.
You check: is my current state matching the desired state? And then you take some action to either correct it or, because the robots are still not that sophisticated, sometimes ask for human help. So often the robots can fix a problem. Oh, a pod died. I'm supposed to have five pods here. There are only four pods. Let me converge the state back to five pods, for example, in Kubernetes. The robots can do some of the work, but maybe it gets too complicated. Oh, I ran out of resources to get back to five pods again. So when do I ask humans for help? A big piece of the story here is ensuring that it's clear in your systems where the robots are doing the work and where they need to ask for human help. Okay, so digging even a little bit deeper here, if you want to think about this another way: when I first started trying to understand the implications of GitOps operationally, the notion of the source control system as a firewall between the CI world, the developer world, and the GitOps world, more the ops world, the deployment world, was really important. So if you want to think about how you fit the notions of CI and CD together, this is the way I would suggest you think about it. CI is a build process that produces artifacts. By the way, one of the things that's missing from this diagram is an artifact repository, like a container registry, for example. But if you'll forgive the simplification, the main point here is that in the CI and dev world, you do a lot of things, they're committed into the source control system, the humans review it, and then the robots take over. So you get a single source of truth, all operations, all changes to the system are committed by pull request, and then you get into this checking of the diffs and automatic convergence.
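The five-pods example above can be sketched as a tiny reconciliation step in Python. This is an illustration only, not how Kubernetes or any GitOps tool is implemented; the `Action` type, the `reconcile` function, and the `free_slots` capacity parameter are all hypothetical names chosen for the sketch. It compares the desired replica count with what is running, starts what it can, and escalates to a human when capacity runs out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    kind: str    # "start", "stop", or "escalate"
    count: int   # how many replicas the action covers
    reason: str


def reconcile(desired: int, running: int, free_slots: int) -> List[Action]:
    """Return the actions needed to move `running` toward `desired`."""
    if running == desired:
        return []  # already converged, nothing to do
    if running > desired:
        return [Action("stop", running - desired, "more replicas than desired")]

    missing = desired - running
    startable = min(missing, free_slots)  # the robot can only use free capacity
    actions: List[Action] = []
    if startable > 0:
        actions.append(Action("start", startable, "fewer replicas than desired"))
    if startable < missing:
        # The robot cannot fix this alone: make the hand-off to a human explicit.
        actions.append(
            Action("escalate", missing - startable, "not enough capacity; needs a human")
        )
    return actions
```

With `desired=5`, `running=4`, and spare capacity, the step returns a single "start" action for one replica. With no free capacity it returns an "escalate" action instead, which is the point in the loop where the system asks for human help.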
This has another really positive benefit, especially from a security and audit perspective, which is that you have a very clear change control path. Here are the humans that looked at the changes. Here's who approved it. Here's who merged it. Here's the desired state of the system. That has a lot of positive effects even outside the operations team. I have to mention here as well that if you're practicing GitOps in the Kubernetes world, Flux is an extremely useful component that does the Git-to-cluster synchronization, or source-control-to-cluster synchronization. In summary, GitOps is a standardized workflow for configuring, deploying, updating, and managing, well, broadly, infrastructure as code, but in this case the more narrow view is Kubernetes and all components. I'm going to say another word about the all-components piece. And then applications as well. Okay, so one of the things that we announced last week at KubeCon is a system we call ACK. It's in developer preview. It's open source. Come check it out. Try it. Give us your feedback. The reason I mentioned all components on the previous slide is that to date, it wasn't entirely clear, if you're using a GitOps approach with Kubernetes, or EKS, for example, on AWS, how to think about application dependencies on AWS services. So we have a new thing here that we think is going to help our customers practice GitOps. An example would be that you want to define an RDS database instance in a Kubernetes manifest. You can clearly do this kind of work with or without ACK. But if you're Kubernetes-centric and you like to work in that way, we would certainly suggest that this would be some help. So, effective strategies. Where are we seeing customers actually put this to use, and what are they learning? I think the main one has been a focus on operations, on the operations side.
If you're in an organization that, like AWS, practices "you build it, you run it," this is more of a life cycle and roles thing, maybe within the team. If you're an organization that uses SREs, or perhaps has development and operations in separate teams, you can think of the focus here as being on the operations team as opposed to the operations practice. Either way works. And again, the focus here is on operations agility as opposed to developer agility. You can, of course, use container orchestration to help realize CI and CD. And I would also say that, like most things, these kinds of transformations are really hard, as I said earlier. Continuous deployment is really difficult, and it's important to take an incremental approach to this as with anything else. And you can incrementally evolve the entire organization, not just new projects. So for example, we see customers have had a lot of success in taking an application that perhaps, in the longer run, your org wants to re-engineer into a microservices style of architecture. First, containerize that application, get continuous deployment working for the monolith, and build out the organization and practices incrementally in order to achieve that. And then they can start to incrementally evolve that, or maybe not, right? Some applications in that state can use a lot of the self-healing properties and immutable approaches to understand and manage changes, but don't really need the additional investment in a microservices kind of approach. But I would say it's critical to treat this kind of operations modernization as a prelude to any kind of microservices re-architecture. So my recommendation here would be to make sure that you can do continuous deployment on your monoliths before you start re-architecting those into microservices. And, beyond the topic of this talk, continuous observability is as important as continuous deployment.
Okay, so other approaches that we've seen that seem sensible also in a kind of incremental way. And this is a bit more common for companies that have split their platform or ops teams separate from the applications teams. And you can indeed use a GitOps approach at both the application level and at the platform level. And so what we see sometimes is the application teams absorb this kind of approach first. But what I would say seems to work a little better and is pretty common is where the platform team first tackles a GitOps-based approach for their kind of internal platform and infrastructure management. And then over time that moves upwards into the application teams. So there we have it. Thank you very much for your time.