It's wonderful to see this many people in the room. I'm Tim Hansen. I'm here to talk to you a little bit about GitOps and how we use it at Spotify. And I'm so excited that there's so many people eager to learn about this. Could I get a quick show of hands? Who feels like they know kind of what GitOps is? All right, that's a pretty good percentage. That's really good. Well, bear with me for the people that didn't raise their hands. I'm gonna go over some basics, but I promise it's just a few minutes and then we'll get to the good stuff about Spotify. So, what is GitOps? This is my simple definition if you haven't heard this term before. GitOps is an approach to infrastructure automation using files stored in source control. Git is obviously the preferred source control system here. If you're using SVN, you might be too old for this talk. I'm just kidding. I've been there too, man. All right, so you might be thinking, this sounds familiar. Isn't that just infrastructure as code? We already have a term for that. You can't just make up new terms. Well, first of all, I didn't make up the term. Weaveworks did, so you can blame them. But infrastructure as code is a little bit loose. I can apply changes from my local machine and still be doing infrastructure as code. I can push code straight to the main branch without reviews and still be doing infrastructure as code. I can set things up with code and then go into the cloud console and muck around and still be doing infrastructure as code. So, GitOps applies infrastructure as code, but it's a bit more prescriptive in the workflow. So, in addition to the prescriptive workflow, there's a few basic principles that must be followed for GitOps. These are almost self-evident from the name, but worth going over. So, first, we must be able to express the desired state declaratively. That means I can describe the infrastructure I want in a file so that I can store it in Git.
Second, these declarations must be versioned and immutable. This makes the third bullet much easier. And third, changes to these declarations must be applied automatically. So, that means something's paying attention to Git and tries to make reality match my declarations. So, the Git part of GitOps already handles the first two. Since we're using Git as the source of truth, the declarations are clearly stored in files. And Git commits are, of course, versioned and immutable. So, that leaves the last bullet, which is way harder than the first two. I can imagine someone proposing these principles like, "I'll handle the first two, guys." That last part means that we have an agent. This is a process paying attention to Git, which can look at your declarations versus reality, make a comparison, and apply the necessary changes. I added a little Kubernetes icon over here, because quite often with GitOps we're talking about Kubernetes resources, but not always. I'll cover some other examples later on. And I just love the image of a cloud with a steering wheel. Okay, so GitOps isn't all that complicated. We store declarations in Git. Those get reconciled with reality. This should be very familiar if you're used to tools like Terraform. So, what's the benefit of embracing this approach? Let's look at a typical release to production at a software company, company X. I was using X as a variable here, not the Twitter thing. So, it starts: developers make some code changes and check them into source control. The code gets built and packaged in the CI system. If you need new cloud infrastructure resources, maybe an ops person provisions them, or a developer clicks around in the cloud console themselves. If there are database schema changes, you might have a DBA that applies some migration scripts. The code is pushed into production with scripts or even manually with FTP. I wish I was joking about that. Someone out there feels personally attacked right now.
If there are configuration changes needed to the runtime environment, an ops person might apply them. Now we can start accepting traffic and do a rollout, which might be either manual or somewhat automated. And finally, the DBA might apply a post-release database script. So this process is pretty slow and there's a few people involved. I'm even missing some steps here. There's probably release verification, some manual testing, decommissioning the previous release. If your release process looks something like this, you probably don't release every day. You probably have someone clicking things late at night to avoid production outages. And this doesn't scale. So let's compare how we typically do software releases at Spotify. It starts out the same: developers make code changes and commit them. We build and package things in CI, maybe a little bit differently, I'll get to that later. But then things start to look very different. Cloud resources are automatically synced based on the declarations in the code. Database changes are automatically applied based on migrations in code. New application pods are created, again, based on declarations in code. The rollout is handled automatically based on canary configuration in code. So the release is completely automated. The only people left in this process are the developers, and we didn't, like, fire all the ops people. They helped us make this possible. But this allows us to continuously release thousands of microservices many times a day using GitOps. So this shows a very tangible benefit of GitOps: by automating all the infrastructure steps, we can scale the release process and make it faster. And because the changes are all stored in code, we get all the nice things that Git provides: a review process, a version history, permissions, an easy rollback mechanism. And we could use git blame to point fingers too, but come on, we're better than that. So thank you for bearing with me. That was a crash course in GitOps to set the stage.
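To make "desired state expressed declaratively" concrete before we dive into Spotify specifics, here's a sketch of the most common case: a standard Kubernetes Deployment manifest. The service name and image are made up; the point is that a GitOps agent's whole job is to make the cluster match a file like this.

```yaml
# A minimal Kubernetes Deployment: the desired state (three replicas of
# this container image) lives in a file we can commit to Git.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.4.2
```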
And now I want to show you how we use it at Spotify. So let's start with builds. The build system is kind of the starting point of GitOps. This is where the process is kicked off. We have a central build system at Spotify called Tingle. It's a custom system, but it's designed around Jenkins pipelines and action containers, very similar to GitHub Actions, which came out a little bit later. Most interestingly, it has a declarative pipeline language. Remember those principles of GitOps? The desired state of our build is stored declaratively, and it's in Git, so it's versioned and immutable. So lastly, we just need something to apply the changes automatically. Tingle uses GitHub webhooks to trigger builds automatically. And this is roughly what a Tingle build pipeline looks like. It's in YAML. Who doesn't love YAML? That was an alternate title for this talk: YAML, This Is My Life Now. Don't worry if you can't read this, it's not terribly important. I cut out a bit to fit on the slide, but you get the idea. We have two pipelines, a review build and a master build. We build this Java application with Maven in both cases, and the master build triggers a deployment. So if I make changes to this build file and commit it, Tingle picks up on that via GitHub webhooks and changes how it builds my application accordingly. Now there's something super interesting that GitOps enables here. Many of our microservices use these exact same build steps. Compile Java, deploy to Kubernetes, et cetera. So this build definition could get copied and repeated all over the place. But with GitOps, we have an agent watching for changes. So why don't we have that agent do a little translation for us? Instead of that big pipeline definition getting copied around, we created build templates. So this tiny file on the left is what you'll typically see for a microservice. And this template is translated by Tingle into the bigger standard definition for Java backends that you saw a second ago.
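Tingle's actual syntax is internal to Spotify, so this is only a hypothetical sketch of the idea: the tiny file a developer commits just names a template, and the build agent expands it into the full review-plus-master pipeline definition.

```yaml
# Hypothetical sketch; Tingle's real format is internal to Spotify.
# What a developer typically commits: just name a template.
template: java-backend
---
# Roughly what the agent expands that into: a review build and a master
# build, both building with Maven, with master also triggering a deploy.
pipelines:
  review:
    steps:
      - run: mvn verify
  master:
    steps:
      - run: mvn verify
      - trigger: deployment
```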
So we have similar templates for websites, Python backends, streaming pipelines. This is another superpower of GitOps: abstraction and simplification. By using GitOps and standard tech stacks at Spotify, we can make builds absolutely brainless for developers. And given this template definition, we can expose some commonly used options to customize the build. But again, in a super simple way. Maybe you wanna be on the Java bleeding edge, or you're willing to sacrifice build log readability to parallelize the build and speed things up. Those are flags we expose. And for software that needs a truly custom build, the full pipeline language is still available. But for most components, these templates are just fine. So next, let's talk about deployments. Like builds, we have a central system that handles most of our deployments. And again, it has a declarative language with the desired state stored in Git. The changes are applied automatically, though in this case by the build rather than directly from Git. And here again, we use a custom YAML file to define a very minimal format. We're leveraging the power of standards and defaults. All we have to say is: this is a backend, please deploy it with Kubernetes. And for Kubernetes deployments, our deployment system looks for a root-level Kubernetes folder in our source code by default. This contains standard Kubernetes definitions like Service and Deployment, along with other definitions managed in Kubernetes. So here we're not only storing a declaration of how to deploy in Git, but also the declarations of what to deploy. You can also see a few cloud resources we manage declaratively, such as databases and IAM users. So let me show you those. We use an open source product from Google called Kubernetes Config Connector. This allows you to define cloud resources as Kubernetes custom resource definitions. So here we've defined a Cloud SQL database, Postgres, and a partial IAM policy so our service account can access the database.
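For a sense of what those definitions look like, here's a sketch using Config Connector's public CRDs. The service, project, and database names are made up, and exact fields can vary between Config Connector versions, but the shape is this: a database and an IAM binding, both expressed as Kubernetes resources.

```yaml
# A Cloud SQL Postgres instance, declared as a Kubernetes resource.
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: my-service-db
spec:
  databaseVersion: POSTGRES_13
  region: europe-west1
  settings:
    tier: db-custom-1-3840
---
# A partial IAM policy so our service account can reach Cloud SQL.
apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMPolicyMember
metadata:
  name: my-service-sql-client
spec:
  member: serviceAccount:my-service@my-project.iam.gserviceaccount.com
  role: roles/cloudsql.client
  resourceRef:
    kind: Project
    external: projects/my-project
```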
Config Connector uses custom Kubernetes controllers to reconcile the state of cloud resources with what we've declared here. We call this declarative infrastructure, but more generically, it's GitOps. Changes to these resource definitions are applied as a separate build step before the application deployment. Oh, and one more thing on deployments. This declarative format allows us to build very powerful features and make them easy to express and add to a service. As an example, our deployment system can do automated canary deployments. So this means for each code change, we stand up a new pod, shadow some traffic onto it, and check metrics like error ratio to make sure the new pod is behaving before we promote the deployment. And you can see how easy this is to add to your service with our template language. And as a result, lots of services do use it. We can even create the PR for you to add this in. Now, moving on to monitoring. We have metrics collected from our services, generally through Kubernetes sidecars, and sent to our central time series database. And from there we use Grafana to visualize these metrics and provide alerts. And we use GitOps. You might be starting to sense a theme here. We again have a central system. It again has a declarative language, and changes are automatically applied. And we use the same abstraction and simplification trick that we did with builds and deployments. Since so many of our microservices are using standard technology, emitting the same metrics, the same monitoring graphs are useful. So we've defined templates that are expanded into graphs covering all the standard metrics. So here on the left is a declarative monitoring file for a typical microservice. It's very simple. Just say it's a Java backend; my work here is done. And this gets translated into JSON to create Grafana dashboards. And there's no way that would fit on a slide. So that's just a tiny little snippet. It's huge.
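The internal format isn't public, so this is only a hypothetical sketch of the shape of such a monitoring file: declare the tech stack, and a template expands that one line into full Grafana dashboard JSON plus alerts.

```yaml
# Hypothetical sketch; Spotify's real internal format differs.
# Declaring the tech stack is enough to get a full dashboard and alerts.
component: my-service
monitoring:
  bundles:
    - java-backend   # expands into panels for all the standard metrics
```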
So by using our standard backend framework and that tiny little YAML file, I get a full-featured dashboard for my backend service. Not only that, but alerts are already configured for metrics that are typically significant. So you can imagine the power of these templates. No one has to spend time creating dashboards or thinking about what metrics are available for their service. We provide alerting recommendations. If we add a new metric, we can roll it out using the template. No code change is required in all those services that depend on this. And like our build templates, we expose configuration for common variations on a bundle. So, exclude Hermes, that sort of thing. There's also a way to override specific panels if the defaults don't fit perfectly with your service. Of course, it has to be customizable. You can add more bundles or even supply completely custom panels. That's what's shown on the right there, using this declarative language. And we get all the power that GitOps provides: version history, review process, automatic updates. And we can start doing cool stuff on top of that, like showing a preview of dashboard changes on your pull request. Something I haven't mentioned yet is how we tie all this infrastructure together. To make all this work seamlessly, we need some metadata about your service. For example, deployments default to certain regional availability based on the tier of the service. Those Grafana alerts need to know who owns the service to send the alert to the right place. And for this, we have our software catalog. And of course, we use GitOps to manage this metadata as well. The software catalog stores information about the service, and also information about relationships the service has to other software and resources. This one, for example, declares that it depends on its database and that its API is consumed by another service.
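The open-source Backstage catalog format gives a sense of what this kind of metadata file looks like; here's a sketch with made-up names.

```yaml
# catalog-info.yaml, in Backstage's open-source catalog format.
# The entity declares ownership plus its relationships to other software.
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
spec:
  type: service
  lifecycle: production
  owner: team-playback
  dependsOn:
    - resource:my-service-db   # the database it depends on
  providesApis:
    - my-service-api           # an API other services can consume
```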
This means our software catalog is not just a list of services, but actually a graph that tracks all these relationships between services. So the software catalog is part of Backstage, our developer portal. I hope you've heard of it. We released Backstage as open source and donated it to the CNCF three years ago. And now there's thousands of companies using it. It's been really awesome to see it take off. So Backstage brings all this metadata together, so our developers have a single pane of glass for all this infrastructure. I can see my builds, my deployments, how my service ties into the software graph. And Backstage also has software templates. These are like blueprints for creating a new piece of software, so that you start off with a new repository ready to write code instead of doing a bunch of one-time setup. For example, the Spring Boot template might have Maven already set up for me, a sample route defined, default server configuration. And it will probably not surprise you that these templates are themselves defined in Git using a declarative templating language. But that's not what I wanna focus on here. Instead, think about all those declarative files we just talked about for builds, deployments, monitoring, software metadata. Of course, we include all those YAML files in the template as well, already customized for the type of software you're creating. So by using a software template in Backstage and filling out a few form fields, a developer at Spotify can create a service that's ready for production in just a few clicks. You can build and deploy right away. It's automatically registered in the software catalog. And it already has monitoring and alerting set up. You never even had to touch the YAML. I don't know about you, but this blows my mind. By having everything declarative in code and having standard technology stacks, you get infrastructure basically for free. And if you need to make modifications, of course, it's easy.
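The open-source Backstage scaffolder gives a flavor of what those software templates look like. This is a trimmed, hypothetical sketch: the org name and skeleton path are made up, and a real template would carry more parameters and steps.

```yaml
# A Backstage scaffolder template (trimmed, hypothetical names).
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: springboot-service
  title: Spring Boot Service
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name]
      properties:
        name:
          type: string
  steps:
    # Copy a skeleton that already contains the build, deployment, and
    # monitoring YAML, templated with the chosen service name.
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
```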
All right, so if I've sold you on GitOps this far, you might be thinking: oh great, now I have to go work at Spotify to enjoy all this great stuff. And I do encourage that, apply today. But it's not necessary. These tools are out there for you to use GitOps at your company. So there are tons of tools for running builds in a GitOps-friendly way. The big version control providers are actually doing some nice work here. GitHub Actions is very similar to our internal build pipelines. It's kind of great, I have to admit. We use it for the Backstage open source project. GitLab has a really nice declarative CI setup too. Google Cloud Build is along the same lines of a container action runner. Tekton I haven't used myself, but people tell me it's nice, and I believe them. Argo is famous for Argo CD, which you'll see in a few slides, but there's also Argo Events and Argo Workflows, which can be combined into a CI pipeline. And Dagger is another tool that enables builds as code. For provisioning infrastructure with GitOps, the solution space is a little bit more limited. Terraform is the classic infrastructure as code tool that everyone thinks of. But I gave it a "kinda" here because it doesn't handle drift and reconciliation in a very automated way. OpenTofu is an open source fork of Terraform created in response to the Terraform license changes a few months ago. Config Connector, which you saw earlier, is fantastic, but it only works for Google Cloud, so not great for everyone. But Crossplane is a CNCF project that's super interesting in this space. It does basically the same thing as Config Connector, but it's cloud agnostic, and it has support for all the major cloud providers. And there's a talk tomorrow about the BACK stack: Backstage, Argo CD, Crossplane, Kyverno. So definitely tune in for that one. Those are all CNCF projects.
Lastly, I have Pulumi on here, which is similar to Terraform, but offers libraries in many different languages instead of Terraform's HCL. For application deployment, Argo CD and Flux are the two dominant tools that you'll hear about for GitOps application deployment. These are both focused almost exclusively on Kubernetes deployments, and I highly recommend using Kubernetes for application deployments. Now, alongside Flux there's an extremely cool tool called Flagger that you should definitely check out. If you were jealous of Spotify's automated canary deployments, this will give you the same thing. It's a progressive delivery tool to help with automated rollouts. And lastly, Spinnaker. It's an old tool that can learn new tricks. It can support a GitOps workflow if you use pipelines and store your delivery config in Git, but it's not really designed around GitOps the same way that these other tools are. Now, monitoring. I shouldn't have left it for last. I promised myself I wouldn't cry. There are unfortunately not great tools out there for managing dashboards and alerting with GitOps yet. Spotify is a little bit ahead of the curve here, I would say. Grafana has a means to provision data sources and dashboards through code, which is what we use, of course. But wrangling this into a true GitOps workflow is best described as an exercise left to the reader. If you know of any tools that I missed here, please tell me so I can stop looking like Kermit here. But I don't wanna leave you on a downer. So I had to throw in a little bonus section. There's another huge advantage to GitOps that Spotify has jumped on. If you think about it, since our desired state is stored in code, we can use code-based tools to do something a little bit magical: automatic refactoring. So we created a tool called FleetShift that takes code refactoring to the next level. Spotify engineers can define a code transformation and roll it out to our entire software fleet.
We can define which repositories to target based on software catalog metadata or by detecting eligible code. We can do gradual rollouts, sending the transformation to, say, low-tier services first, the less critical things, and monitoring the build status of those PRs. Many of our teams have even enabled auto-merging of FleetShift PRs. If it builds, ship it. And this has been a game changer for vulnerabilities. We can roll out security updates in a matter of days rather than weeks or months. This graph shows our rollout of the fix for the Log4j vulnerability from a little while ago. 80% of our services were patched on the first day. But this also relates to this talk: we can apply the power of FleetShift to our GitOps declarations. So the first line of defense was the abstraction and simplification that you saw earlier with templates and bundles. Those implicit defaults give us one way to make sweeping changes across all of our software. But if you need to tweak all those declarations that are out there in code, to migrate the stragglers to a more recent version, for example, FleetShift makes it easy. I'm submitting a talk on fleet management to the next KubeCon, so stay tuned. A little bit of a teaser, but I digress. Enough about FleetShift. I wanted to leave you with some great GitOps resources to learn more. OpenGitOps.dev is from the CNCF GitOps working group, so that's definitely near and dear to all of our hearts. GitOps.tech is a site from Weaveworks, who pioneered this space. They have a list of awesome GitOps tools too, including a bunch I didn't cover here. And of course, there's a ton of GitOps talks here at KubeCon. So look for talks on GitOps, Argo, Flux, and Crossplane especially. Check out the BACK stack talk tomorrow. So thank you so much, everyone. If you have any feedback on this talk, please use this QR code and give me some feedback. Thank you, happy coding.