In a previous talk, we heard about how Auto DevOps makes it really simple to get up and running with minimal configuration, and it absolutely does. But your business, your team, and your company's DevOps adoption are going to grow and change, and it's important to be sure that your DevOps platform keeps pace with those changes. Now, growing and changing doesn't mean you have to throw out everything that's come before, because GitLab is a true DevOps platform. It's not a collection of tools that you've stitched together to meet a very specific, narrowly constrained use case, so it will absolutely grow with you. But there are some best practices for evolving from Auto DevOps to something that will keep up as your use cases grow. To learn more about that, it is my absolute pleasure to introduce Marshall Cottrell to talk about his work for NASA and his experience evolving beyond Auto DevOps to handle some increasingly complex Kubernetes deployments. Let's hear what he's learned.

Hi, guys. Welcome to GitLab Commit Virtual. Thanks for joining this session. I'm going to be talking about customizing Auto DevOps, and how we built a more extensible auto-deploy stage using Kustomize and kpt. My name is Marshall Cottrell. I'm the lead engineer of the platform team at MRI Technologies. MRI Technologies is a woman-owned small business contractor; we do almost all of our work for NASA specifically. We're a 100% remote team, rapidly growing, and we've always been evangelists of open source, repeatable deployments, and cloud-native development practices, and we really take pride in our ability to modernize software development within the agency.

Before we dive into the technicals, I wanted to give a bit of background on what we've been working on over the past couple of years. We've been primarily focused on developing a cloud-native software platform that we call app.dat. To give a bit of background on why we chose to build on top of Auto DevOps: we were very limited in funding initially. We started off with a 30K grant proposal. With GitLab and Auto DevOps, we were able to really hit the ground running, demonstrate a lot of value, and highlight a lot of the features of GitLab without having to do a bunch of custom development. Obviously, this talk is about how we've started to outgrow some of what's built into Auto DevOps, and we'll get into that more in just a bit.

What makes our platform successful? One of the core problems we were trying to solve was the fact that NASA really had no way to manage software deliverables. As a result, contractors generally perform all project and software management using their own corporate systems and their own corporate infrastructure. In many ways, that actually benefits contractors, particularly those that are large and well established within the agency, because they're able to silo all of the domain knowledge about how these systems are built: the requirements gathering, the issues, and the ticketing all happen in their own systems. So when a contract isn't going well, maybe it's over budget or taking too long to implement, NASA doesn't have a lot of options for transitioning that project to another contractor, or even bringing it in-house on the civil-servant side, because NASA has never mandated that any deliverables beyond the project being deployed to a production environment be part of the contract. That's really what we were trying to solve.
We wanted a place where software could be continuously integrated and delivered on one platform, so that NASA could ensure software contracts stayed competitive, right? Everything's being done out in the open, and if NASA decides they want somebody else working on the project, they can transfer all of that knowledge about how the project has been developed to another contractor.

So like I said, this has been a collaborative effort. There's no authoritative corporate ownership of the platform or operations: NASA owns the platform and NASA funds the platform. It's built entirely on open source software, so there's no vendor lock-in and no hidden cost to scaling, and it's really easy to contribute to and extend individual products. We've exclusively focused on the cloud-native aspect, which allows us to avoid a lot of the complexity of doing anything on-premise or dealing with the NASA networking infrastructure. As a result, we've built it on zero-trust principles, which lets us avoid the VPN. This has been really great during COVID with everybody working from home: our systems aren't bogged down by how slow the NASA network can be during peak business hours, and we're able to deliver a really fast experience because we don't rely on traditional firewalls and the like for security.

Everything runs on Kubernetes, and that includes GitLab itself; we've been running GitLab on Kubernetes since the Helm chart was in alpha. We also run all of our customer workloads on Kubernetes, which allows us to operate a significant number of workloads at scale with a relatively small team. We don't need a lot of domain knowledge about how a bunch of different pieces are deployed independently, because everything is generally deployed the exact same way. One thing you might take for granted working in private industry is that we've vertically integrated pretty much everything: DNS, identity brokering, SAML provisioning, authentication and authorization, all the things a typical software team in private industry would have by default. This is kind of unprecedented within NASA. We've never had a platform where one team can do wildcard DNS provisioning without going through some sort of external ticketing system. Lastly, and arguably one of the most important pieces, we've put a lot of emphasis on changing culture and promoting collaboration, which is absolutely necessary for our customers to have success on the platform. If people aren't bought into the philosophy of using a single tool for integrating all their software development, then it isn't going to work. So that was really big for us too. If you're interested in understanding more about the platform, please check out our 2019 Commit Brooklyn talk, where I dove deep into the business aspect and the platform side.

Okay, so let's get into the technicals a little bit. What is Auto DevOps? I think you'll get a lot of different answers to this question depending on who you ask; some responses will even tend to be philosophical in nature. But at the end of the day, this is what Auto DevOps is: a set of built-in CI templates that are composed together to give you what's referred to as the Auto DevOps pipeline. It's a bunch of opinionated templates that just get you going, and it's limited in the set of use cases it can support.
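If you've never looked at how it's wired up, enabling the whole thing is just a template include. Here's a minimal sketch of a .gitlab-ci.yml that opts in (this is the documented template name, but details vary by GitLab version):

```yaml
# Minimal .gitlab-ci.yml that opts a project into the Auto DevOps pipeline.
# The built-in template composes the individual stage templates
# (build, test, deploy, and so on) into one pipeline.
include:
  - template: Auto-DevOps.gitlab-ci.yml
```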
But it does help you integrate the vast majority of the functionality that GitLab offers right out of the box.

So let's focus on the auto-deploy stage in particular. How does this piece work? This YAML template is copied directly from the GitLab source; it's what's in the template referenced on the previous slide. Let's go through each step and get a feel for what's going on here. The first step just checks that the necessary environment variables are set for the CI job to be able to connect to the cluster in the first place. The second step downloads a third-party external Helm chart if you have one specified; otherwise, it just moves the built-in chart to the appropriate directory so that it can be deployed from in subsequent steps. Ensure namespace just ensures that the namespace exists. If you're using the GitLab cluster integration, this is done on the back end for you, so this step is a no-op; otherwise, it'll create the namespace if it's not already there. Initialize tiller does nothing now; it just echoes that we're on Helm 3 and Tiller is no longer used. Create secret creates a registry secret using kubectl; this is basically the image pull credentials, so that the cluster is able to download the image from the GitLab registry. And then finally, most of the complexity is implemented in the deploy step at the end. In the default implementation, that potentially manages two separate Helm releases: one for the Postgres chart, to spin up an in-cluster database, and another for the application itself.

Okay, so let's talk about our challenges with the auto-deploy chart and what led us down this path of needing more flexibility. The good news is that it works out of the box, right? If you're building a single-container application and all you need is a deployment, a service, and an ingress, then the auto-deploy chart does what you need. And this is great, because you don't have to write any code; you don't even really have to understand much about Kubernetes for this to work for a small project. The other really nice thing is that you get integration with a bunch of GitLab functionality like deploy boards and pod logs. If you're not familiar with what some of this stuff is, the links are embedded in the presentation, so feel free to follow along if you're watching in real time. It also has support for review apps, so those can get spun up and down dynamically. And, like I said, it has support for an in-cluster PostgreSQL database specifically, and also for initialization and migration jobs, which are necessary to support spinning up and tearing down review apps dynamically.

So what's not so good? What challenges do we have with the default auto-deploy implementation? First of all, it's fully imperative, so it's very difficult to infer what deployments are going to look like without just running a full pipeline and looking at the state of the cluster after the fact. It's really difficult to test in isolation and get at the YAML that Helm is generating without doing a lot of work to fake a full-blown auto-deploy pipeline. It also has fairly opaque mechanisms for extensibility: you have to understand a lot of the details of how it works in order to be able to override certain things. This is kind of true of Auto DevOps in general. What environment variables can I set? What functionality is that going to override?
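To give a flavor of the knobs involved, here are a couple of variables from the Auto DevOps docs; the exact names and semantics vary by GitLab version, and the chart reference below is a made-up example:

```yaml
# CI variables that override pieces of Auto DevOps behavior.
variables:
  POSTGRES_ENABLED: "false"            # the string "false" disables the in-cluster database
  AUTO_DEVOPS_CHART: my-org/my-chart   # swap in a custom Helm chart (hypothetical name)
```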
Is a given environment variable supposed to be set to the literal string true, or does setting any non-empty string value trigger the behavior? That sort of stuff. It's also not easy to just swap out the default chart. You don't get integrations like the deploy boards and pod logs for free; you have to understand how those are implemented in the default chart and then implement that kind of stuff yourself in your own chart. So there's a big gap between going from everything built in to providing your own Helm chart: you basically have to understand how all the pieces work to be able to provide your own custom chart and still get the same behaviors. Along those same lines, there's no incremental upgrade path from these imperative deployments to more declarative, GitOps-style deployments. We'll get into that a little more in a bit. And, like I said, it's geared exclusively towards single-container application deployments. As soon as you want to compose multiple applications together, or do anything more complex than a deployment, an ingress, and a service, you have to jailbreak out of the entire thing.

I could spend a whole talk going over the benefits of declarative resource management. If you're interested, I'd encourage you to read the full post by Brian Grant; I have a link embedded in the slides. But to summarize: declarative resource management in version control is rapidly becoming the industry standard, and as your application grows more and more complex, you'll reap more benefits out of managing your application deployments declaratively. As developers, we spend so much time automating processes in CI pipelines that it can feel funny, maybe a little counterintuitive, to manage Kubernetes YAML resources directly in a Git repo. But at the end of the day, you should really think about your resource manifests the same way you think about application code. You want any changes to your Kubernetes deployments to be deliberate and reviewed through the same sort of merge request approval and review process as application code. For the same reasons you don't want a third-party dependency being upgraded on an arbitrary deployment without reviewing that explicitly, you don't want the Helm chart you're using to change under the hood without you understanding the implications. Another interesting point Brian Grant makes is that despite the fact that Helm exists almost exclusively to target Kubernetes deployments, it doesn't really take advantage of that. It's not intelligent about, or aware of, the resources being templated; Helm is perfectly happy for you to generate invalid YAML, or resources that don't align with the OpenAPI schema for that resource type. It's interesting that we tend to use the same tool for both generating our configuration files and generating our Kubernetes resource YAML. That's powerful, but it's not always the right approach.

Just to give a bit of background, what is imperative versus declarative? To help explain that, I have some examples here. Towards the top are more imperative approaches to deployments, and the bottom leans towards more declarative application management. Like I was saying: Helm, any sort of templating (things like Jsonnet), doing arbitrary setup in CI pipelines, all of that tends to be pretty imperative. Kubernetes itself is completely declarative.
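To make "declarative" concrete, here's the classic minimal example; nothing about it is specific to our platform, it's just a stock manifest with an illustrative image tag:

```yaml
# A declarative statement of desired state: "three replicas of this pod
# should exist." Kubernetes' controllers do whatever is needed to make
# the cluster match this, and to keep it matching.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: ghcr.io/stefanprodan/podinfo:6.0.0  # example tag
```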
You apply your resource YAML, which represents exactly what the cluster state should be, and then Kubernetes does the rest: you request a deployment with three replicas, so it creates that deployment and spins up three pods to do what you told it to do. That's declarative application management. Our experience has taught us, though, that despite declarative application management being the preferred way to go in the long run, it's significantly easier to get started with an imperative, CI-pipeline-style deployment. We wanted to provide our users with a mechanism that supported both use cases and provided relatively seamless upgrade paths towards declarative application management. We didn't want a tool that just did one or the other; we wanted something that would shuttle you along through the evolution of your project.

So what are the options for tooling if we want to take a more declarative approach? There are two that we're going to talk about today, and these are the backbone tooling in our auto-deploy implementation. First, we have Kustomize, which allows us to generate and apply patches to Kubernetes resource manifests in a highly structured and programmatic way. It's not possible to generate invalid resources using Kustomize, which is very different from Helm; there's no templating involved. You can kind of think of Kustomize as a scalpel and Helm as a sledgehammer. You're making highly structured edits to Kubernetes resources, and Kustomize is aware of what those resources are and what is and isn't valid. Another really important thing about kustomizations is that they're easily composable. This is a huge advantage over Helm, where the only way to compose multiple Helm chart deployments together is to write another Helm chart that wraps the other two, and even then there's no way to share information between the two charts. Let's say you have two applications you want to spin up, and one of them needs to talk to a service being spun up by the other deployment. How do you get the DNS name of the service from package A into package B, when Helm might be adding the release name, prefixes, things like that? There's really no easy way to propagate information from one Helm chart to another. As we'll see, that's very easy to do in Kustomize.

Then we have kpt, which is a Git-native package manager for Kubernetes resources. kpt is really awesome because it basically allows us to have shared packages of Kubernetes resources and distribute them using nothing but Git. We don't need any Helm chart repository infrastructure; we don't need to host things or publish things to S3 buckets or GCS buckets. We can just use Git, which we're already using anyway, so it's great to not have any extra infrastructure. It also has next-generation live commands for managing cluster resources: similar to kubectl apply, we have kpt live apply, and we'll get into the details of the differences. kpt also allows us to make in-place edits to resources; again, we'll get into more of this in the demo.
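Before we move on, just to make the Kustomize composability point concrete, here's roughly what composing two unrelated packages looks like in a kustomization; the directory names and the ingress file are illustrative:

```yaml
# deploy/kustomization.yaml: composes two independent packages plus one of
# our own resources into a single deployment. Because Kustomize parses every
# resource, it understands cross-references like a Service name used by an
# Ingress, and can rewrite them consistently.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ingress-nginx/   # local copy of one package (has its own kustomization.yaml)
  - podinfo/         # local copy of another
  - ingress.yaml     # our own Ingress that references podinfo's Service
```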
One of kpt's most powerful features is its three-way merge strategy. You can pull a package from a remote Git repository, make in-place edits to that package, and then later, in day-two operations, update from the upstream remote and rebase the changes you made on top of the upstream changes. By making these in-place edits, you're not hard-forking from the upstream. It'll make more sense when we get into the demo.

So what were our goals in developing our own auto-deploy stage? We wanted an out-of-the-box experience similar to what the GitLab auto-deploy image provides. Sometimes our customers don't really want to understand how Kubernetes deployments work or anything like that; they have very simple applications, and we didn't want them to have to manage Kubernetes resources themselves if they didn't want to. So providing a similar out-of-the-box experience was goal number one. But at the same time, we wanted seamless upgrade paths between the built-in packages we might provide and fully bespoke configurations for the customers that need them. We're also looking forward to using the GitLab Kubernetes Agent, so we wanted to roll out something that would work against the legacy cluster integration but have a seamless upgrade path as we migrate customers over to the agent. And once we start adopting the in-cluster agent, we want to be able to slowly migrate people away from CI deployments to pure GitOps, without significant barriers to entry along any of those upgrade paths. We want the way you manage things in your Git repo to stay relatively the same across all these different use cases, so there's not a lot of mental overhead at each step. We also wanted to reduce the complexity and improve the extensibility of the auto-deploy stage, and we did this by making things like in-cluster databases an application concern. If you need a Postgres database, that's something you can deploy in your own application; we don't need all these opinionated behaviors in auto-deploy, and we can just focus on the core functionality. Like I was hinting at before, we didn't want to maintain any Helm charts or Helm chart repository infrastructure; we wanted something that would work with just Git repositories. Another thing I've already hinted at: we wanted stuff like deploy boards and pod logs to just work, no matter whether you're providing your own custom resources to deploy to the cluster or using a built-in package that we provide. An end user shouldn't have to know what annotations need to be applied to their resources for that stuff to work. We wanted a mechanism by which we could apply those annotations to whatever resources they provide, without every application, every project, having to do that and know about it.

Okay. So how does the app.dat implementation of the auto-deploy stage work? This should look very similar to the original auto-deploy stage template we looked at; it's virtually the same. It's mostly just the underlying implementation that's been swapped out, which allows us to do a lot more cool stuff. I'm not going to go over it in too much detail; the gist is that we're using kpt and Kustomize under the hood where we were using Helm releases before.

So let's dig into some code. To follow along, you'll need Kustomize and kpt as dependencies, and then you can clone this repo to follow along through the code examples.
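Setup is minimal; something like this works, though install methods vary and kpt's CLI changed a lot between releases, so match the version the repo expects. The package reference and versions below are illustrative of the older kpt syntax used here, not commands to copy blindly:

```bash
# Install the two CLI dependencies (one option of several).
brew install kustomize kpt

# The day-two flow we'll walk through later, condensed.
# Repo path and versions are illustrative.
kpt pkg get https://github.com/stefanprodan/podinfo.git/kustomize@v5 deploy/podinfo
# ...make in-place edits, commit...
kpt pkg update deploy/podinfo@v6 --strategy resource-merge
```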
We'll start by looking at Kustomize and kpt, and then we'll see how we incorporate them into the auto-deploy stage. Okay. To get things going quickly, I have this CI YAML template. We have the auto-deploy stage implementation that we've already looked at. We're also going to be running K3s in the CI pipeline itself rather than deploying to a remote cluster; everything would work exactly the same if you're using the GitLab cluster integration, the new GitLab Kubernetes Agent, or bringing your own kubeconfig. Really, all you need is a kubeconfig for any of this to work. So let's take a look at what's going on here. Similar to how the default auto-deploy implementation lets you provide a custom Helm chart, in this case we're just going to provide a reference to this Git repository. Let's take a look at what that looks like on GitHub real quick. We can see there's this kustomize directory, and then just a bunch of resource YAML. This one happens to have a kustomization.yaml too; that isn't strictly necessary for this example to work, but it's there. That's great.

So let's hop over to our CI pipeline. I've already run this; let's take a look at what was going on in here. We deployed NGINX to a cluster and we deployed podinfo to a cluster. Let me scroll up to the top here. We can see all the steps we've talked about have been applied. We downloaded this remote package from the Git repository. We ran these setters; setters are very powerful, but you'll have to look into those on your own, as they're out of scope for this demo. Finally, deploy. This is where things get interesting. We're using kpt live apply, which is very similar to kubectl apply. The main differences: first, kpt live apply performs status checks, polling the cluster. If you've done a kubectl apply, it returns more or less immediately; it just writes the state to the cluster and doesn't wait for your replicas to be available. Here we can see that before the process actually exited, it waited until all of our pods were healthy and our deployment was actually considered available. The other difference, which is out of scope for this demo, is that it supports pruning. If I remove resources from my Git repository that have already been applied to the cluster, kpt keeps track of the inventory of what was applied previously, diffs that against what I'm trying to apply now, and will actually prune those resources from the cluster, so orphaned stuff isn't just left out there.

So let's look at one last thing here. As you guys may know, the way the pod logs and deploy boards work, the way GitLab is able to surface that information in the UI, is by looking at certain annotations on your deployments. And as you can see, we've taken these resources straight from a GitHub repository that we have no control over; these annotations are not present in the resource YAML there. But we've applied a kustomization on top of the kustomization that came with the package, in this case podinfo, and by doing that, we've been able to apply the annotations that GitLab needs without me, as a user, having to know that that's how it's implemented. So this is really powerful. That's what's powerful about Kustomize: it allows us to compose kustomizations together to apply these last-mile changes on top of what the user specified.
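The last-mile overlay our auto-deploy stage generates looks roughly like this. The annotation keys are the documented ones GitLab matches on for deploy boards and pod logs; the overlay structure and the values here are a sketch of our approach, with the real values coming from CI variables:

```yaml
# Overlay kustomization generated by the auto-deploy stage (sketch).
# commonAnnotations stamps every resource (including pod templates), so
# GitLab's UI integrations work no matter where the manifests came from.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - podinfo/                               # the user-provided package, untouched
commonAnnotations:
  app.gitlab.com/app: my-group-my-project  # normally $CI_PROJECT_PATH_SLUG
  app.gitlab.com/env: production           # normally $CI_ENVIRONMENT_SLUG
```

Okay, so let's take a look at what our day-two operations might look like here.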
So let's say, for example, that we want to make some edits to these resources; we aren't quite happy with what they are by default. We'll create a directory here called deploy (this is arbitrary), and then we'll do a kpt pkg get and write that to deploy/podinfo. Okay, great. So we've essentially just cloned this subdirectory of the Git repository; it has all the resources we were just looking at. Now we'll commit that. And then, like I said, we want to make a small change here. We have two replicas; let's say we want a minimum of three and a maximum of five. We'll commit that too.

But then we realize that the latest version of podinfo is actually v6, and we're on v5 now, and we've already made this in-place edit. Let's try to use kpt to update the package for us. So we're going to run kpt pkg update with the path to the directory and the new version we want, using the resource-merge strategy. Let's see what that does for us. Okay, so as you can see, the in-place edits we made are still there, and in addition, all the other changes that took place between v5 and v6 have also been applied. Essentially, what kpt has done for us here is taken the changes we made, pulled in the upstream changes, and then rebased our manual changes on top of that. It's called a three-way merge, and it's really powerful because it allows you to make trivial edits like that to your resources without having to write complicated patches in Kustomize or anything like that, while still ensuring you're tracking upstream changes: you can still pull in changes from the remote package and have the edits you've made manually persist across updates. So that's great.

The last thing we want to do here, though, is deploy all of these things together. So let's go ahead and do a kpt pkg get and pull in the NGINX ingress controller. Now we can see we've got ingress-nginx and podinfo. Additionally, we're going to go into the deploy directory and run kustomize create, which just gives us a bare-bones kustomization. What we want to do here is add ingress-nginx, and we'll add podinfo as well. And there'd be no point in using the NGINX ingress if we didn't actually have an ingress, so let's use kubectl create to generate an ingress resource for us here. Okay, the service: let's check the service name, podinfo, perfect. And then the last thing we need to do is actually add it to our kustomization.yaml. Okay, looks good. One last thing: we need to uncomment this guy, and we'll comment these out just to make our pipeline a little faster.

Now let's take a look at our pipeline, and this is looking really good. What we've seen here is that Kustomize has allowed us to compose two completely arbitrary deployments. In this case, we wanted to deploy the NGINX ingress controller as well as our application, and then deploy an ingress that uses the ingress controller. We can see that all worked fantastically. One thing I wanted to point out, a cool thing that Kustomize did for us behind the scenes, is that you'll notice we're prefixing all these resources with a name that's unique to our project. This is just an opinionated thing we've done in our auto-deploy implementation to make sure that if you deploy multiple applications to the same namespace, they don't conflict.
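Under the hood, that prefixing is just a namePrefix in the generated kustomization, roughly like this (the prefix value is illustrative):

```yaml
# Sketch of the prefixing our auto-deploy stage adds. Kustomize rewrites both
# the resource names and any references to them, such as the Service name
# inside an Ingress backend.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: myproject-
resources:
  - ingress-nginx/
  - podinfo/
  - ingress.yaml
```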
And in the case we had, where we had our service name here and an ingress that targets that service, Kustomize, unlike Helm, is aware of the resources, and it knows that if I change the name of the service with a prefix, any references to that service name should also be updated appropriately. So behind the scenes, that service name reference would have been updated to include the name prefix automatically for us as well. That kind of stuff is really handy with Kustomize; it makes it a lot easier to manage deployments and compose multiple deployments together, and there's just a lot less you have to worry about.

Okay, so that wraps it up for the demo. The last thing I wanted to go over is the bright future ahead for GitLab's integration with Kubernetes. If you guys are not tracking the development of the GitLab Kubernetes Agent, you should be. It's a really exciting project that GitLab is working on. Finally, we have an in-cluster component, agentk, as well as KAS, the server-side implementation that facilitates all the communication between agentk and GitLab. This allows all sorts of nice behaviors, like a caching layer so that GitLab isn't constantly polling your clusters. It also allows you to integrate clusters that may be behind a firewall or NAT, which is great for on-premise: you no longer have to have your cluster endpoints publicly exposed for a Kubernetes integration to work with GitLab. So take a look at all these issues and proposals; these are things we're particularly excited about, and the future is really bright for Kubernetes integration with GitLab.

And then finally, YAML pipeline definitions have gotten us quite far, and GitLab has always had really great CI/CD tooling built in. But if we think about what the future of Auto DevOps might look like, is that the ideal solution? There are new projects coming out like Tekton, which basically gives you declarative pipeline definitions that are Kubernetes-native. Clearly the industry is headed towards buying in fully to Kubernetes; it's becoming the de facto standard for orchestrating applications and appliances and all sorts of stuff. I think it makes sense to ask where GitLab fits in with all these evolving technologies and what we can learn from projects like Tekton. So again, there are lots of interesting proposals around the future of Auto DevOps and the future of compliance pipelines, and there's one project I wanted to call out: a proof of concept called Labflow. It's basically lessons learned from build tools like Bazel, and whether it might make sense to have a scripting engine built into runners to execute arbitrary code.

That wraps it up. Thank you for attending my virtual presentation. I look forward to the Q&A.