Cool, so forgive me if things are a bit rusty initially. This is my first time doing a presentation at a conference with an audience this big, so I'm going to be pretty excited. So like Priya said, we were early adopters of Kubernetes, and we've migrated from AWS to GCP over time. We never ran in AWS and GCP simultaneously; we did a fairly quick cutover. So I wouldn't characterize us as multi-cloud today, but we are definitely multi-cluster, and we're moving between cluster technologies. In GCP we started with our own tool called Terraflop, which I would love to open source one day; it builds clusters that aren't GKE clusters. We're moving away from that to using GKE directly, so we do have two flavors of Kubernetes clusters that are in some ways as different as, say, AKS and GKE might be. So yeah, kuberelease: this is the tool that we built, and we'll talk about it. Brief background: I work for a company called Planet. We have the largest Earth-imaging fleet of satellites in the world, about 300 of them up in space right now. They're about yea big for the most part; they're CubeSats. It started off as a research project with some folks at NASA and eventually grew into the company we have today. It's kind of cool: they form a ring and rotate around the poles, and as the Earth rotates beneath them each day, they scan the planet like a line scanner, and then we stitch all the imagery together. Every day we image the surface at a resolution of three meters per pixel, if that means something to folks; the smallest object we could resolve would be about three meters across, so we can't really see people on the beach or anything like that. And like I said, we've been on Kubernetes since 2017. We started in AWS.
We brought up our clusters using a combination of CloudFormation and some Python tooling. We've been strong proponents of infrastructure as code and configuration as code for as long as we've been doing this, definitely trying to avoid situations where we're going into GUIs and pressing buttons to create clusters, and things drift out of sync over time. I am a staff engineer on the Kubernetes team. When we were in AWS, I worked on the tooling that deployed our Kubernetes clusters there; I worked with Nic Cope, now at Upbound, on the tooling that did it in GCP; and now I'm working on migrating us over to GKE, because as cool as Terraflop is, we don't want the maintenance burden of maintaining our own Kubernetes deployment technology. Cool. So what were we looking for when we built this deployment tool? Kubernetes really helps with a lot of the challenges that we saw. Like I said, we're really strong proponents of configuration and infrastructure as code, so we basically want all of our microservices described in revision control: how they're deployed, their Kubernetes manifests. We don't want folks doing kubectl edits directly on resources, although we allow it as an escape hatch in case of an emergency, for example. We also do secret management in revision control, so the secrets are definitely encrypted. And this is nice when you have this combination of your configuration, your infrastructure, and your secret management in code: if you need to go stand up that application, you're not chasing down some external secret store; it's all self-contained in an individual project's repository. And that leads to this other thing: different projects have different repositories where development takes place. We don't have a monorepo at Planet.
We messed around with Bazel for a while and sort of gave up on it. But from my perspective as one of the engineers running our Kubernetes clusters, I don't want to chase down people's deployment artifacts across the 50 or so repositories, probably more, that we currently have. It's really nice for me to have a single view into all the services we've deployed, and we do that with what is effectively a configuration monorepo. kuberelease orchestrates that for us: when you issue a deployment, we're actually creating a commit to this repo on your behalf, or on your pipeline's behalf. And that lets you do really cool things. Sometimes you're just looking for inspiration: how is this service set up, what are its readiness probes? You can just ripgrep through that repository for "readinessProbe" and see how others have done it. That really helps with information sharing. With deploying things on Kubernetes, there's this headache of having to swap in all this context about the Kubernetes resource API, which can be daunting at first: Deployments, Services, Ingresses. It's really nice to have something to get you started there. And then for us, it's just really nice to audit. Disaster recovery: by having everything in revision control, and forcing that every time you make a change to the cluster, to your deployment manifests, it goes into revision control, if we ever need to spin up a new cluster, because maybe the etcd instances backing our Kubernetes cluster just got totally fried, all three of them, all five of them, we can bring up a new cluster and quickly deploy the applications that need to be running in it. And then provenance, which I touched on: by having all of our operations go through Git, you get all the niceties that come with Git.
I can do a git diff between two commits and see, oh, they changed their labels on this day, and maybe one of those labels took that service out of its service rotation. That can be really useful. git log just works, too. Simplified multi-cluster deployment story: this is sort of our touch point with the multi-cloud theme as well. The way you specify which clusters you're going to deploy to with our tool is really just an array of cluster IDs. We handle all of the authentication and authorization behind the scenes for getting resources pushed to those clusters; I'll touch on this later. It really is just an array; we're not doing anything too clever around orchestration. If you remove one cluster from that array, we don't smartly delete anything or drain traffic. That is still a high-ceremony event that I would probably be involved in helping teams orchestrate. So probably many of you are wondering: why did we go and build this? There are things out there like Spinnaker, there's Helm, and Helm 3 recently came out. Well, one answer is that we built this about two years ago, when a lot of that tooling was less mature, and that's probably the major one. Hearing some of the talks today, and with Helm 3 coming out about a week ago, I think the landscape has changed quite a bit, but we can get into that. So yeah, some of the alternatives. The top link, and I don't know if my slides will be shared, describes what effectively became a tool called Kustomize, and that doc is a link to the RFC for that tool. It's actually an interesting bit of Kubernetes process, how an RFC like that goes down. It's just really well documented, and I think there are some really great ideas in there.
We might not have agreed with all of them, but I think it's interesting just to read it and see that these are common pains. Briefly, one of those common pains: once you start having your Kubernetes resources described with YAML files, you quickly have this temptation, or need rather, to customize them for specific environments. You don't want a fork of your YAML files for your development machine, for your staging environment, for your production environment. Oftentimes most of the details are quite similar: you're using different CPU requests in one environment, you have a slightly different Ingress resource in another. Copy-pasting those things would be a nightmare. So Kustomize provides a templating solution for this, as does Helm, and we made our own choice: we use Jsonnet in kuberelease. The second bullet there: Weaveworks, who are kind of thought leaders in the GitOps space, have a product called Flux CD. That one has actually evolved quite a bit over time, and I think it would be worth checking out again. Honestly, I can't remember exactly why we didn't choose it two years ago. Helm was definitely on our radar, but we were uncomfortable with the notion of Tiller as this component that sort of breaks RBAC and has funky limitations: it manages its state in ConfigMaps in its own namespace, but if you have two deployments with the same name, they conflict with one another. Funky things like that made us uncomfortable with using it. Plus, the templating language is basically Go text templates splatted out into a YAML file, so they're not contextually aware at all. You're just outputting strings into a document that happens to be structured as a YAML file.
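To make that per-environment overlay idea concrete, here's a minimal sketch in Python of merging shared base values with an environment-specific override, so only the deltas (a CPU request here, a replica count there) live in each environment's file. This illustrates the general technique, not how Kustomize, Helm, or kuberelease actually implement it; the names and values are hypothetical.

```python
import copy

def deep_merge(base, overlay):
    """Recursively overlay environment-specific values onto base values.

    Dicts are merged key by key; any other value in the overlay
    (including lists) replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Shared defaults, with only the deltas kept in the per-environment file.
base = {"replicas": 2, "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}}}
production = {"replicas": 6, "resources": {"requests": {"cpu": "500m"}}}

rendered = deep_merge(base, production)
# rendered["resources"]["requests"] == {"cpu": "500m", "memory": "128Mi"}
```

Rendering a given environment is then just a merge of the base values with that environment's overrides before feeding the result to the template.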
So you'll see funky things like having to indent your output so that it matches what the YAML indentation level would have been. If folks have used Helm, you'll be familiar with this; otherwise that probably sounds totally insane, and it is as funky as it sounds. Spinnaker: we actually put in a pretty serious effort to look into Spinnaker. At the time we were looking at it, it was clear they had transitioned from a product designed around things like managed instance groups, MIGs (I'm confusing my terms now; that's the GCP term, autoscaling groups in AWS), to overlaying their model on top of Kubernetes deployments. So it felt like there was a slight impedance mismatch there: they were just getting over their Kubernetes implementation, and were also in transition from kind of a V1 to a V2 model. That gave us some concerns. We also didn't want the complexity embedded in what would then become Spinnaker pipelines. One of the things that gave me pause was that I wanted those to also be describable in a configuration-as-code methodology, and it didn't quite seem to have that at the time. So I've talked about what we were looking for, and now I'll describe what it is we built and how we pieced it together. kuberelease is effectively us taking various pieces that already exist and composing them into an opinionated toolkit for doing our deployments at Planet. I mentioned secret management and encrypting secrets in revision control: we do that with a tool called SOPS. SOPS is pretty great. You open a YAML file with sops, it transparently decrypts it for you, and when you write-quit out of vim it re-encrypts it. You can use different KMS systems; we use Google KMS, but you could also use GPG keys, for example.
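Driving SOPS from tooling is just a matter of shelling out: `sops --decrypt` prints the plaintext to stdout. A hedged sketch, assuming the `sops` binary is on PATH and the caller has access to the KMS key the file was encrypted with:

```python
import subprocess

def sops_decrypt_cmd(path):
    # `sops --decrypt` prints the plaintext to stdout; editing the file
    # directly with `sops <file>` re-encrypts transparently on save.
    return ["sops", "--decrypt", path]

def sops_decrypt(path):
    """Shell out to sops and return the decrypted file contents as text."""
    result = subprocess.run(
        sops_decrypt_cmd(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

The deploy tooling can then parse that plaintext YAML and fold the secrets into the rendered manifests without them ever touching disk unencrypted.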
The experience is pretty slick, and it has integration points with various key management services, which is super handy. So yeah, Google KMS. We're opinionated about how we structure this; again, kuberelease encodes our organizational practices into its defaults. We have a single project with a key ring, and on that key ring are per-application keys that we generate when you register your service with us. One of the reasons for having a single KMS project is ease of management for our group. We need to be able to, for example, give our deployers the ability to decrypt any of the application keys they need to. That's a fairly powerful permission to have, so we want that project locked down such that only a select few can do that, which led to having that single project for the KMS keys. GitLab jobs: when you actually conduct a deployment, the tool runs inside GitLab CI as a job, which is pretty great. We didn't want to build our own UI around this, and we didn't want folks effectively running kubectl apply from their laptops, because you don't get visibility into when folks did that, among other things. We want some record that this deploy ran at this time, here's the output from that deploy job, and you get that with GitLab jobs for free. It also contains a Python click-based command-line interface. Click is cool. In pretty much every language I write code in, I want something like click, whether it's Cobra or Kingpin in Go, or click in Python; it makes it really easy to build out command-line interfaces. And there's a Python Flask-based API server. So when you issue a deployment, we basically hoist the contents of your deploy folder into the GitOps repo, and we do that by talking to the API server, which then uses the GitLab API to do repository manipulation in our opinionated manner. And Python cookiecutter provides some application scaffolding. I mentioned how it's nice to be able to go into our monorepo and examine other people's deploy manifests just to get inspiration. One of the other things that helps people get bootstrapped and up and running is tooling to create a simple HTTP service. I think that's one of the main draws for tools like Helm: people just want to be able to helm install things. The cookiecutter scaffold that we provide is pretty simple, but it actually gives you a lot of mileage, and I think it does give you this warm, fuzzy feeling when you bootstrap a service using this tool. Cool, and we'll go through some of these moving parts. Okay, so one of the things we ask our developers to do when they create their services is to write this deploy conf. I was told there's a laser here. Sweet. Okay, so this is the name of your service; this is the strategy, the templating language you're going to use. We typically advocate for Jsonnet; we offer Jinja templating as well, because some folks just find that easier. If folks had already deployed their services before we created kuberelease, a lot of them were using YAML, so it was easier for them to use the Jinja strategy with their YAML files. This little bit right here is cool. This maintainers array lets you specify the folks that should have access to deploy the service and also to decrypt the secrets files contained within your deployment manifests. At registration time we enforce that you're actually a member of the group you specify here.
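As a rough sketch of what a click-based deploy command might look like: the command name, options, and behavior here are hypothetical illustrations, not kuberelease's actual interface.

```python
import click

@click.command()
@click.option("--cluster", "clusters", multiple=True, required=True,
              help="Cluster ID to deploy to; repeat the flag for multi-cluster.")
@click.option("--tier", default="staging",
              help="Deployment tier (Planet's word for environment).")
@click.argument("service")
def deploy(service, clusters, tier):
    """Render templates and push the change set toward the GitOps repo."""
    # Real logic would render Jsonnet/Jinja, decrypt secrets, and talk
    # to the API server; here we just report what would happen.
    click.echo(f"deploying {service} to {', '.join(clusters)} ({tier})")
```

Note how click's `multiple=True` maps naturally onto the array of cluster IDs described earlier: repeating `--cluster` targets another cluster with no extra plumbing.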
And then anybody that's a member: typically this is a Google group, since we use Google Groups for our various teams, and if you're a member of that team, then you have access to that service. And again, we'll create a KMS key on the key ring for you during registration. And just because of where this file happens to sit: this is totally unrelated to deployment, but by virtue of having every service specify this file for us, we did a few drive-by additions of metadata. Like, oh, wouldn't it be useful if you also specified an OpenAPI schema? Maybe we could surface that in a bit of a service catalog. So yeah, we ended up doing that, and you'll see a quick screenshot of it in a moment. I said it was not shown, but I actually modified the screenshot, so we do have this array of clusters that we deploy to. That is effectively our quick route to multi-cluster deployments. Cool. Okay, so GitLab merge requests play a major part in this. When you issue a deployment, we create a merge request with your change set. That merge request then also has a GitLab job that runs against it. If the deployment succeeds, we merge the merge request into the master branch; otherwise it remains in an open state, and it's up to the deployer to go investigate why the service did not deploy successfully. We're using the GitLab API to do this, so shout out to the GitLab folks; their API is pretty slick. And yeah, we label the merge requests with a few bits of metadata: the cluster it was deployed to, the org, the service, and what we call the tier. Terms sometimes get overloaded at Planet, and "environment" felt overloaded, so we decided to come up with our own nomenclature here, but you can just think of the tier as an environment: staging, for example.
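Opening a merge request through the GitLab API is a single authenticated POST to the `/projects/:id/merge_requests` endpoint. A minimal standard-library sketch; the instance URL, project ID, and labels are made up for illustration:

```python
import json
import urllib.request

GITLAB_URL = "https://gitlab.example.com"  # hypothetical instance

def mr_request(project_id, source_branch, title, labels, token):
    """Build the POST request that opens a merge request via the GitLab API.

    Labels like cluster/org/tier make deployments filterable in the MR list.
    """
    payload = {
        "source_branch": source_branch,
        "target_branch": "master",
        "title": title,
        "labels": ",".join(labels),  # GitLab takes labels comma-separated
    }
    return urllib.request.Request(
        f"{GITLAB_URL}/api/v4/projects/{project_id}/merge_requests",
        data=json.dumps(payload).encode(),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(mr_request(...)) would actually submit it.
```

In practice you'd likely reach for the python-gitlab client instead of raw urllib, but the shape of the call is the same.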
But you can do cool things, like filtering this UI by just clicking on these labels, and you can see all the deployments that went out for a particular service and quickly browse that. So here's an example of our Jsonnet templates. What's of note here? We have an internal library of helper macros that provide sugary wrappers around Kubernetes resources, exposed through these helper libraries. We're actually using something called jsonnet-bundler to vendor this library, and then we manage it using semantic versioning. jsonnet-bundler doesn't have a ton of activity on it, but it actually does what we need it to do, so that works okay. I do kind of wish the vendoring story were a bit better, but it's sufficient. You can see here how you access your values and your secrets within the template. Typically what you'd do with the secrets, for example, is shunt them into a Kubernetes Secret resource, and then you'd write your secretKeyRefs and what have you. The KR metadata provides metadata about the deployment, so you can introspect which cluster you're deploying to. Maybe you've got some kind of cluster-specific workaround you have to apply; I don't think we have any of those at this time, but you can get useful information out of there. And then there's the ability to check if a key exists, string substitution and interpolation, all that kind of fun stuff. I haven't shown for loops here, but if folks aren't familiar with Jsonnet and are looking for an alternative to, say, Jinja and YAML, I've felt pretty good about using it. Here's an example of one of those merge requests and the kind of diff you might see during a deployment, and here's an example of a GitLab job log for a deploy job.
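In Jsonnet, those helper macros are functions that return objects. As a Python stand-in for the flavor of the idea, here's a sugary wrapper that expands a few arguments into a full Deployment manifest; the helper name and signature are invented for illustration, not Planet's actual library.

```python
def deployment(name, image, replicas=1, env=None, port=8080):
    """Sugary wrapper emitting a plain Kubernetes Deployment manifest dict,
    in the spirit of the Jsonnet helper macros described above."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        "env": [{"name": k, "value": v} for k, v in (env or {}).items()],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }
```

The template author writes one call instead of thirty lines of boilerplate, and the vendored library can evolve its defaults behind a semantic version.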
This output isn't super verbose, but you can see here that what this is doing behind the scenes is typically issuing a kubectl apply, and then a rollout wait to see if that deployment achieves stability. There's a configurable timeout, so you can tell it to wait up to five minutes, in this case, and if it times out, then something has potentially gone sideways with the deployment. And we're using trigger variables here to get parameterized GitLab jobs. I found that wasn't super ergonomic to figure out, but it eventually worked, if folks have played with that functionality in GitLab. And here's the UI that we quickly built as a hack-week, or hack-day, project for taking that deploy conf metadata and surfacing it. I'm looking at a single service here, but you can see the list of maintainers. If I click on this, it takes us to the Google Groups page for that group. That's helpful: if we onboard a developer to a new team and they don't have access to their secrets, are they a member of their respective Google group, and which Google group is that? This makes it quick to navigate. If you click here, for example, it takes you to the source code for the repo. And depending on the metadata that you specify, you get different buttons here for dashboards, logs, runbooks. The actual repo itself has this .gitlab-ci.yml file, and this basically defines the trigger job. There's a command in the CLI that will bootstrap any repository and turn it into a kuberelease-compatible GitOps repo, and one of the things it does is splat out this file for you.
Yeah, and what's not shown here is that the bootstrap script will also configure the repository with the environment variables needed for the runners that build off of it to have the necessary permissions to deploy to the clusters we allow folks to deploy to. This repo is typically locked down: only a few folks can view the repository and/or have admin access, because there are some powerful secrets in there. This screenshot could have been a bit better, but this is the list of services. The interesting things are when you click inside: you can see the Jsonnet file, and you can also see the environment-specific folders, which we're opinionated about, that contain your per-environment values and secrets files that end up being overlaid for you. But yeah, this is an example of the folder structure, and just having this is nice. This is kind of our single pane of glass. Cool, so: wish list. What do we wish this tool could do for us? DNS management. I said we deploy to multiple clusters; we don't currently route traffic simultaneously to deployments that exist in multiple clusters. So one of the last-mile steps of, say, migrating from one cluster to another is updating your DNS records to point to the new cluster's ingress. kuberelease could do that; it currently doesn't. So there's this out-of-band step during a migration that you have to do manually via the DNS tooling, which in our case would be GCP's Cloud DNS. Better logic for handling cluster addition and removal: again, if you remove a cluster ID from that array of clusters, we just stop deploying to it. We don't actively go and reconcile it and say, oh, you removed it, that means you want to delete the service. That's a high-ceremony event. And delegating less state to GitLab.
So a lot of the information about the services exists in these deploy confs, and the merge requests are used as kind of a locking mechanism: if you already have an open merge request for your service, we won't let you deploy to it with concurrent deployments. Now, that's not really a proper semaphore; it just happens to work because we're not issuing thousands of these deployments per second or anything like that. So it happens to work, but if we had Redis, for example, we could create a proper lock there. It would also facilitate more navigation of the API resources that we have. Right now, when we read those deploy confs for that service catalog view, it's just loading it all from the repository, reading the files, and eventually it caches it, but that could be a bit quicker. So, convergence. Convergence is actually a pretty important topic in this whole thing. When you've got people deploying with this GitOps mindset, but they also still have the ability to kubectl edit, you want some way to guarantee that what's in version control is the true state of what's in the cluster. There are tools like kubediff that can help figure out what's actually specified in your rendered manifests versus what's in the cluster, but we don't currently use those. Generally this hasn't been a problem; folks are good about only deploying with the tool that we've given them. But maybe you're firefighting one day and you decide to increase the CPU requests, and then you don't commit that back into version control. That would potentially get clobbered the next time you deployed.
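A kubediff-style convergence check boils down to walking the rendered manifest and comparing each field against the live object, ignoring fields the cluster itself added (status, defaulted values). A simplified sketch, assuming both sides have already been parsed into dicts; real tools handle lists and defaulting far more carefully:

```python
def drift(rendered, live):
    """Return dotted paths where the rendered manifest disagrees with the
    live object. Only keys present in the rendered manifest are checked,
    so server-added fields like status are ignored."""
    diffs = []

    def walk(want, have, path):
        if isinstance(want, dict) and isinstance(have, dict):
            for key, value in want.items():
                walk(value, have.get(key), f"{path}.{key}" if path else key)
        elif want != have:
            diffs.append(path)

    walk(rendered, live, "")
    return diffs

rendered = {
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [{"name": "web", "image": "web:v2"}]}},
    }
}
live = {
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [{"name": "web", "image": "web:v1"}]}},
    },
    "status": {"readyReplicas": 3},
}
# drift(rendered, live) -> ["spec.template.spec.containers"]
```

A firefighting kubectl edit that bumped a CPU request or swapped an image would surface here as a path under spec.template, flagging it before the next deploy clobbers it.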
One of the things we could do, for example, is take away folks' permission to do kubectl edits, but we felt that would be a draconian choice at Planet, given the size of our organization and how our teams function. And canary deployments: it would be great if there were a better story about canary deployments, for example, bleeding traffic over slowly to your next revision. We don't automate that in any way. I think maybe the right way to do this would be more of a custom resource definition route, where you'd just specify what your canary deployment looks like, and that would be pretty static. I don't want to bring a bunch of state into KR and build complex pipelines around orchestrating those if that can be avoided. So, would we still build kuberelease? My thinking on this has really evolved, even over the past week. I think it's been really quite successful at all the things we wanted it to do for us. We've had over 10,000 deployments, and it's got 100% adoption in our product engineering group, so everything that's deployed is deployed using this tool. We're only deploying to eight Kubernetes clusters. I heard 20,000 earlier; I thought eight was going to be a nice number, but then somebody said 20K. It's still a good size, enough to force you to face this problem of multi-cluster deployments. And we're always able to answer questions like: when was the service last deployed? What changed between deployments? So those nice and easy wins of GitOps, we achieved. Cool. So I think it would be worth taking another look, personally, at Helm 3 these days; Flux CD would be an interesting alternative. I think fundamentally, KR is still encoding some organizational practices, like how we do secret management and how we manage KMS keys, that are specific to Planet.
And it's nice that we have a tool, instead of no tool, around that. So, yeah. That was amazing. And you said this was your first big talk? You were totally a pro. Right, audience? Didn't he just nail it? So, I know we're almost at time for the panel, but maybe we could take a couple of questions. Yeah, for sure. Welcome questions. All right, I'll start with you, and then I'll find whoever else. Have you tried Argo CD? Are you familiar with this tool? Argo is sort of like a DAG, Go-based pipeline-building thing... oh, continuous delivery. You know, I think that might have been on our list of tools to check out a couple of years ago, but no, I didn't personally investigate that one. Yeah, it has quite a lot of the features that you've built, and it's used by folks at Intuit. Yeah, definitely worth checking out, I think. Second question, just a quick one: monorepo versus multiple repos. Why monorepo? So we ended up with this hybrid approach where teams can still do development within their own repositories. When I commit my deploy artifacts, I do it within my own repository, not in our monorepo GitOps repo. From my point of view, it's easier for me, as the person that manages the Kubernetes clusters, to have that single pane of glass. As a service owner, I don't want to have to go into this other repository that I don't really manage and, at the end of the day, am not super opinionated about; I want my deploy artifacts to live in proximity to my source code. As a cluster administrator, I want that monorepo so it's easier for me to, say, deploy everything that used to be on one cluster to a completely different cluster. So we have a hybrid approach there, where we help shunt your deploy artifacts into the GitOps repo, but you still do your development within your local service repo.
If that makes sense. Yeah. Any more questions? All right then. Cool. Well, my favorite line from your talk was, by the way, "environment" is a loaded term. All right, thank you so much, Jacob.