All right, first off, I'm really glad to be back here. It's been so long since we've had in-person presentations and in-person conferences, and it's one of those things I never thought I'd miss until it was taken away. I'm actually going to take off the mask for a bit, because it's a lot easier to talk. I'm happy to be here talking about this subject: managing Kubernetes webhooks. Webhooks have been a thorn in my side, and I'm sure they are for a lot of others who manage them in Kubernetes. They come with a lot of challenges, and we've been able to use Spire to make the webhook process a lot easier.

I'm Faisal Lemon, a software engineer at F5 Networks. Previously I worked at NGINX, which was acquired by F5. NGINX, of course, is the open source web server and load balancer that a lot of you are familiar with and are probably using in your deployments. The picture there is my daughter, Salma, and our cat, Marley. We adopted her about six months ago. She's a very naughty little cat, but a lot of fun. She likes jumping on shoulders, so we think she was a parrot in a previous life.

The agenda for today: I'll talk a little about the challenges of webhooks, just to set the table, then dig into how we use Spire to solve those challenges, and then we'll do some Q&A. So let's dig in.

First, a little refresher on webhooks. What is a webhook? In the world of Kubernetes, a webhook is a callback that runs before Kubernetes objects are created, updated, or deleted. So on any modification of the Kubernetes state, a webhook runs, and that gives you the opportunity to do something. What can you do? That typically falls into two buckets, the two types of webhooks. The first one I have listed here is a validating webhook.
A validating webhook is used, as the name says, to validate configurations and make sure they are OK: there's nothing weird in them, nothing illegal, no state you don't want. A very common use case is with custom resource definitions, or CRDs. When you create a CRD, you have your custom resource, and there are custom rules about what is and isn't acceptable. With a validating webhook, you have the opportunity to reject invalid configs. A good example is in the Kubebuilder book, which walks through creating a CRD for cron jobs; its validating webhook rejects invalid cron configs.

The second type is the mutating webhook. If you use service meshes like Istio or others, they typically deploy a mutating webhook that modifies the pod to inject the sidecar. So that's the mutating webhook, the other common type.

So what makes webhooks so challenging? The first thing is that you have to provide a certificate, a key, and a root CA certificate. You have to provide all three of these to Kubernetes, and they all have to be kept fresh and rotated. It's up to the operator to do that. The certificate and key have to be saved on disk, which is another challenge: even though in-memory certificates are increasingly common, the webhook system still requires the certificates to be on disk, and that adds another step to the process. If anyone knows how to run webhooks without storing anything on disk, I'd love to hear how you accomplish that. The root CA certificate, meanwhile, has to be provided in a Kubernetes object, the validating or mutating webhook configuration, in the caBundle field, and you have to keep that up to date. The sample configuration here is just a validating webhook configuration with some name.
And there you see the caBundle field. That's the base64-encoded root CA certificate, and as the operator of a webhook, it's your responsibility to keep it up to date.

So how do you manage webhooks in Kubernetes today? There are a couple of options. One is a long-lived certificate. It's a very easy way around the problem: you just have it expire far in the future. We're guilty of doing this internally as well. It's not the best way, but it does work. The challenge is that you have to rotate it manually, though if you set the expiration to something like the year 3000, that's probably not a big problem. The biggest problem, of course, is that if those certificates leak, they're a security liability. How do you revoke them? So typically you don't want long-lived certificates.

The standard solution people use for managing Kubernetes webhooks is cert-manager. If you follow the Kubebuilder book, it has a section on how to use cert-manager to manage Kubernetes webhooks. It does the auto-rotation for you, which is great. The problem with cert-manager, I think, is that it has a lot of overlap with Spire, like most certificate management solutions, and deploying both carries a lot of operational overhead. For us, it was out of scope: the only gap we had with Spire was the webhook, and we didn't want to deploy cert-manager purely for the webhook use case. It's a whole new tool to learn, with new logs, and we wanted to avoid going down that path. So the solution we came up with, working with the Spire contributors and maintainers, is to use Spire to manage the certificates, and it's based a lot on how cert-manager does it. So, moving forward, how do we actually use Spire to manage those certificates?
There are three basic steps, and we'll go through them in detail. First, we need to keep the root CA certificate fresh in the caBundle field of the validating or mutating webhook configuration; we need a way of rotating and maintaining that field. Second, we need to create an entry on the Spire server for the actual webhook pod. The server that services the webhook needs a certificate, and to get that certificate, it first needs an entry. Third, on that pod, we need to save the certificate and key to disk. Those are the three logistical steps. Note that this requires Spire 0.12.2 or later; that's the release that includes these changes.

The first part of the solution is keeping the CA bundle fresh. This is actually one of the easier parts, and we do it using the k8sbundle Notifier plugin. If you're not familiar with this plugin, it takes the Spire root CA certificate and pushes it into a ConfigMap, which you can then use to bootstrap your Spire agents. If you're not using this plugin, you should be. What we've done is extend the plugin so that, in addition to pushing the root CA certificate to a ConfigMap, it also pushes it to webhook configurations. The configuration option is `webhook_label`, set to the name of the label you want to use. Spire server then filters webhook configurations on that label, and for any configuration that has it, Spire manages the caBundle field. The first box on the slide shows the Spire configuration, where you just specify the label you want to use. The second box shows the actual webhook configuration, where you put that label in the labels field, in this case `spiffe.io/webhook: "true"`.
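To make the label-based selection concrete, here is a simplified, hypothetical Go model of it. It is not Spire's implementation (the real plugin works against the Kubernetes API); `WebhookConfig` and `injectCABundle` are illustrative names, and the sketch assumes a configuration opts in when the label's value is `"true"`, matching the example on the slide. The same pass runs on startup and again whenever the root CA rotates.

```go
package main

import "fmt"

// WebhookConfig is a pared-down, hypothetical stand-in for a validating or
// mutating webhook configuration: a name, its labels, and its caBundle.
type WebhookConfig struct {
	Name     string
	Labels   map[string]string
	CABundle []byte
}

// injectCABundle writes bundlePEM into every config whose label `label` has
// the value "true" and skips the rest. It returns the names it updated.
func injectCABundle(configs []*WebhookConfig, label string, bundlePEM []byte) []string {
	var updated []string
	for _, c := range configs {
		if c.Labels[label] != "true" {
			continue // not opted in; leave its caBundle alone
		}
		c.CABundle = append([]byte(nil), bundlePEM...) // copy, don't alias the caller's slice
		updated = append(updated, c.Name)
	}
	return updated
}

func main() {
	const label = "spiffe.io/webhook"
	configs := []*WebhookConfig{
		{Name: "my-validating-webhook", Labels: map[string]string{label: "true"}},
		{Name: "someone-elses-webhook", CABundle: []byte("their-own-ca")},
	}

	// On startup: populate every labeled configuration.
	fmt.Println("startup:", injectCABundle(configs, label, []byte("root-ca-v1")))

	// On root CA rotation: run the same pass with the new bundle.
	fmt.Println("rotation:", injectCABundle(configs, label, []byte("root-ca-v2")))

	for _, c := range configs {
		fmt.Printf("%s -> %s\n", c.Name, c.CABundle)
	}
}
```

Keying on an opt-in label means Spire never touches webhook configurations that manage their own CA bundle, such as the second one above.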
With that label in place, when you deploy the validating webhook configuration, you just leave the caBundle field empty. Once it's deployed, a watcher on the Spire server that filters on that label picks it up, and Spire pushes the root CA certificate into the caBundle field of anything carrying that label. It rotates and manages that field for you: whenever the root CA certificate is about to expire, it goes through all the validating and mutating webhooks with that label set and rotates the CA bundle. And if you deploy the validating webhook first, then when Spire server starts up, it gets a list of everything with that label set and populates it. So all you really need to do is deploy your webhook with that label, and Spire server does the rest. That's step one, which manages the root CA field in the webhook configuration.

The next step is creating an entry on the Spire server. There are two ways to do that. You can do it manually with the Spire server CLI, running entry create as you normally would. The only thing you need to pay attention to is the `-dns` flag: you need a DNS name in a specific format, `<service>.<namespace>.svc`, where `<service>` is the name of the Service you deployed along with the webhook and `<namespace>` is the namespace it lives in. The Kubernetes API server expects that DNS name to be in the certificate. Other than that, your entry can use whatever selectors you normally use, whether that's pod name, pod UID, or something else, as long as the pod is associated with that entry. You can also use the Kubernetes Workload Registrar, which automatically adds the required DNS name to the certificate for you. The CRD version of the registrar has that enabled by default, so if you deploy it, the pod that services the webhook will come up and get a certificate with the correct DNS names in it.
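A quick way to sanity-check the DNS name requirement is to build the expected `<service>.<namespace>.svc` name and confirm a certificate covers it. This Go sketch uses only the standard library; `webhookDNSName`, `coversDNSName`, and `testSVID` are hypothetical helpers, and `testSVID` produces a throwaway self-signed certificate shaped like an X.509-SVID (a SPIFFE ID URI SAN plus DNS SANs) rather than a real one issued by Spire. The trust domain `example.org` is also an assumption.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// webhookDNSName returns the <service>.<namespace>.svc name that the API
// server uses when it calls a webhook backed by an in-cluster Service.
func webhookDNSName(service, namespace string) string {
	return service + "." + namespace + ".svc"
}

// coversDNSName reports whether cert's SANs match name.
func coversDNSName(cert *x509.Certificate, name string) bool {
	return cert.VerifyHostname(name) == nil
}

// testSVID makes a throwaway self-signed certificate shaped like an X.509-SVID:
// a SPIFFE ID URI SAN plus DNS SANs, as an entry created with -dns would yield.
func testSVID(spiffeID string, dnsNames ...string) (*x509.Certificate, error) {
	id, err := url.Parse(spiffeID)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		URIs:         []*url.URL{id},
		DNSNames:     dnsNames,
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	name := webhookDNSName("example-webhook", "example-ns")
	cert, err := testSVID("spiffe://example.org/ns/example-ns/sa/example-webhook", name)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, "covered:", coversDNSName(cert, name))
}
```

Applying `coversDNSName` to the certificate a webhook pod actually receives is a cheap way to catch a missing or misspelled DNS name before the API server rejects the TLS handshake.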
So now you've created the entry. The most complex part of the whole solution is saving the certificate and key to disk, so there's a lot of code here. The first thing to keep in mind is that this is based on the watcher example in the go-spiffe v2 directory. What we're doing is creating a watcher that watches for new certificates using the go-spiffe client, and when a new certificate comes in, we save it to disk.

The first thing to pay attention to is the imports. I have an import from my public GitHub that has a couple of routines to save to disk. If you don't want to import yet another library, you can just copy those routines; it's not a lot of code. Moving on, in the first block we create the go-spiffe client as you normally would. In the second block, we start the client in a goroutine. After starting the client, we wait for the certificates to be on disk. We need that barrier because if we try to start the webhook server without the certificates on disk, it throws an error, crashes, and goes into a restart loop. So you have to make sure the certificates are on disk first. There's a little library function in my GitHub that does just that: the wait-for-certificates function waits about three minutes, and if the certificates still aren't there, it times out and returns an error. Otherwise it returns once they're present, and the go-spiffe client keeps running. One more thing to look out for is the X509 watcher field: we pass it the directory we're saving the certificates to, which feeds into the next part of the slide, the watcher struct.
If you're familiar with the go-spiffe client, every time there's a new certificate, the OnX509ContextUpdate method gets called. When it's called, all we do is write the SVID out to disk. That's also encapsulated in a library function: it extracts the default SVID and saves it to disk in the directory you specify. The file names are fixed, tls.crt and tls.key, because every webhook API I've seen in Kubernetes requires those file names; the only thing you can configure is the directory. So you pass in the directory, and we save the files there for the webhook server. Once that's all set up, the whole system keeps refreshing the certificates on disk for you, and you start your webhook server normally, using the controller-runtime APIs or whatever other webhook setup you use.

Looking forward, I think we need a better way to save certificates to disk. The save-to-disk case is a bit complex, and there are different ways to handle it. One option is having go-spiffe save the certificates to disk directly. If it can't do that, then go-spiffe could export utilities like the ones in my GitHub, so users can save to disk themselves. Another option, which I think would be much cleaner and better integrated, is to have Spire save the certificate and key to a Kubernetes Secret object. Your pod volume-mounts that Secret, and the certificates live on disk, because that's how Secrets are mapped into pods. That would work out nicely, and it would parallel what cert-manager does: cert-manager saves the certificate and key, as requested by its Certificate CRD, into a Secret object. So it'd be a nice parallel there.
We also extended the plugin from webhooks to API services. If you're not familiar with API services, they're used to extend the Kubernetes API server with custom API endpoints. We use them in our product for some custom metrics endpoints. An APIService also has a caBundle field, which the Kubernetes API server uses. So we added the same functionality to the plugin with an API service label config option (`api_service_label`): you deploy your APIService with that label, and Spire manages its CA bundle for you as well. API services are nice because they don't require certificates and keys on disk, so you can use go-spiffe's TLS peer functionality without worrying about saving anything to disk. If anyone knows of other Kubernetes objects with a CA bundle that needs to be rotated, let me know or open an issue, and we can look into adding support for those as well.

Where all this comes from is NGINX Service Mesh, which is what we're working on at F5. We embed Spire and deploy it as part of the service mesh, both the Spire server and the agent, and we use Spire to manage all aspects of identity: mTLS, webhook certificates, API service certificates, and secure control plane communication, which goes through NATS using certificates provided by Spire. So we built our service mesh with Spire as a first-class citizen, with all identity and security going through Spire. That's my talk for today. Let me know if you have any questions.

Well, I've got one. You mentioned on the last slide that NGINX Service Mesh uses Spire for all of its identities. Does that mean you don't run cert-manager in your Kubernetes clusters, or do you run a little of both?

Good question. No, we don't run cert-manager. It turned out that the only thing we needed cert-manager for was that webhook certificate.
I'm working on getting this stuff contributed to the Spire code base, so we just use Spire for it.

Amazing. There's a question over there.

Just a simple one: have you done any scale testing of using Spire for the CA rotation or for creating the CA bundles?

So the question is, have we done any scale testing? No. What kind of scale are you looking at?

I'm just wondering, based on the way you described it behaving, where Spire comes up and then generates the CA bundles, how many of those you can do before you hit that three-minute timeout.

The three-minute timeout only applies to Spire delivering the certificate to the workload, and that hasn't been a problem for us. The testing we did was to put a short lifetime on the CA bundle and make sure it keeps rotating. But we haven't done much testing with, say, 5,000 webhooks; we haven't tested at that scale. That's definitely a good suggestion, something we should look into.

A shameless plug on that: on Thursday, I'll be giving a session with another community member on scaling Spire to one million, and they're running Kubernetes. So if you have questions like this, I highly recommend you attend that session. Great question. Any others, online or in person? I'll go ahead and say no. If any questions come in on the platform afterward, we'll drop them in Slack. Thank you again, Faisal, that was awesome.