Hey, welcome to Rotate Roots Right Round: Using cert-manager for Safer Private PKI. You don't need to know what cert-manager is and you don't need to know what PKI is. Hopefully by the end of this you'll know and want to use both of them, but obviously it will help if you already know both.
I'm Ash; here's a picture of me with a lot less hair. I'm a senior software engineer at Jetstack by Venafi. I've provided a load of ways you can reach me here. Feel free to reach out, I'm happy to talk. I'm always interested in talking about certificates, to the point where I did a live stream recently in which I was described as a professional certificate nerd, and I'm going to wear that with pride and take it as my new title going forward. I'm also a cert-manager maintainer, which is helpful because I'll be talking about cert-manager a lot in this talk. Without further ado, let's jump into it.
What is cert-manager? We need to make sure that everyone's on the same page before we start. cert-manager is the easiest way to automatically manage certificates in Kubernetes clusters. We like to say this because it's a guiding mantra for us; we aim for it in everything we do. You can actually see this quote at the beginning of every cert-manager release that we create. Technically that's because, when we share the release on social media, this is the line we want people to see. That's just a little insider detail, but this really does matter to us and it's what we try to do.
cert-manager is a CNCF incubating project. That's part of why I'm here today at KubeCon. But it's also kind of unique in its role. By that I mean cert-manager is, as far as I can see, the only tool in the CNCF that does what it does: it helps you manage certificates on Kubernetes.
We've also recently reached the milestone of 10,000 stars on GitHub, which is a very different kind of achievement from being CNCF incubating but still a really satisfying and cool one, and we're really chuffed about that.
The core of cert-manager is really the issuer. By issuer I mean: how does cert-manager know where to get certificates from? That's a decision that you get to make. In this talk we'll be talking about the CA issuer, which is built into cert-manager, but there are loads of others. Here I've given the examples of HashiCorp Vault, Venafi (which is my employer) and Let's Encrypt. Let's Encrypt, or ACME, is probably the most popular choice for cert-manager because the certificates it produces are publicly trusted, i.e. most devices, including the computer I'm recording this on, will trust Let's Encrypt without you needing to install anything, and that makes it a very attractive choice. It's not an appropriate choice for private PKI, though, as we'll see later.
So what's PKI then? I've mentioned it a few times and we need to know what it is first. It stands for Public Key Infrastructure, but that doesn't really help much. PKI can basically be thought of as chains of certificates. Private PKI simply means PKI that you control: a certificate chain that you have full control over and can do what you like with. You don't have control over the issuers used by, say, Let's Encrypt or DigiCert, but you can have control over your own.
The advantages of this are many. One is that it's cost effective: some issuers, especially older ones, will charge for issuance, and other tools have a cost associated with them. Private PKI has none of that. The only thing you're really paying for in private PKI is the machine that does the actual cryptographic work, which is not nothing but is pretty much negligible in the grand scheme of things. Obviously there are other choices for issuance that are free, though.
So mainly the bonuses are, first of all, that you get total control over the certificates you issue. That means you can do whatever you like. If you wanted to use SPIFFE, which we're not going to go into today, private PKI lets you do that. You can totally issue a SPIFFE SVID if you want, or any other type of certificate. There are also no artificial rate limits here. There's nothing wrong with the fact that Let's Encrypt has those; they use them to defend the service, and I love Let's Encrypt, but it can be frustrating if you run into their rate limits and it can cause you problems over a period of time. You don't have to worry about that here.
And just in case this certificate thing that I keep talking about is going over your head: certificates, in this context, are what we need to do TLS or SSL. Many, or in fact most, services today need certificates. That's because they want encryption in transit. When there's data going over the wire we want it to be encrypted, protected from attackers, protected from modification, all that good stuff. That applies just as much in Kubernetes, whether that's inside the cluster with pods talking to each other or with traffic coming into the cluster. We almost certainly want any publicly facing services to be encrypted at the bare minimum.
So I mentioned certificate chains earlier, and this is what a certificate chain will usually look like. For example, this is what Let's Encrypt has if you're getting a Let's Encrypt certificate for your device. Usually there's some sort of root certificate, and that certificate will be self-signed. That's often a confusing term, so it's important for me to define it in this context. Self-signed simply means that the certificate's public part has been signed by its own private key. It doesn't mean private. Private, in the context of private PKI, means PKI that you control.
So we will be using self-signed certificates for this, but Let's Encrypt's root certificate is also self-signed. That's why I'll be careful to talk about private PKI going forwards.
A root certificate usually issues an intermediate certificate. That's because the root certificate tends to live offline. For valuable certificates like Let's Encrypt's, the root will be stored in some secret place where it's never connected to the internet, to reduce its attack surface. We don't want it to get stolen; that would be terrible. So day-to-day issuance tends to use an intermediate certificate, and that intermediate is the thing that will issue the leaf certificate, which is sometimes called an end-entity certificate if you're interested. The leaf certificate is the thing that has our identity in it. That's the website we're trying to identify or the email address we're trying to use our certificate for. Whatever you're using your certificate for, the leaf certificate is the thing with the identity in it, so ultimately it's the goal of what we're trying to do.
Like I said, traditionally it's the intermediates that do this and roots are stored offline. Roots are then only used to issue new intermediates, but that's not required, and we won't be doing that today. Roots can absolutely issue any kind of certificate directly: they're marked as CA certificates, which means they're allowed to sign any other certificate. What's best, and what kind of architecture you want to use, will depend on you. That's a choice that you get to make.
Although I'm very passionate about the topic of PKI and I find it super interesting, I would be remiss if I didn't talk a little bit about the risks. It's not a free lunch, and we do need to think about the risks that we're trying to deal with in private PKI. There are serious issues, and this is not necessarily simple, although I'm going to try to make it as simple as I can.
It is important that you think about these risks when you consider deploying a private PKI. That's not to say you shouldn't; I think everyone should try it, but you do need to think ahead to avoid causing outages or security vulnerabilities down the road. Some of the risks I'll talk about are specific to cert-manager, some we'll mitigate using tools in the cert-manager suite, and some are just true of PKI generally and are something we would have no matter how we issued our certificates.
The first risk is not locking things down. We need to make sure that people who can request certificates aren't allowed to do that for certificate types we don't expect, or for domains that we don't want them to get certificates for. If we don't lock down issuance in cert-manager, someone could use a CA issuer to issue any other type of certificate, including their own CA certificate. They could then exfiltrate that out of the cluster and sign other certificates that would be trusted by anyone who trusted our root. We can't allow that, but it can be mitigated using approver-policy. If you've not heard of that, we'll talk about it in a second.
Risk number two is a huge part of what I'm going to talk about today, and that's rotating root certificates. It's easy to get this wrong, there are a lot of ways to shoot yourself in the foot, and unfortunately it's one of those things that just requires a plan. You have to have a plan for how to do this, and it's tricky to automate, which makes it fiddly, but we'll cover that. By the end of this talk you'll know what to do.
The third risk is trust. If you don't mitigate this then your certificate will be useless. If nothing trusts your shiny new certificate, you can't use it to sign anything else.
There are subtleties here relating to the rotation I've just mentioned, but fortunately we have another tool, trust-manager, which can help in this situation. You still need to put thought into trusting certificates, and you need to have a plan going forwards. As I said, we'll mitigate these risks. It's not a showstopper, but we do need a plan, and that plan needs to be production ready; I don't want to talk about something that you can't use. So we'll do our best.
So let's start off and see what a safe private PKI will look like. The first thing is to not just copy from these slides. There's a repo; there's a QR code on the recording here, and I'll show that QR code again later. But don't copy from the slides. The examples I'm using here are not complete. They're just an illustration, but there's a full guide available in this repo.
So the first step is issuance. How do we actually get certificates signed? The simplest approach is entirely in-cluster. By that I mean we use cert-manager to create a CA issuer whose certificate and key live in the cluster; they'll live in a Kubernetes Secret, in fact. We could just use cert-manager, and we don't need any other tools to do this, but bearing in mind the risks I mentioned before, it's actually safer to use both approver-policy and trust-manager as well. There are other ways of doing this, such as signing with an offline root, but we're not going to consider that just now.
And this is what a root certificate looks like. It uses an issuerRef which points to a special kind of issuer called a self-signed issuer. That just means we'll get a self-signed certificate, which again just means a root certificate. You'll see that I've used the uninventive name of root certificate here. You could customize that if you wanted to. I'm not trying to be too fancy here; I just want to show you the idea of what we're talking about.
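
As a rough illustration of the kind of manifest being described (not the slide itself, and not a substitute for the guide in the repo), a self-signed issuer and a root certificate might be sketched as below. The names `selfsigned` and `root-certificate`, the `cert-manager` namespace, the choice of a ClusterIssuer rather than a namespaced Issuer, and the ECDSA key settings are all assumptions made for this example; the 25-year duration is the lifetime mentioned in the talk.

```yaml
# Sketch only: names, namespace and key settings are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned              # hypothetical name
spec:
  selfSigned: {}                # certificates are signed by their own private key
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: root-certificate
  namespace: cert-manager       # assumed install namespace of cert-manager
spec:
  isCA: true                    # marks the certificate as a CA so it can sign others
  commonName: root-certificate
  secretName: root-certificate  # Secret that will hold the root's key pair
  duration: 219000h             # 25 years (25 * 8760h)
  privateKey:
    algorithm: ECDSA            # assumed; pick what suits your environment
    size: 256
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
    group: cert-manager.io
```

Applying both documents should leave a Secret named `root-certificate` in the `cert-manager` namespace containing the root's certificate and private key.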
You might notice that I used a 25-year-long certificate in that last step. That's totally fine for root certificates. As long as you have a rotation policy, the fact that it's long-lived isn't really a risk. Again, feel free to use descriptive names. Coming back to that long-lived risk problem, the only real risk here is maybe quantum computers breaking our certificate, but in that case we'd still need a rotation policy to distrust the old certificate that's been broken by quantum computers. So don't worry too much about the lifetime; it's kind of fine.
Once we've issued our certificate using a self-signed issuer, we just create a cluster issuer, which looks very much like the example I showed earlier. We're just pointing to a secret in the cluster, and this issuer will then be able to issue all kinds of certificates. It's that simple to get issuance set up, but this on its own is not safe.
To make it safe we need policy. Policy is here to prevent several things. One, as I mentioned earlier, is that people could now just request a CA certificate from the CA issuer we just created. They could also request any domain. They could request a certificate for google.com or cncf.io, say, and if a pod trusts your new root certificate it would then trust a certificate issued for Google or the CNCF, so we shouldn't really allow that. We also have to look at lifetimes. Lifetimes for root certificates can be pretty long because that's not really an extra risk, but lifetimes for leaf certificates are a different story, since leaves are much more likely to be stolen. Ideally, if one is stolen, we want it to be valid for as short a time as possible. And this is what a policy looks like using approver-policy and cert-manager.
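
To make this part easier to follow without the slides, here is a minimal sketch of the two resources it refers to: a CA ClusterIssuer backed by the root's Secret, and a CertificateRequestPolicy carrying the constraints the talk lists (no CA certificates, a required common name and DNS names with any value, and a 48-hour maximum duration). The resource names `root-issuer` and `root-issuer-leaves` and the Secret name `root-certificate` are assumptions for illustration, and the full guide in the repo remains the reference.

```yaml
# Sketch only: resource names are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: root-issuer
spec:
  ca:
    secretName: root-certificate   # Secret holding the root's key pair
---
apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: root-issuer-leaves
spec:
  allowed:
    isCA: false                    # requests for CA certificates are refused
    commonName:
      required: true
      value: "*"                   # any value; tighten this in production
    dnsNames:
      required: true
      values: ["*"]                # any value; tighten this in production
  constraints:
    maxDuration: 48h               # upper bound on leaf lifetime
  selector:
    issuerRef:                     # the policy applies to requests for this issuer
      name: root-issuer
      kind: ClusterIssuer
      group: cert-manager.io
```

approver-policy also expects requesters to be granted RBAC permission to use a policy; that binding is omitted here to keep the sketch short.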
We have a CertificateRequestPolicy CRD, and this policy says that we are not allowed to issue CAs using our root issuer, but we do require that the certificate we're issuing has a common name and DNS names, and they may have any value. We also say that the max duration of a certificate may only be 48 hours, so we set an upper limit there. In production you should probably restrict the DNS names if you can do that in a reasonable way, and you should try to keep the max duration as low as possible, but that will depend on what your operational constraints are. I should say again that you really shouldn't consider running private PKI without something like this. It doesn't have to be this, but you shouldn't just allow anyone to issue anything; that could be dangerous.
The final component of running our private PKI is trust. Trust essentially answers the question: how do we use our root? Pods in Kubernetes will need to trust it, and they need to trust public certificates as well. It's no good if our root is trusted but people can't connect to GitHub anymore, for example. That's where trust-manager comes in, which is another project that we have. It's also kind of my baby and I really care about trust-manager, so I really hope you try it.
trust-manager adds yet another CRD, in this case the Bundle. In this Bundle we've specified two sources. We've specified the default CAs, which come bundled with trust-manager but are easy to update. That's the publicly trusted bundle that we have on all of our devices; you can just use the same one. We also specify a ConfigMap, and this points to our trusted certificate, which we issued in the first step. The example repo goes into some detail about how that ConfigMap actually gets created, but it's not particularly interesting so I won't show it in the slides.
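
A Bundle with those two sources might look roughly like the sketch below, followed by one way a workload could mount the resulting ConfigMap. The Bundle name `private-pki-trust`, the source ConfigMap `root-a` with key `root.pem`, the target key `bundle.pem`, the example Pod and its image are all hypothetical. The sketch also assumes the source ConfigMap lives in trust-manager's trust namespace, and the right mount path depends on where the container image looks for CA certificates.

```yaml
# Sketch only: names, keys, image and mount path are assumptions.
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: private-pki-trust        # Bundles are cluster-scoped
spec:
  sources:
  - useDefaultCAs: true          # the publicly trusted CAs shipped with trust-manager
  - configMap:
      name: root-a               # ConfigMap holding a copy of our root certificate
      key: root.pem
  target:
    configMap:
      key: bundle.pem            # key written into the ConfigMap in each namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  namespace: default
spec:
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
    volumeMounts:
    - name: trust
      mountPath: /etc/ssl/certs     # image-dependent; adjust for your base image
      readOnly: true
  volumes:
  - name: trust
    configMap:
      name: private-pki-trust       # target ConfigMap shares the Bundle's name
```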
This Bundle will cause a ConfigMap to be created in each namespace, so pods can then mount that resulting ConfigMap and they'll trust our CA because it'll be in the trust bundle that they get. This Bundle resource means we only have one thing to update when we need to update trust. That's super important; even if we weren't doing private PKI, that would be valuable. It means that we can avoid rebuilding every container if we need to update the trust bundle, because today most containers bake this in at build time. If there were a vulnerability in a trust store, you'd need to rebuild all of your containers. trust-manager gets around that. Even if you don't look at private PKI following this talk, I highly recommend that you look into trust-manager, because I think it's really important either way.
If you are using private PKI, though, it's worth saying that the Secret for your root certificate will have a ca.crt field which has the root in it. That's the Kubernetes Secret that got issued as a result of our self-signed certificate. Don't use it directly. This is a really subtle interaction, but if you use that directly and the cert is changed in place, your trust will automatically update because it will be pulled from that Secret. That would break any running application that relied on the old root still being trusted. Instead it's much safer to copy the certificate into a ConfigMap, and the example in the repository does show this.
So, cool: we've got a certificate, we've got policy in place and we've got a way of trusting it. How do we actually scale up with this, and how do we get it operationalized and running in production? Really there's one main thing to consider here, and that's the elephant in the room: rotating roots. It's the title of the talk; I had to come back to it at some point.
Before we start, it's worth addressing the alternative architecture I showed, the one Let's Encrypt uses, where you have an intermediate certificate. You may well think that rotating an intermediate certificate is a lot easier, and that's true, it generally is easier, but the context is important here. Rotating intermediates in a disaster is actually really hard. If an intermediate is exposed, it will generally have a reasonably long lifetime and you have to rely on revocation to get rid of it. If you can't rely on revocation then you need to rotate your root certificate anyway, so you need to have that in place and ready to go.
So you may say, well, what about revocation? Maybe I can use that and it will work. Maybe you can, but it's very, very hard to actually get it working. Most places don't try. On the internet it generally doesn't work well, and even if you do get it working it's very unlikely that you'll have it in place such that you can defend against all possible attacks. It's not even a hot take; my lukewarm take is that revocation is simply not worth it and you can't rely on it, so don't.
So if we can't use revocation, we need a root rotation plan, and our assumptions for that are that we'll have no downtime for a regular rotation and we'll automate where possible. When I say no downtime for a regular rotation: if your root certificate is exposed, you may want to distrust it immediately, and that could cause downtime, but that's an important and powerful tool to have. If your certificate's been exposed you don't want to trust it any longer; you want to fix it as soon as possible. But if we're just doing a regular, say once-a-month, practice run of our rotation policy, then we don't need any downtime at all, and we'll see how to achieve that.
So this is the simple five-step plan for a safe root rotation. This is what you need to have in place for your organization if you're running a private PKI.
First, we can't rotate the root certificate until we have something new to rotate to, so step one is to create a new root. That's a simple thing; in the repo associated with this talk you'll see an example of it, and it's nothing complicated at all. Second, we need to trust the new root. With trust-manager that means adding the root to the Bundle: again, we create a separate ConfigMap and then trust that ConfigMap in our Bundle's sources. Third, we need to ensure that everything is using the new trust bundle. That means every pod that mounts that ConfigMap probably needs to be redeployed; we'll come back to this step. Step four is to shift issuance to the new root. By that I mean that if you're creating a certificate off the old root, you should now create it using the new one. Once this has been done, nothing will be using the old root anymore, and that leads us to step five, where it can be removed.
It's a simple enough plan, but really there is a warning sign on step three. This is why it's so difficult to automate and why you need a plan. Anything outside of your cluster that trusts the old root needs to be updated to also trust the new root. The issue is that from inside the cluster it's very difficult to tell where the root is being used, because using it doesn't require calling back into the cluster. If you copy that root out, which is totally valid although I wouldn't recommend it, you could be using that root to issue certificates anywhere. Or, if you're just using certificates in your cluster to access services outside it, those outside services may know of your old root but won't know of your new one until you manually update them or create some system by which they will be updated. Don't underestimate this step, is what I'm saying. You need to have a plan here.
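
The in-cluster parts of steps two and four can be sketched as follows. This is an illustration under assumed names, not the repo's manifests: `root-a` and `root-b` are hypothetical ConfigMaps holding copies of the old and new roots, `root-issuer-b` is a hypothetical CA ClusterIssuer backed by the new root, and the leaf certificate is a made-up example.

```yaml
# Sketch only: all names below are assumptions.
# Step 2: trust both roots for the duration of the rotation.
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: private-pki-trust
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      name: root-a               # old root; remove this source in step 5
      key: root.pem
  - configMap:
      name: root-b               # new root created in step 1
      key: root.pem
  target:
    configMap:
      key: bundle.pem
---
# Step 4: point a leaf certificate at the issuer backed by the new root.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-app-tls
  namespace: default
spec:
  secretName: example-app-tls
  commonName: example-app.default.svc
  dnsNames:
  - example-app.default.svc
  duration: 24h                  # short-lived leaf
  issuerRef:
    name: root-issuer-b          # previously the issuer backed by the old root
    kind: ClusterIssuer
    group: cert-manager.io
```

Step three sits between these two changes: workloads need to pick up the updated bundle, for example through a redeploy, before any leaf is issued from the new root. If issuance is restricted by policy, the new issuer also needs to be covered by a policy before step four will succeed.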
So, in summary, within Kubernetes you can redeploy everything, because trust-manager will update your ConfigMaps in each namespace and then a redeploy will ensure that each pod has picked up the new trust store, so step three becomes simpler. If you're outside the cluster, like I say, you need to keep track yourself. The actual process of updating the Bundle, I should say, is very simple: you just add a new source. It really couldn't be simpler, and trust-manager really helps here. But the main theme that I want to get across about rotating roots is that there's no one answer. You need a plan that matches your organization and your architecture.
In fact, the first key point of this talk is that your architecture matters a great deal. If your architecture is such that you have one cluster, one god cluster that has everything in it, having a root just for that is easy to do and easy to get started with. If everything's running inside that cluster, you can use tools like trust-manager to manage that, and approver-policy will handle everything for you on the policy side.
But if you're in a multi-cluster situation, which is very common nowadays, maybe even running in multiple clouds, and you want those clusters to talk to each other, you have the problem of distributing trust across different clusters. That doesn't necessarily come easily. trust-manager can help once you've got all of your ConfigMaps created in each cluster, but it doesn't solve the problem entirely, because you still need to get those ConfigMaps created. Without planning this, there's not much I can say beyond: you need to think about it.
It is worth saying that it's possible to create a root certificate which issues an intermediate certificate for each cluster. Then you only need to distribute that root to each cluster, and they'll all trust each other because they all chain to the same root; that's how TLS trust works. That is a possible architecture that you could go with. The only potential downside is that if you have this one root and you issue a new intermediate for each cluster, you need access to that root every time you create a new cluster. So if you're storing the root offline, you may have a manual step where you actually need to go and unlock a safe and get out a Raspberry Pi to issue a new certificate in order to create a cluster. Obviously that's not a particularly automated thing, so it's something you need to think about.
Key point two really is that rotation is key, and this is the main thing I'm trying to get across. Whatever architecture you're using, you need to plan how to do rotation. This is true of any secret, but PKI just makes it even more true. Rotating roots is not that hard. It can be time-consuming to make sure that everything is using the new root, but trust-manager makes it simple to distribute things. The main thing is that you need to practice it and you need to have a plan. If a root is exposed you may well need to do this in an emergency, so planning it and practicing it regularly is a great way of ensuring that you have the skills to rotate when you need to.
The other main point is that revocation is not a reliable way of doing anything. Like I said before, you need a plan for rotation whatever you're doing, so you can lean on that plan to get around the fact that revocation can't be relied upon. I would suggest you don't even try to set up revocation; it's probably going to be really hard for not a lot of gain. Get a good rotation plan in place instead. That does mean that anywhere you go on to trust the root certificate of your private PKI, you need to know about it so you can withdraw that trust, and you need to have a plan for how to update trust on whatever you're using your root CA on.
Finally, I guess my actual main point is that you should try private PKI. If you're running cert-manager today, you've already got all the tools you need to start on a basic level, and at the very least you're going to learn something. I think it's really fun to play with this stuff, and that's how I got into certificates and public key cryptography: I think this is cool. So give it a go. Maybe you can leverage it to save some money or time. It also enables all kinds of other cool things: you may intentionally want to perform a man-in-the-middle attack on certain connections, or you may want to issue thousands of certificates, and this will give you the freedom to do it. Just give it a go; it can't hurt.
Here's that same QR code that's been on the screen the whole time. I highly encourage you to check out this repo. If you do, please feel free to open an issue, leave a message or open a pull request. I would like this repo to become a useful resource for people, and I hope it's useful for you.
Thank you for listening; I really appreciate the opportunity to talk at KubeCon EU. It's a shame I couldn't be there in person, but the main thing I appreciate is the ability to spread the certificate gospel. I really hope this is useful to people and I hope you've enjoyed it. Thank you very much, and enjoy the rest of your conference.