Hello, salut. Welcome, and thank you for coming to our session, Securing the Supply Chain with Sigstore Artifact Signatures at Scale. I'm Dmitry Savintsev, an engineer on the Yahoo security team working on company-wide security infrastructure. I'm Yong He. I also work for the Yahoo security team. Our security team is known as the Paranoids, and as the logo at the top of the slides shows, we are very excited to be here. The topic we are going to talk about concerns all Kubernetes users. Kubernetes has lots of moving parts and infrastructure, but in the end it's about pods running containers that do the useful work, and these containers are started from OCI or Docker images. The big questions are: how do we know, how can we be sure, and how do we verify that those images are really the ones we intend to run, and not some malicious falsification? And how do we do such verification for all of the possibly millions of images we are deploying? In this talk, we want to share what we have done trying to answer those questions, and here's the agenda. Supply chain security is about tracing the things we use from their origin throughout their existence. It may be best to illustrate the dangers and risks with a couple of horror stories about how things can go terribly wrong in this area. Supply chain security concerns not only software. A literally deadly example is the case of the Tylenol poisonings, when capsules laced with cyanide were placed on store shelves. In reaction to this, the company Johnson & Johnson introduced tamper-evident packaging for all its over-the-counter medications, and it was able to overcome the crisis and win back users' trust. In the software area, the SolarWinds attack is now infamous. Hackers compromised the software update mechanism, and new versions of the popular Orion network management software brought malicious code to government agencies and major corporations.
One more example: in the Codecov incident, intruders were able to modify the Bash Uploader script, which resulted in the compromise of code and credentials of Codecov users in their CI setups, and that went undetected for a couple of months. Software supply chain attacks exploit the time gap between the generation of a software artifact and its usage. The generation can be, for example, an OCI image built in a CI pipeline, which is then stored in a registry, potentially for a very long time, and the consumption would be deployment in Kubernetes clusters. This intermediate time of storage can be anywhere between seconds and multiple years, and there is a non-zero risk of modification, malicious substitution, or intrusion: the stored artifacts may be replaced with some other version. Even if we consider this risk small, we must do our best to prevent such attacks. Cryptographic digital signatures are a classical mitigation for them, so let's explore them in more detail. Typically, public key cryptography is used for digital signatures. You sign the artifact with the private key and verify using the public one. You need to provide the public key to all the points where the signatures are verified, for example, to all the Kubernetes clusters. And, critically important, you need to ensure that the private key is never, ever compromised. Otherwise, attackers can use the stolen key to sign maliciously falsified artifacts, and those signatures will be as good as the real ones. Private key management is a hard problem, especially when you need to use keys at large scale. Before we talk about what we have done with digital signatures and OCI images, a few words about our world and our environment. We have several thousand developers, over 60,000 daily builds, more than 700 Kubernetes clusters, and over 100,000 pods. About half of them are deployed or redeployed every 24 hours, and every deployment is a verification event.
We are using different public cloud providers, AWS and GCP, plus our own on-premise clusters. The OCI images are produced by Docker, Podman, and Buildah, each of which has some differences in the way it signs images. Our build system is Screwdriver, a CI/CD platform built and open-sourced by Yahoo, but we also have some sprinkles of Jenkins and GitHub Actions. There are lots of different software artifacts produced and used, but in our project we want to concentrate primarily on the digital signatures of OCI or Docker images. We did have an existing image signing system, but it was far from ideal. It used public/private key signing and verification, but depended on a long-lived static PGP key. Even with maximal security around it, having such an über-key is not the best for security, and it also makes the key difficult to rotate or revoke. We also had a big custom database and an API server to accept, store, and serve those signatures on request (almost a custom registry for image signatures), plus some custom tooling to make it all work. I wouldn't be surprised if some of you have similar setups at your workplaces. It works, but it has security deficiencies; it's cumbersome and far from ideal. So we started to look for alternatives and industry best practices. We found the Sigstore project, also through the CNCF security community. Sigstore is a project specifically focused on digital signatures and their verification. It's open source, supported by the Open Source Security Foundation (OpenSSF). It's vendor-neutral, works with different clouds, and is widely accepted in the industry. Sigstore stands out for its innovative use of ephemeral keys, enabling so-called keyless signing. We chose to use this new technique because it helped close the security gaps and reduce the operational cost of our existing signing system.
First, instead of signing all images with one static key, this method lets us sign each image with a unique ephemeral private key, so no key rotations or revocations are necessary. Second, there is no need for a dedicated database to store signatures anymore, because they are saved directly in the registry. And finally, we can use widely accepted, battle-tested tools supported by a large community, with all the benefits of open source. I'll hand over to Yonghe to explain keyless signing and what we have done with it in more detail.

Thanks, Dmitry. So, how does keyless signing work? It is not hard to understand. It involves just five entities: the signer, an OCI registry, an OIDC provider, a certificate authority, and a timestamp authority. Let's say you are the signer, Alice, and you have an image in the OCI registry. To start signing, you need something called an OIDC ID token. This token basically proves that you are Alice. Getting this token may involve different steps depending on the OIDC provider you are using. It might ask you for your username and password, or require you to use multi-factor authentication. But no matter what, once you go through this process, you will have the OIDC ID token in hand. The next step is to generate a key pair locally. Then you send a certificate signing request with the public key, along with the OIDC ID token, to the certificate authority. In return, you receive an image signing certificate. This certificate not only proves that you, Alice, possess the corresponding key pair, but also has a short validity. There is no standard regarding the length of the validity period; typically, the shorter the better. To be specific in this presentation, let's assume it is 15 minutes. And you have to sign the image right away, within this timeframe, because all signatures generated after these 15 minutes will be considered invalid.
This is a great security improvement, because even if an attacker somehow got your private key, they wouldn't have much time to do anything bad. And after this step, we no longer need this private key; we can safely discard it. This is why the whole process is called keyless signing. It doesn't mean signing without private keys; it means signing with short-lived private keys, so you do not need to worry about key management. Now you might be wondering: how do we know when the signature was generated? How do we know the signature was generated while the certificate was still valid? Well, that is achieved by the timestamp authority. We send the signature to the timestamp authority, and in return it provides a cryptographically signed timestamp. This timestamp proves that the signature existed at a certain time t. It plays a crucial role during the verification process, because only with this timestamp can we know when the signature was generated. If we find that the time t is outside the certificate validity, we should not accept the signature. Okay, now you are holding three things in your hand: the image signing certificate, the signature, and the timestamp. Finally, what you need to do is wrap these three things into a new image, the signature image. This is how Sigstore saves signatures to OCI registries. I know it looks big on the slide, but actually it's just hundreds of bytes, which is small. Also, the signature image is tagged following a naming convention, so that the verifier can easily link an image and its signature image. During the verification process, apart from the image that got signed and its signature image, we also need two certificate chains. The first one is the certificate authority (CA) cert chain. It is a chain that starts from the intermediate CA that issued the image signing cert, all the way up to the trusted root CA. It helps us verify whether the image signing cert is trusted.
Similarly, we also need the timestamp authority (TSA) cert chain, which is used to verify the timestamp. If everything checks out, we can confirm that the image was actually signed by Alice while the cert was still valid. However, there are cases where verification might fail. For example, if we only accept images signed by Bob but not Alice, it will be considered a failure. Or, if we find the image was signed after the cert had expired, verification will fail too. We will have a demo of those cases later, but regardless of success or failure, this is how keyless signing and verification work. It provides us with enhanced security. Let's go back to this diagram. If you want to set up a private keyless signing system, you just need to figure out who is going to be your signer, OCI registry, OIDC provider, CA, and TSA. In the standard Sigstore setup, the signing tool is Cosign and the certificate authority is Fulcio, while Rekor or the Sigstore TSA acts as the timestamp authority. However, you are not limited to those options. Think of this architecture like LEGO building blocks: you can mix and match. We have our established authentication system, Athenz, which is a CNCF sandbox project; you can check it out on GitHub. Athenz can be both the OIDC provider and the CA. And Screwdriver is our signer, Alice, in the picture. As Dmitry mentioned, it is our CI system and the producer of our OCI images. Like Athenz, it is also open source, and you can find it on the CNCF cloud-native landscape website. And yes, we run Cosign on Screwdriver. The only missing part for us was the timestamp authority, so we launched a Sigstore TSA internally. Compared to Rekor, which requires MySQL, Redis, and web services, the Sigstore TSA is a much simpler, stateless service. It allowed us to launch the internal keyless signing system at the lowest cost. But Rekor, as a transparency log, does provide more security features.
So we are considering adding it to our system in the future. For verification, of course, you can use the command line to run cosign verify. But for signature verification in Kubernetes, which is more common, we can utilize an admission webhook. It allows you to intercept and check the images before they are deployed. As far as we know, you can use the Sigstore policy-controller and Kyverno to check Cosign signatures; they are both open source. But we, for now, wrote our own admission controller to support the legacy signature check. Once we've phased out all the legacy signing mechanisms in our company, we will explore switching to one of those open-source options. Great. Now we have some screenshots to demonstrate what we have discussed and how it actually works at Yahoo. Before we start, just a reminder: we have sanitized all sensitive data for security purposes. OK, let's get started. What you can see now is our CI system, Screwdriver, signing an image. First, it uses a ZTS command-line tool to obtain the OIDC ID token from Athenz. Second, after generating a key pair locally using OpenSSL, we use another ZTS command to get the image signing cert from Athenz. This certificate is valid for only 15 minutes. The cert validity here seems to be 75 minutes, but that is because Athenz sets the start time one hour before the current time as a margin of safety against potential clock skew. The effective notBefore time should be considered one hour later, at the current time, which is 15 minutes before the certificate expires. Finally, we sign the image, request a timestamp, and push the signature image to the registry. And we can successfully verify the signature. However, if we only accept images signed by Bob but not Alice, the verification will fail. The error message reads: none of the expected identities match what was in the certificate. The last case is when someone signs the image after the image signing cert has expired.
Even though you can successfully sign the image and push the signature to the registry, you cannot get a timestamp within the cert validity, and that leads to a verification failure: the expiration time of the certificate is before the time the image was signed. OK, that's the demo. I will hand over back to Dmitry.

So where are we currently in our Sigstore adventure? We have now integrated Cosign signatures into the Docker sign step of our CI, the step that is used for building all our Docker images, and we are currently doing a bucketed rollout. One thing we have to be careful about is coordinating with the Docker registry ops team regarding load monitoring, because doing Cosign signatures approximately doubles the number of API requests for every generated image: we now produce and upload two images where previously it was just one. We are running the TSA service in AWS, and since the TSA does its own digital signatures, it needs its own private keys, so we use a multi-region KMS key. The TSA certificate is signed by Athenz, our central certificate authority, so everything gets rooted in Athenz. We have contributed a feature to the sigstore/cosign GitHub project to allow mTLS connections from Cosign, that is, from CI to the TSA, which was a must-have requirement for us. We have also integrated the Sigstore signature verification in all Kubernetes deployments, 100% of them, using an admission controller webhook. Similarly to the signing code, it's implemented in Go, based on the sigstore/cosign library. We use a custom webhook, as I said, since we need to do several security-related checks, but there is a similar open-source Sigstore policy-controller webhook that Yonghe mentioned and linked in his previous slides. In the current transition phase, we do both the legacy and the Cosign signature verification.
As we said before, the Sigstore verification has less operational overhead: no additional signature store or custom verification extension is required. We are really looking forward to the day when we can fully switch to the Sigstore verification. The best thing is that we get a security improvement, but our developers and DevOps do not have to do anything extra; the process is fully transparent to them. They say security always comes at the cost of convenience, but in this case there is no inconvenience or extra work for our users. I loved the title of the previous talk: safety or security, why not both? It's the same case here. We improved security along with usability, which we absolutely love and strive for. It's like having your cake and eating it too. We have published a blog post about implementing Sigstore image signing. Please check it out for additional details and code examples, especially if you plan to adopt Sigstore at your company. It also describes our contributions in more detail. So that's what we have done, but now, how can you get started with Sigstore and Cosign digital signatures? We, of course, went through lots of tests and proofs of concept. You can start in minutes using the standard flow and the public Sigstore servers, Fulcio and Rekor, with Cosign, to get an initial feel for how it works and what it does. And then, while it's a shameless plug, I would recommend checking out the repository we made for our initial testing, and also for testing some of our contributions: the cosign keyless repository on Dmitry's GitHub. It runs keyless signing using the temporary anonymous image registry ttl.sh, which is great for testing, and also the public TSA server freetsa.org. Once you have this, you can expand in the direction of your requirements and infrastructure. For example, if you, like us, need to run your own TSA and have an mTLS connection, you can try the script in the same directory that runs a TSA with mTLS and ttl.sh.
It needs some parameters configured, like the location of your secrets and the URL and name for the TSA, but it shouldn't be too difficult. And finally, you can grow and expand the solution to fit your requirements and match your infrastructure. As you grow the solution, don't hesitate to ask questions. You can find us on the Sigstore Slack, and we'll be happy to help if we can. As for next steps, besides finishing the rollout, we plan to work on adding an internal instance of Rekor to the system, to keep a record of all the signatures made. The OCI images were the starting point, but we would also like to sign and verify other artifacts, such as RPM and Debian packages or install bundles. One challenge there is where to put the signatures. With OCI images, they can be uploaded to the image registry alongside the images they sign, but for other artifacts some other solution is necessary. And to make the signature verification really user-friendly and transparent for users, it needs to be integrated with standard tools such as DNF and Podman. Imagine how awesome it would be if you could run podman pull or podman run, and it would do the Sigstore verification and abort the operation if the verification fails. So please check out this Sigstore issue on the RPM integration, and maybe you could also help to lobby for it, or participate, or contribute. To summarize, in an approximately year-long project we were able to implement Sigstore keyless signing and verification of the OCI images produced in CI and consumed in Kubernetes. We found Sigstore flexible and powerful, meeting our requirements and able to mesh with our internal components, namely Athenz, which is our OIDC identity provider and central certificate authority. We complemented the few pieces we required with our contributions to the Sigstore project.
Based on the standard use cases and documentation, it's easy to think of Sigstore as a monolith where all the pieces, Fulcio, Rekor, and Cosign, are a must-have. Our experience highlights an alternative approach: Sigstore as a set of LEGO-like building blocks that can be mixed and matched as needed and combined with internal pieces, such as an identity provider or a public key infrastructure. We would like to encourage other companies, even the big ones, to consider adopting Sigstore to improve their software supply chain security. If you remember just one thing from this talk, please remember to always sign and verify all your OCI images and, if possible, other artifacts as well. We'd like to thank the many people who really helped us with this project, including the people in the Sigstore project. The Sigstore community was greatly helpful and a pleasure to work with. In particular, thank you Haydn, Nathan, Zachary, and everyone who reviewed our pull requests and answered our many questions. We're also grateful to our teammates at Yahoo, listed here, for their support, advice, and encouragement. Thank you. Merci beaucoup. We have a few minutes left, and we'll be happy to answer your questions. And also, please scan the QR code and leave feedback for this session. There are a couple of microphones here.

So the question is why we haven't used the transparency log, Rekor, and what about the benefits it provides. We wanted to start somewhere. Rekor is great: it's an append-only log, so it shows everything and cannot be modified. We think Rekor is especially important when you have multiple parties that don't necessarily trust each other, like in the open-source world, so it might be less applicable inside one corporation. But we plan to adopt Rekor, so we will be adding it to the mix. We are just starting with the pieces that we have: the TSA, and Athenz as our certificate authority.
So it's kind of eating the elephant piece by piece: starting with some pieces and then adding and expanding. For Rekor, one thing you need to check, for example, is whether it supports mTLS connections from Cosign to Rekor; possibly that would also need to be extended. So we're going to look into this next.

To add to that: Rekor is not the required building block for keyless signing; the timestamp authority is. Rekor is something you can add as a security enhancement after everything is set up. The official Sigstore blog had a post about this, maybe last year, explaining that you can have a standalone timestamp authority and push the entire record, the timestamp and all the other information, to Rekor afterwards. So being without Rekor doesn't mean you cannot have keyless signing, but without a timestamp authority the whole system falls apart. Other questions? Maybe one more. Yeah, Gorka?

Yeah, I don't think this is quite the same issue, but it might overlap. Have you thought about audit logging for the CA that's issuing the short-lived certificate to the actor that's actually doing the signing? I know that when we do CA certificate issuance more generally, in the web PKI, they've done a lot of work on certificate transparency, so you effectively have an immutable audit log of all the certificates that have been issued. Does the Sigstore stack integrate with anything like that, so I can see a record of which certificates have been issued to which actors to then go off and do signing?

So currently we don't have a Rekor-like append-only log of all the certificates issued by Athenz. Athenz is issuing those short-lived certificates: the CI VM generates its private key and then sends a request to Athenz, which issues the short-lived certificate, and it's also tied to the specific Screwdriver build project.
Access is also restricted: from a developer workstation, I could not connect to Athenz and get a signing certificate. Yonghe actually had to run it in Screwdriver to be able to take those screenshots. So it's tightly controlled, but we currently don't have an append-only log of all the certificates issued. That could be something for us to consider. Thank you. OK, last question, at microphone two.

Yes. Thanks very much for sharing your experience. So the part that makes the keyless signing work is basically that you're delegating the identity verification to your OIDC provider. Can you share how you do that in your pipelines? I think you walked briefly through that in the demo.

Oh yes, your question is how we get the OIDC token from Athenz, right, this step? Yeah, basically, how do you identify that this specific pipeline actually has the permission to sign? I think it's more of an Athenz question, but there is a bit of magic, or setup, so that when the CI build is started, there is a private key identifying this build, so that it can identify itself to Athenz. It's the same as with VMs in AWS or GCP when they are started: there is also, I would say, magic, but there's a mechanism that allows them to get a private key and an assertion that enables this authentication. You can see the command-line tool we are using here, with a service cert file and a service key file; this is what we use to get the OIDC token and prove ourselves to Athenz. So is this part of Screwdriver, or part of your general infrastructure? I think it's the part of Screwdriver hooked up with Athenz, so that at birth the VM receives a piece of identity that it can present for those transactions. So thanks very much. If you have a blog post on that as well, that would be nice. OK, yeah, we'll talk to our Athenz colleagues. Thank you very much.
Thanks for your time, and have a great rest of the conference. Thanks.