Hi, everyone. Welcome to this talk, an introduction to Sigstore for Pythonistas. My name is Maya. I'm a software engineer at Red Hat, where I work on the emerging technology security team. You can find me on social media, on Twitter, Mastodon, and GitHub, under those handles.
This talk is about supply chain security, and I would like to try to answer two questions: what are digital signatures, and why are they so important in this context? Maybe you already know this, but software supply chain attacks have increased by more than 700 percent over the past three years, which is a huge increase. A lot of these attacks have been targeting the Python ecosystem, and in particular the Python Package Index. Attackers use techniques like typosquatting, dependency confusion, or taking over a maintainer account on PyPI to try to inject malware into important libraries. This increase in malicious package uploads was so significant that PyPI maintainers recently decided to temporarily suspend new user registrations and package uploads: they were so overwhelmed by those attacks that they couldn't handle the volume of malware anymore and had to temporarily deactivate this important function.
Since we're talking about the software supply chain, I would like to define what exactly a supply chain is. For the sake of this talk, we can say it's the end-to-end journey that software takes from development to distribution. That involves all the tools and the people that are responsible for delivering the software: for instance, developers, version control systems, build systems, registries, package indices, and also deployment platforms in production environments. When you have a supply chain, what attackers usually play on is the expectation of developers that every step in their supply chain is going to be systematically reproducible, which of course is not true.
And that creates vulnerable links that attackers can exploit to upload malicious software into your supply chain.
So why are cryptographic signatures, or digital signatures, important, and a key component of every secure supply chain? Because they offer two guarantees: software integrity and software authenticity. If I were to make an analogy here, I would say cryptographic signatures are like a wax seal on a letter. The seal ensures two things when you open such a letter. You can see whether the contents of the letter were tampered with, if the seal was opened. And the pattern on the seal allows you to uniquely identify the sender of the letter. It's the same thing with cryptographic signatures.
Of course, before Sigstore, other signing tools and other standards existed. I think the most famous one was probably OpenPGP, or its implementation, GPG. But those standards, and PGP in particular, had a bunch of problems that prevented good developer adoption and didn't allow developers to use code signing as a day-to-day tool.
The first one is public key distribution. Public key distribution is the act of ensuring that your end users are going to be able to correctly identify the public key that you generated. We're talking about asymmetric cryptography here: you have a private key that you use to generate the signatures and that you have to keep secret, and a public key that you need to distribute so that people can recognize your signatures. In the case of OpenPGP, you have different methods to do that. It's called public key infrastructure: the infrastructure you put in place so that users can identify your public key. The problem is that it's not very standardized, so the standards for discovering public keys can differ widely between PKIs. For example, you can trust a more centralized certificate authority to say which public keys are the right ones to bind to a signer's identity.
Or in other models, like the web of trust, you can trust users. It's more decentralized: users can vouch for other users' identities. I put this picture in the slide. This is called a key signing party, and it happened, I think, in front of FOSDEM 2008. It's a specific way to verify other people's public keys in person, and maybe you will agree it's not very convenient. If you have to verify a public key by meeting your colleagues in person, it's maybe not the best signing and PKI scheme. It's quite a specific example. Of course, not all standards are that inconvenient, but it's still a nice illustration, I would say.
Another problem of OpenPGP is private key storage and rotation. The private key is a very important component of asymmetric cryptography because you have to safeguard it at all costs, and that means literally, in terms of money. You don't want your private key to leak at all, so you need to invest in some kind of secure storage, for instance a hardware security module to guard your key, which is very costly. You need to invest in specific infrastructure, which also implies specific knowledge about it, so maybe hiring people especially for this kind of thing. You also need to regularly rotate your private key, because compromises are pretty frequent, I would say, more than you might think. So you need to think about rotating your keys as a best practice.
If you've used GPG before, you would probably agree that the configuration can get quite complex. Sometimes it's difficult to really understand what you're using, especially since that involves understanding the underlying cryptographic protocols when trying to sign an artifact, and of course not everyone is a cryptography expert. Even if you're a developer, you're not supposed to really care about those things, but sometimes you need to. So that's not really ideal.
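To make the private/public key split described in this part of the talk concrete, here is a toy sketch of signing and verifying in Python. It is an illustration only, not how GPG or Sigstore work internally: it uses textbook RSA without padding, with two small, well-known Mersenne primes, so it is completely insecure. All names (`sign`, `verify`, `digest_as_int`) are hypothetical and chosen for this example. What it shows is the wax seal property: only the holder of the private exponent can produce the signature, and anyone with the public values can detect tampering.

```python
import hashlib

# Toy parameters: two known Mersenne primes. Far too small and too
# structured for real use; real tools rely on vetted crypto libraries.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (keep secret)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign with the private key: only the key owner can do this."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verify with the public key: anyone can do this."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    artifact = b"print('hello')"
    sig = sign(artifact)
    print("untampered:", verify(artifact, sig))
    print("tampered:  ", verify(artifact + b" # malware", sig))
```

The sketch also hints at the problems listed above: `D` has to be stored somewhere safe and rotated, and `N` and `E` have to reach users through some trusted channel.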
I put a reference here to a really good article that was published recently. It's called "PGP signatures on PyPI: worse than useless". You can check the link when the slides are published. It explains why PGP signatures were removed from PyPI. I think the title is pretty explicit, but you can still check the audit made by the blog author to understand why it was not worth continuing to maintain the GPG signatures.
OK, so now it's time to introduce Sigstore, which aims to make code signing easier and more accessible for everyone. The motto of Sigstore is to become to digital signatures what Let's Encrypt is to HTTPS. I will explain what that means. Sigstore was built, in terms of philosophy, on the model of Let's Encrypt. If you make a quick comparison between the two services, you can see that Let's Encrypt is a free and automated certificate authority, so you can use it at zero cost to obtain TLS certificates and adopt HTTPS for your website. In the same way, Sigstore is also a free service that has a public good instance, and you can use it as you'd like to log your signatures transparently and publicly. In terms of numbers, Let's Encrypt has stored over 200 million certificates since 2016, which is roughly 3 million certificates issued per day, and Sigstore has stored over 20 million entries since the public good instance went up in 2021.
So what exactly is Sigstore? Sigstore is a tool that solves some common issues with current signature schemes, like the ones I talked about before, that prevented developer adoption. With Sigstore, you don't need any specific cryptography knowledge or any knowledge of PKI protocols. It has a very simple interface that makes signing truly accessible to everyone, developer or not. And you don't need to maintain your own private keys anymore. That's a big advantage because you don't have to invest in all this infrastructure and knowledge at all.
It also allows easier auditing and revocation of signatures in case they get compromised, or are fake, for instance, which is still pretty rare. And signatures are bound to a public identity and not to a public key anymore. That's a big change compared to other asymmetric cryptographic schemes, because you can bind a signature to something more concrete, like an email address, that is easily identifiable by a human, unlike a public key.
Sigstore is composed of different sub-projects. Here I put the three main ones. The first one is Rekor. It's what we call a transparency log: an immutable, append-only data structure that allows us to store signatures so that everyone can verify them. The second one is Fulcio. It's a free certificate authority, and it delivers ephemeral signing certificates that you can use for one-time signing of artifacts. The certificate will then expire, and people will not be able to reuse it after you. In terms of security, that's a pretty big advantage. It's usually used as a certificate to verify the signature rather than to sign multiple times like usual signing certificates. The third sub-project is Cosign. Cosign is a command line tool that you can use to sign and verify artifacts in a very simple manner. All the cryptographic primitives are already picked for you, so you don't need to care about cryptographic protocols. The command line is pretty simple, and you can use it to sign containers or blobs, for instance.
In addition to the three projects, you also have a whole set of ecosystem-specific clients. You have implementations for Golang, JavaScript, Rust, and, of course, Python, which I will talk about in a few minutes. Sigstore has seen pretty large open-source adoption since it was created, especially in the cloud-native community, since it was incubated as a CNCF project.
It's pretty well known for being used to sign Kubernetes releases, and also releases of some other well-known projects like Kyverno, Tekton, and the Python library urllib3. It also has a lot of integrations with other supply chain projects, like Tekton Chains, The Update Framework, in-toto, and Kyverno as well.
OK, so now let's talk about Sigstore in the Python ecosystem more specifically. Let's cover a few initiatives that the Python community has taken to integrate Sigstore into the ecosystem. First of all, the sigstore-python client is available for Python users. You can do different things with it. The first is to use it as a command line tool. You can sign only blobs, not containers, this time, using what we call a keyless signing workflow that I'll explain later. Basically, as you can guess, you don't have to manage private keys with this workflow. Instead, you use OpenID Connect. I will go into detail about this later. You can also use it in a GitHub Actions workflow. For example, in your CI, if you want to sign a package release, or a build, or anything like that, it's pretty easy to use, and you can have signatures as an output of your workflow. And finally, it has had a stable API since version one, which you can use to integrate Sigstore natively into Python projects. If you want to test Sigstore, it's pretty simple. It's a Python package on PyPI, so you can just run pip install sigstore and start experimenting with it.
The Python packaging community has also adopted Sigstore. Sigstore appears in two PEPs in particular, PEP 480 and PEP 458, which concern secure downloads of PyPI packages and everything related to software signatures on PyPI. The PEPs are accepted right now, not implemented, but I guess you will see soon how Sigstore fits into this picture. Basically, they will enable users to upload Sigstore signatures along with their packages, and modify the API so that clients can also retrieve them and verify the packages upon download.
That way, package managers like pip, for instance, can also support verifying signatures. Overall, the goal of those PEPs is still to keep the user experience similar to before. It doesn't add any overhead for Python developers when downloading or uploading packages, but it guarantees much more security and integrity.
Here I also put an example of how Sigstore is used in the Python ecosystem: to sign releases of CPython. If you go to the python.org/download/sigstore page to download Python releases, you will see that they are now signed using Sigstore. And here's the command I copy-pasted from this site to verify the Sigstore signatures, so you can have an overview of what it takes. It's not that complex, actually. It's just sigstore verify identity, which is the command to verify. Then you pass the ephemeral signing certificate and the signature file. Then you pass information about the signer, who is the CPython release manager for that version, so here the email address, and then the URL of the identity provider that was used to issue this identity binding. And then you pass, of course, the Python tar file.
OK, so now I'd like to do a quick demo of signing and verifying a Python file with sigstore-python. OK. Can everyone see well? OK, great. Here we have a file called hello.py, and it does nothing special except say hello to DevConf. It's a very simple example. Now I'm going to sign it using the keyless workflow that sigstore-python enables. I will just type sigstore sign and then hello.py. Oh, OK. I don't have an internet connection, so that's an issue. Sorry about that. Let me check the Wi-Fi. Back to the demo. Let's try again. Now it opens a browser page. I was redirected to this page by the command line. Here you see a login page appearing. You have different identity providers here, like GitHub, Google, and Microsoft, that are officially supported by the Sigstore public instance.
I'm logging in with GitHub, so I will just click Login with GitHub. Normally I would have to input my email address and my GitHub password, but I already have a session open, so I'm already logged in. The authentication was successful, and I can close this page. Back to the demo.
OK, so let's have a look at what we got as the output of this command. We finished the browser interaction. We got an ephemeral signing certificate here, in PEM format. And then you can see this line here: it says that it created an entry at index 24 million something in the transparency log. This is Rekor, which I talked about earlier. It also wrote a bunch of files: a .sig file, which is a base64 signature for the artifact; the certificate, which is the one printed here; and also a file called a Sigstore bundle, which you can use to verify your signatures directly without needing the separate signature and certificate files.
Let's take a look at what is in the ephemeral certificate. You can see that the issuer is Sigstore. More specifically, it's the Sigstore public good instance certificate authority, Fulcio, that issued the signing certificate. Other than that, you can see that the SAN is my email address, the one I used to authenticate to GitHub. And the identity provider is GitHub, so this URL here. And if you look at the validity time, yes: the "not after" for the validity is in 10 minutes. So the certificate is only valid for 10 minutes to sign any artifacts I want. Right now I could reuse it, but in 10 minutes it will be over. After that, it will just be a proof that the certificate was issued to me with an ephemeral public key that was generated to create the signature and that was bound to the certificate by Fulcio. So an official authority confirmed that my GitHub identity was bound to my public key, which is paired with the ephemeral private key I used to sign the artifact. So now let's try to verify the signature.
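The idea that a 10-minute certificate still proves something after it expires can be sketched as follows. This is a simplified model under stated assumptions, not the sigstore-python implementation: the `EphemeralCert` class, the field names, and the example identity and issuer values are all hypothetical. It shows the two checks a verifier conceptually performs: that the certificate names the signer you expect, and that the time recorded in the transparency log falls inside the certificate's short validity window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EphemeralCert:
    """Hypothetical stand-in for the fields of a short-lived signing cert."""
    identity: str          # e.g. the email address in the SAN
    issuer: str            # URL of the identity provider
    not_before: datetime
    not_after: datetime


def matches_signer(cert: EphemeralCert, identity: str, issuer: str) -> bool:
    """The verifier must know in advance which identity it expects."""
    return cert.identity == identity and cert.issuer == issuer


def signed_while_valid(cert: EphemeralCert, logged_at: datetime) -> bool:
    """The log timestamp must fall inside the certificate's validity window."""
    return cert.not_before <= logged_at <= cert.not_after


if __name__ == "__main__":
    issued = datetime(2023, 6, 16, 10, 0, tzinfo=timezone.utc)
    cert = EphemeralCert(
        identity="dev@example.com",           # placeholder identity
        issuer="https://issuer.example",      # placeholder issuer URL
        not_before=issued,
        not_after=issued + timedelta(minutes=10),
    )
    logged_at = issued + timedelta(minutes=3)
    print(matches_signer(cert, "dev@example.com", "https://issuer.example"))
    print(signed_while_valid(cert, logged_at))
```

Because the log entry carries its own timestamp, the check still works long after the certificate has expired, which is why the private key can be thrown away right after signing.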
Here we have all the files we got from the signing operation, and now let's try to verify. I will just type sigstore verify identity. Then I will pass --cert-identity, which is my email address, the SAN on the certificate, so I'll put that. The next argument is --cert-oidc-issuer. This is the identity provider that was used, here GitHub, to provide the identity, so I'll just take this and paste it. And finally I will pass the artifact, which is hello.py. The other files will be found in the same path, so Sigstore just detects the verification materials we have. Okay, so it says OK. It means the signature has been successfully verified. As you can see, it was pretty simple. You didn't have to configure anything specific: no cryptography, no complex things at all. So that was it for the demo.
Now I think we have a bit of time to talk about how exactly that works. I will try to go over the workflow. If it's a bit fast, feel free to go over the slides again; they will be published, so feel free to take a look. So what exactly happened here? First of all... yes? Sorry, this is a Python-specific client, but you have other ones. For example, Cosign is implemented in Golang. Here it was sigstore-python. Yes.
Okay, so how does that work? First of all, the client generated an ephemeral key pair. There are still a private and a public key, but they are ephemeral, so they just stay in memory during the whole signing process. They never hit the disk, so you never have to see them again. They get flushed at the end of the signing operation. Then the client makes an identity proof request to one of the identity providers we saw on the authentication login page. It will ask Google, Microsoft, or GitHub, or others in some other configurations, for a proof of identity for the signer. This is where you log in and prove that you are the owner of your identity.
And then the identity provider sends back to the client a JSON Web Token, also called an ID token, which contains the proof, signed by the provider, that you own your email address. Then the signing client sends a certificate signing request to Fulcio, which is the certificate authority. The request contains the ID token, a certificate that is ready to be signed, and of course the ephemeral public key to be included in the certificate. Fulcio also has a transparency log, called a certificate transparency log. When it issues a certificate, it systematically logs it into the CT log so that anyone can audit every certificate that was ever issued by Fulcio. It's also append-only and immutable, so you can't modify any entries once they are there. You can have a full audit of every signing certificate ever issued by this instance.
Okay, so Fulcio has signed the certificate. Now it sends it back to the client, and then the client signs the artifact, of course, and uploads what we call a log entry into Rekor. Rekor is the transparency log I talked about. It's also append-only and immutable, and it gives you a proof that the artifact was signed during a given time span.
Okay, so that was it for the signing part, and now on to the verification. The verifier is the same client in this case. It uses the verification materials I showed you earlier, so either only the bundle, or the certificate and the signature files, and it also requests an inclusion proof from Rekor. It asks Rekor whether it has seen this entry that was supposed to be included in the log. If it hasn't, the client won't be able to verify the signature, unless you specify that you want to verify offline, but that is more specific to air-gapped environments. Okay, so that was it for the workflow.
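The inclusion proof mentioned at the end of the workflow can be illustrated with a small Merkle tree sketch in Python. This is a self-contained model that assumes the hashing convention of RFC 6962 (certificate transparency); it is not Rekor's or Trillian's code, and all function names are hypothetical. It shows why a verifier only needs the entry, a handful of sibling hashes, and the published root hash to check that an entry is in an append-only log.

```python
import hashlib
from typing import List


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaves with 0x00 and inner nodes with 0x01.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(size: int) -> int:
    """Largest power of two strictly smaller than size (size >= 2)."""
    return 1 << ((size - 1).bit_length() - 1)


def merkle_root(entries: List[bytes]) -> bytes:
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_proof(index: int, entries: List[bytes]) -> List[bytes]:
    """Sibling hashes from the leaf up to the root (what the log returns)."""
    if len(entries) == 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(index, entries[:k]) + [merkle_root(entries[k:])]
    return inclusion_proof(index - k, entries[k:]) + [merkle_root(entries[:k])]


def root_from_proof(leaf: bytes, index: int, size: int, proof: List[bytes]) -> bytes:
    """Recompute the root from one leaf hash and its sibling hashes."""
    if size == 1:
        if proof:
            raise ValueError("proof too long")
        return leaf
    if not proof:
        raise ValueError("proof too short")
    k = split_point(size)
    if index < k:
        return node_hash(root_from_proof(leaf, index, k, proof[:-1]), proof[-1])
    return node_hash(proof[-1], root_from_proof(leaf, index - k, size - k, proof[:-1]))


def verify_inclusion(entry: bytes, index: int, size: int,
                     proof: List[bytes], root: bytes) -> bool:
    """What the verifier does: no access to the other entries is needed."""
    try:
        return root_from_proof(leaf_hash(entry), index, size, proof) == root
    except ValueError:
        return False


if __name__ == "__main__":
    log = [f"signature entry {i}".encode() for i in range(5)]
    root = merkle_root(log)
    proof = inclusion_proof(3, log)
    print(verify_inclusion(log[3], 3, len(log), proof, root))
    print(verify_inclusion(b"forged entry", 3, len(log), proof, root))
```

For a log with millions of entries the proof stays short, on the order of the logarithm of the log size, which is what makes it practical for a client to check every signature against the public log.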
Now, if you would like to join the Sigstore community and you're interested in contributing, I would encourage you to join the Sigstore Slack, to subscribe to the YouTube channel and the blog, and to check out the sigstore.dev website with all the community updates. So thank you, and now we'll go to the questions.
Yes? So actually they're not really related. They're different components. The CT log, the certificate transparency one, serves as a backend to store certificates issued by Fulcio, and Rekor serves to store signature entries. So they're two different components, but they actually have the same backend, which is called Trillian. It's a Merkle tree data structure, so it's used for the same purpose, basically.
Yes? Yes. So in this case... Oh, okay, sure. I will repeat the question. The question is whether it's the same certificate authority that signs the certificates for Fulcio, the same backend, or whether there is a federation of CAs. Is that correct? Yeah, between different ones, since it's a Sigstore service. Okay. So in this case, I talked about the Sigstore public good instance. This is one instance of Sigstore that is maintained by the community. I think you also have a staging instance, but it's more for testing purposes. In general, you can use this public instance, but you can also install Sigstore and bootstrap it on your own, so you can have a different CA backend in this case. You can choose from other backends if you have your own infrastructure.
Yes? Yes. Yes? So Cosign is used to sign containers. That's the first use case of Cosign, and sigstore-python doesn't support that. The goal of implementing different clients for different ecosystems is to support native integrations into projects, so you can expose an API in your library and integrate Sigstore directly into your projects.
So the question was whether there is a risk that different clients will implement Sigstore, the Sigstore protocol, differently. I think the community has been pretty good at thinking about this kind of issue. There is a working group for clients, so people coordinate so that the protocol stays consistent across implementations, and you have some kind of community agreement on what's supposed to happen here.
Yes? Sorry, can you repeat? So the public instance runs its own. It's maintained by the community, but you can also install it on your own if you want to run a private instance of Sigstore. Sorry, I didn't understand the last part. Okay, so how am I supposed to know the correct identity for each package signer? That's a very good question. In fact, I think you need to be aware of it. I mean, the signer sends a certificate, but obviously you need to know what you're verifying. You can't just check against any email address, so you need to know in advance which email address or identity you're looking for in this case.
Yes? Which information? Okay, so the question was whether the information about signers is provided on PyPI. At the moment, it's not implemented yet, so I don't know exactly how it will be in the future, but I guess we will see when PEP 480 and PEP 458 are actually implemented. I don't know yet what the API is going to look like.
Any other questions? Yes? Okay, so the question was whether it supports having a single artifact signed by multiple people. Yes, of course. You can generate as many signature or certificate files as you want, and you can just store them in the same place.
Yes? Oh, that's a very good question. There's no line in this diagram between Fulcio and the identity provider, but in fact, you're right. Fulcio recognizes only a set of identity providers, so if you have your own instance, you can choose which identity providers to trust.
In the case of the public instance, it's a community agreement as well, but you're right that Fulcio has a configuration that specifies which identity providers it trusts. So yes, that's true.
Other questions? So the question was whether sigstore-python is going to support container signing. At the moment, I don't think so. I haven't seen any community initiative in this direction, so I'm not sure at all. I don't think that's the plan as of now.
Yes? So the question was that we should be able to sign containers with sigstore-python, since a container is also a file. The thing is that you have the signature part, and then you have the storage part in an OCI registry, which is a bit more complex and requires a different kind of implementation. You need to know how to store the signature relative to where the image is stored in an OCI registry. So it's not something that sigstore-python supports right now.
It's what, sorry? Yes, exactly. So the question was whether the client uploads the signature to PyPI. Not yet, because it's not yet supported, but they're planning on it.
Yes? Okay. So I mentioned that I can revoke the signature. I don't think that's exactly the formulation I used, but Sigstore makes auditing easier in general. Thanks to the transparency log, if you know when an identity was compromised, for instance, you can know exactly which artifacts to revoke. You can't really revoke artifacts, but you can know which artifacts not to trust and which ones to pull from your environment.
Yes? The question is whether Rekor would be able to record the fact that artifacts were compromised. I think this is not the case, but I need to check the structure of a Rekor entry again. I don't think Rekor has this capability as of now. Let me get back to you on this; I have to verify. Okay, so we're out of time. Thank you for attending.