Hello, everybody. I know I've talked to a lot of you today already, but I get to say it from the stage now. So I'm going to spend about 25 minutes today talking about Sigstore in the context of Python packaging, and specifically some of the next steps that people who are in both the Sigstore and Python packaging communities are planning for broader adoption in Python. So without further ado: my name is William. I am a senior security engineer at a company called Trail of Bits. Some of you may know us; for those of you who don't, we're a mid-sized cybersecurity company. In my mind we're still a small one, but we're actually kind of big now. We do a whole bunch of different things, but the things that I'm involved with are primarily open-source work in the package management, compilers, and applied cryptography space. Generally, for things that companies don't have specialized in-house talent for, they bring us in to help build out those tools and contribute back to the open-source community. And we do all kinds of things in the program analysis research space, too. And of course, this is my cue to say we're also hiring, so please join us if you're interested. We're pretty easy to find online. So, to set an agenda for today: because this is a Sigstore crowd and not primarily a Python crowd, I'm going to do a quick overview of the history of Python packaging, what it looks like, and why it is the way it is. Alternatively, it's subtitled "my package index is almost old enough to drink." We'll then go through some of the pre-existing security practices in the Python packaging ecosystem and see how Sigstore will replace nearly all of them — probably not all of them. We'll see where Sigstore is currently being used, how you can use it in its current context, and then where we intend for it to be used.
So, the next steps, as the title says: things like the integration between Sigstore and TUF, within PyPI, within pip, within your CI, and for more complex verification policies. But before I do any of that, I wanted to establish a bit of tone. I think I can speak for everybody in this room when I say that we all care about both software and supply chain security, and that ultimately we don't get to do those things unless we care a lot about adoption. And the hard truth is that in order to drive adoption, you have to convince engineers. And engineers — most of them at least, the ones who aren't specifically tasked with it — do not care about those things unless they directly relate to their job duties. And it's actually not their job to. Their job is to achieve the things that they're told to do. So it is therefore our job to make security as easy and as frictionless as possible, so that it gets out of their way, so they can do their job and everything keeps moving. If we don't do this, they will not use what we give them. They will also resent us for it, and they will misuse whatever we give them. We are all pernicious little creatures to them if we get in their way. And this applies doubly for open source, because a lot of us are doing open source and are paid for it, but for every one of us there are people who aren't paid — who are doing it for free because they believe in it. So we need to stay out of their way and make their lives better, not worse. So, as part of that, I want to do a brief history. Actually, before I do this: who in this room has programmed in Python? Okay, pretty much everybody. How many of you use pip? Pretty much everybody who programs in Python, cool. Okay, so you probably know a lot of this already, but I'll go through it pretty fast so that we're all on the same page. The reason I bring this up is because Python is one of the biggest programming communities in the world.
It's one of the oldest for a modern programming language. And as a result, there's a lot of embedded history and packaging state that fundamentally cannot be changed. If you change these things, you anger engineers, and like we just said, if you anger the engineers, they don't do anything. So way back in 2000, this thing called distutils was added to Python 1.6.1. That's a version of Python so old that it predates the version that took us forever to deprecate. And we're still using distutils, as far as I can tell — most of you probably have some piece of code in your codebase that, instead of using setuptools, is using distutils. So it's still there, this thing that we added a drinking age ago. The year after that, we standardized the very first version of what's called the Python packaging metadata. This is still used. We have newer versions, but Python's tools will happily consume metadata 1.0. Shortly after that, the first central packaging index was established. The word "index" there is really key, because for a very long time PyPI was not a file store — it was an index. What that means is that there were no distributions on PyPI. There was a link to a URL that pointed to another server, and you would navigate as a human being to that server, and maybe there would be a file there that you could download. So this should tell you something about how Python's packages were originally organized and distributed: there was no automated way to do any of this. You as the application developer would go around the internet, picking cherries from various websites, hoping they were online at the right time, and then install them manually. And then after that, the thing that we now call PyPI was established. It was still just this bare index: no file hosting, just references to other servers. With that came metadata 1.1, which enhanced the metadata just a little bit, adding things like dependencies.
The year after that: easy_install. Some of you have possibly used this; it's the predecessor to pip. easy_install also comes with eggs, which are the predecessor to wheels. Both of these are more or less long gone — although I think you can still use easy_install with the latest versions of Python, which, again, should tell you something. You shouldn't be using it, but you can, and effort is put into supporting it. Moving on a few years, we get metadata 1.2. And then, critically, in 2008 we get pip, which, by the show of hands just earlier, the majority of you have used. And again, it should say something that the tool you are all using today was created 14 years ago. So that was the prehistory of Python packaging. I would say the modern era roughly begins around 2012. In 2012 we get PEP 427, which standardizes what we call wheels, the modern package distribution format for Python packages. They augment — they don't fully replace — source distributions, but they allow things like binary distributions. I'll talk a little bit more about what this enables, but basically it enables a lot of the things that we can now do with Sigstore and with other tools. A few years later, we get the current PyPI repository API — the thing that pip actually uses when you do pip install. Before that, it was completely unstandardized; pip would just sort of hope and pray that the URL you hit resembled a PyPI index. So only seven years ago was the thing that we all assumed just worked actually standardized. Two years after that, we get metadata 2.1, which I believe is the current metadata standard, unless they just released the new one I believe they were working on. In 2018, the modern PyPI backend was deployed. Before this, it was a codebase that had grown up organically from the 2003 codebase. This is the one currently in use, and I'm a big fan of it.
And finally, more recent history: PEP 458. A lot of people in this room are very familiar with it — a lot of you worked on it. This is the signed repository metadata PEP, specifically tailored around TUF. It was accepted, and there have been efforts to implement it; they're still ongoing. The work that went into it directly inspired a lot of this Sigstore work and is going to be used in conjunction with it. Then, moving on: as of 2021, nearly 99% of the top packages are distributed as wheels, but six of them still aren't. So, 15 — no, 10 years after wheels, there's still a minority, but a very large number of downloads, not using a modern distribution format. And pip has a new dependency resolver. And then, finally, we're here. So this has been going on for a long time, and all of it has been moving at a glacial but accelerating pace, in order to maintain compatibility with this massive body of code that cannot be moved forward without a great deal of pain. Alongside all of that, there's the question of package security. Historically — the Python package index was one of the earliest ever made — security was a non-thought; it wasn't something anybody was even thinking about. So these three properties of authority, integrity, and authenticity were just not concerns originally. Just to rehash for everybody here: authority means users or identities can only publish packages that they control. Integrity means you get the thing that you asked for. And authenticity means it is what it says it is — not just unmodified, but actually created by the entity that had the right to create it. None of these was originally a priority, and we'll see how Python provided them originally. This first one is actually the best one.
Authority was pretty much baked into PyPI from the beginning, because you had user accounts that could upload — earlier, just metadata, but then actual distributions — for packages. So you could have owners, you could have maintainers, and so on. Historically, the only way to authenticate to PyPI was a username and password, which led to a really common practice once CI became widespread: people just dumping their passwords into GitHub, GitLab, and other CIs — usually not getting leaked, but sometimes getting leaked. So that was pretty bad. In 2019, we deployed two-factor auth on PyPI to discourage that, and I think since then the uptake has been pretty good. So I would say the status quo here is pretty good for PyPI. People are still using usernames and passwords, that's normal, but we're roughly on par with or better than most major packaging ecosystems. We're using modern two-factor — TOTP and WebAuthn. Nothing crazy. On the integrity side, like I said, PyPI was historically an HTTP index: it referenced files that were stored on arbitrary, often plain-HTTP hosts. That means, of course, no transport integrity. TLS was not a thing — well, it was a thing, but it was not a common thing in 2001. These files were just trusted blindly over the network. At some point — I don't actually know when — HTTPS was added to PyPI, providing transport integrity, but that still doesn't cover the external hosts: you have transport integrity to PyPI, but then you go out to some third-party host and download the file from there, so your individual hosts still aren't necessarily providing it. Putting everything on PyPI fixed that. And then, finally, in 2016 we added hash-checking mode to pip, so you could actually pre-declare the hashes for your files ahead of time.
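To make those hash mechanics concrete, here's a loose Python sketch — not pip's or wheel's actual code — of the two formats involved: the `--hash=sha256:<hex>`-style pins that hash-checking mode compares against downloaded files, and the unpadded urlsafe-base64 sha256 digests that wheel RECORD files carry per file. The filenames and data below are made up.

```python
import base64
import hashlib

def check_pin(data: bytes, pin: str) -> bool:
    # Hash-checking mode, conceptually: a requirements file pins
    # "package==1.0 --hash=sha256:<hex>", and the installer refuses
    # any file whose digest doesn't match the pin.
    algo, _, digest = pin.partition(":")
    return hashlib.new(algo, data).hexdigest() == digest

def record_entry(path: str, data: bytes) -> str:
    # Wheel RECORD lines look like "path,sha256=<urlsafe-b64, unpadded>,size".
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
    return f"{path},sha256={digest.rstrip(b'=').decode()},{len(data)}"

data = b"hello wheel"
pin = "sha256:" + hashlib.sha256(data).hexdigest()
print(check_pin(data, pin))                    # True
print(record_entry("demo/__init__.py", data))
```

The point of both formats is the same: the digest is fixed before installation, so a swapped or tampered file fails loudly instead of installing silently.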
pip would fetch the files and check the hashes, and if they didn't match, or if not all of your files had declared hashes, it would fail. So that provides strong integrity. It's not great, because now you have this terrible UX of hashes everywhere, but it's better than nothing. And then wheel distributions improve this a little bit as well, by adding per-file hashes in the RECORD metadata file. So the status quo here is close to other package managers, but it's kind of hard to use. It's not as easy as package managers in, say, the Node.js or Rust ecosystems, where this is all done by default for you in the packaging metadata file. You have to do it yourself, which is kind of annoying. And here's an example of what that sort of hash looks like inside of the RECORD file. You would never interact with this directly, but it's just there. And then we get to authenticity, which is the key thing in the Sigstore ecosystem. I think PyPI's historical approach to authenticity could be characterized as "maybe — you can do it if you want." So, optionally, for years — I don't even know when this began — PyPI has supported PGP signatures on distributions. When you uploaded a distribution, you could also upload a distribution-name.asc signature file. PyPI would accept it, render it alongside, and then someone could optionally download and verify it. But why do you trust that signature? That's not a rhetorical question: why do you trust that signature? Anybody can sign for a package, and anybody who can upload can just put their own signature there. You have to have some kind of web of trust or other trusted setup, and the web of trust doesn't really work anymore since the whole keyserver spamming thing in 2019. So that really wasn't ideal, and it turns out that pretty much nobody used it. You can still upload PGP signatures to PyPI, but they're not in the UI anymore, because they're functionally useless.
Stranger still — and this is a weird vestige of Python history that I don't think anybody else knows about — we also have optional support for JOSE-style and PKCS #7-style signatures. As far as I can tell, only one package ever uploaded to PyPI uses these. People don't even know this exists; I found it out while doing research for these slides. It has the same problem as PGP: why would you trust this? And on the pip side, I think someone created an issue on the pip issue tracker, and the maintainers were like, "How did you even find this? Closed, we won't ever support this." So verification of these is purely optional and out of band. You could use them at your own risk, I guess. So the status quo here is that authenticity in the Python packaging ecosystem is vestigial at best and impractical in practice. And that brings us to Sigstore — that's why we're all here, right? So what does Sigstore do for Python packaging? How does it improve all three of these properties? The way I think about it, at least, is that the biggest thing Sigstore does for Python packaging is that it fundamentally solves all of the UX problems that come with code signing, while simultaneously preserving the best properties of code signing. The problem, like we said, with PGP and these PKCS #7 or JOSE signatures is key management. You have to hold on to this long-term, very sensitive key, which, if you lose it or disclose it by accident, you have to rotate, and then you have to have some kind of rotation story. It gets very complicated very fast. You as a packager no longer have to do that with Sigstore. And you as a verifier, as a consumer, no longer have to perform keyring maintenance. You no longer have to prune invalid or expired keys, or keys created with poor cryptosystems — with PGP you can just create an RSA-256 key if you want to — you don't have to prune those out now. Sigstore will choose the right ciphers for you.
Sigstore, on the same topic, is inagile. Thirty years ago we thought cryptographic agility was a really good thing; it turns out we were really wrong about that. Inagility is the name of the game now: you really want the cryptosystem or the scheme to pick the best ciphers for you, and Sigstore does that. Sigstore does not let you pick RSA-256. It does not let you pick CAST5. It does not let you pick Blowfish. And then, finally, Sigstore signatures are rooted in a public identity. Increasingly on the internet, we see people think in terms of identities rather than in terms of, say, a PGP key. People think in terms of their GitHub handle, or the email metadata that's attached to the package. And bringing the cryptographic promises closer to that model of what a package's owner is has tangible UX benefits: it connects the theory of package integrity with how people actually think about their packages. So that's good, in my mind. And then, as a nice side effect — like I said, authority and integrity themselves aren't that bad right now on PyPI — Sigstore will transitively strengthen these properties. Once you get authenticity, you also get integrity transitively: you have strong integrity via cryptographic signatures. And you get authority by — sorry, this is a separate kind of authority that we're getting: we're planning on making PyPI its own OpenID Connect identity provider. So eventually you'll have an identity like a username, a separator — I think we settled on the exclamation point as the separator for non-email identities, I can't remember — and a PyPI identifier, and you'll be able to use that as a signing identity for Sigstore artifacts. And, like I said, integrity you get transitively through strong authenticity. So that brings us to where we are currently.
Currently — you've heard people mention it throughout the other talks — we have this reference Python implementation, sigstore-python. One of the cool things — I actually forgot to update this slide, but as of yesterday, it's used to sign the latest 3.11 release of CPython. That's not related to Python packaging at all; it's completely its own thing, but I think it's really cool that they're using it to sign CPython itself. And we have this straightforward CLI, and it really is that simple: you just pip install it, and then sign and verify are the two subcommands. This does not yet solve the issues on that six-point list; we need more UX work there, and we need to think about how best to expose it. But for the time being, this is what we have, and it is simple to get started with. We also have ambient credential support. This is something we stole from cosign: we have support for things like GitHub Actions, Google Cloud Build, et cetera. We will detect if you're in that environment, and we will pull the OpenID Connect credential from that environment, so that you don't even have to think about signing — it'll just do it automatically. I think that's pretty cool. And we actually use that in the sigstore-python GitHub Action: we automatically sign using GitHub's OpenID Connect, and so Sigstore dogfoods itself internally on our CI every day. And finally, via this action, we can automatically publish signing artifacts to GitHub releases, and here are some screenshots showing that. This is the latest release; you can see each distribution has a sig and a cert. So that brings us to where we want to be. Right now, we can sign and verify things with sigstore-python. We want those signing artifacts to actually be published to the index that Python package users use, which is of course PyPI. They have to be published alongside distributions.
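To make the ambient credential support described above concrete: the idea is that the signer sniffs its environment for a CI-provided OIDC credential before ever involving a human. The sketch below is a simplified illustration, not sigstore-python's actual detection code. The `ACTIONS_ID_TOKEN_REQUEST_*` variables are the real ones GitHub Actions exposes to OIDC-enabled workflows; the Google Cloud Build marker is purely hypothetical.

```python
from typing import Optional

def detect_ambient_credential(env: dict) -> Optional[str]:
    # Sketch of ambient-credential detection: each known CI platform
    # advertises its OIDC token (or a URL for fetching one) via
    # well-known environment variables; the first match wins.
    if env.get("ACTIONS_ID_TOKEN_REQUEST_URL") and env.get("ACTIONS_ID_TOKEN_REQUEST_TOKEN"):
        return "github-actions"  # the real token is then fetched from the request URL
    if env.get("GCB_OIDC_TOKEN"):  # hypothetical marker, for illustration only
        return "google-cloud-build"
    return None  # no CI detected: fall back to an interactive browser flow

print(detect_ambient_credential({}))  # None
print(detect_ambient_credential({
    "ACTIONS_ID_TOKEN_REQUEST_URL": "https://example.test/token",
    "ACTIONS_ID_TOKEN_REQUEST_TOKEN": "abc123",
}))  # github-actions
```

The design point is that the user never configures anything: if the well-known variables are present, signing just happens; if not, the tool falls back to asking a human.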
And then eventually, once these signing artifacts are on PyPI, they need to be cross-checked via inclusion in a TUF repository, per PEP 458. To support this, PyPI has to undergo some changes. Right now, PyPI will reject any file that does not end with one of a few supported suffixes for distributions and for PGP signatures. So we need to add support for .sig and .crt — or alternatively, once Sigstore bundles are fully standardized, whatever suffix is chosen (I believe .sigstore was the one) — as supported suffixes that PyPI will accept. And then, with any luck, PEP 694 — the upload API 2.0, I think is what they're calling it — will enable this by letting us attach all of this to the metadata blob on upload instead of having it be a separate file. That'll simplify things nicely. And then, once it's on PyPI, it actually becomes really easy to deliver this to users: pip can simply download it along with the distribution, if it's available. However, this again will require a little more work, because the simple index is frozen. PEP 503 has to be superseded by PEP 691, which is a new standard for the index API, and once pip begins to consume that new index, we'll be able to deliver these files to users. And that brings me to the next thing, which is that server-side isn't enough. It's one thing to have these signing artifacts on PyPI and have PyPI do a little bit of verification with them; we also want end users — people who are doing pip install X or pip install -r requirements.txt — to be able to verify these materials themselves. They shouldn't be dependent on us to do it. There are a couple of challenges there. One is that pip is the universal Python package installer, which means that it runs everywhere, which in turn means that it can't have any native dependencies — which sigstore-python is full of, because it uses OpenSSL and some Rust builds under the hood.
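Going back to the server-side suffix change described above, here's a toy model of the shape of that gate. The suffix sets are illustrative only — they're not PyPI's real allowlist or its real upload code.

```python
# Illustrative only: roughly the shape of PyPI's suffix gate on upload.
DISTRIBUTION_SUFFIXES = (".whl", ".tar.gz", ".zip")
SIGNATURE_SUFFIXES = (".asc",)                      # legacy PGP signatures
PROPOSED_SUFFIXES = (".sig", ".crt", ".sigstore")   # Sigstore materials

def upload_allowed(filename: str, accept_sigstore: bool = False) -> bool:
    # Reject anything that doesn't end in a known suffix; the proposed
    # change just widens the allowlist to admit Sigstore materials.
    allowed = DISTRIBUTION_SUFFIXES + SIGNATURE_SUFFIXES
    if accept_sigstore:
        allowed += PROPOSED_SUFFIXES
    return filename.endswith(allowed)

print(upload_allowed("pkg-1.0-py3-none-any.whl"))           # True
print(upload_allowed("pkg-1.0-py3-none-any.whl.sigstore"))  # False
print(upload_allowed("pkg-1.0-py3-none-any.whl.sigstore", accept_sigstore=True))  # True
```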
So we have to think a little bit about how to ship that without native dependencies; that's on the roadmap. But the key thing here — and this goes back to what I was saying earlier about staying out of engineers' way — is that users should not have to be aware that anything has changed whatsoever about their process. A personal goal of mine is that pip install requests should not look any different unless something fails — unless an integrity or authenticity violation has occurred. If it does change, users will reject it, based on the history of Python packaging and its previous failures to adopt signing materials. And then, finally, the bigger picture is that Sigstore needs to fit into the larger signing and high-assurance constellation for all packaging ecosystems. We really want our efforts here to be a sort of litmus test for things like npm, for cargo — sorry, for crates — for Maven, you name it. We want them to be able to learn from our mistakes and from the things we succeed at, and do even better. So, what do we need to get there? There are two big classes of things that I identify. One is general UX thought that needs to go into Sigstore. It's one thing to verify signatures, and it's entirely another thing to trust identities. Many of you have spoken about this during your talks: Sigstore is conceptually much, much better than blind verification, but its use needs to be practically better too. It needs to be much more than just "sigstore verify" on these materials. You need some way of communicating to users, concisely, that they should trust this identity, that they should trust this OIDC issuer, that they should trust these particular OIDs on this cert. If you don't do that, they're going to fall back either to trusting everything, which is bad, or to doing nothing and going back to the status quo.
There are some thoughts I have about how we can achieve this. Specifically, PyPI and the packages themselves have tons of metadata in them that we could use for cross-checking, which we could potentially use to avoid bugging users for verification information — we can just glean it from the information they already give us. For example: the package maintainer email that's on PyPI — can we cross-check that against the email identity in Sigstore? The repository URL that's in the package metadata — can we cross-check that against packages published via GitHub, using OIDC or API token exchange? And then, longer term, we want users to be able to configure machine-readable verification policies, so they can say, "I trust these three maintainers, at least two of whom have to sign for the package." That kind of complex policy needs to be consumable, and committable somewhere public, so that you can verify the policy does not change except when you expect it to change. Our thought there is that this will fit in nicely with TUF. And then the other big fork is threat modeling. We all know Sigstore is conceptually pretty complex. The core of it is simple, but as you begin to think about what verification means — what signing actually means — you quickly splinter out into different edge cases: you trust the email identity, but you don't trust the IdP; you trust the IdP, but maybe it goes down sometimes; you probably trust the CA, because we're all trusting Fulcio — it's part of what we're building. But a user shouldn't have to know any of that. A user should just be able to say, "I trust it because of X, Y, and Z; these are the properties I want to get out of it, and if one of them is violated, I know what happens." They shouldn't have to understand the RFCs that Sigstore is built on top of.
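The machine-readable policy idea from a moment ago — "I trust these three maintainers, at least two of whom have to sign" — reduces to a small threshold check. Everything below (the names, the structure) is a hypothetical illustration, not a real Sigstore or TUF API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VerificationPolicy:
    # Hypothetical machine-readable policy: which signing identities are
    # trusted, and how many of them must have signed a given release.
    trusted_identities: frozenset
    threshold: int

def policy_satisfied(policy: VerificationPolicy, signers: set) -> bool:
    # Count only signatures from identities the policy actually trusts;
    # signatures from unknown identities contribute nothing.
    return len(policy.trusted_identities & signers) >= policy.threshold

policy = VerificationPolicy(
    trusted_identities=frozenset({"alice@example.com", "bob@example.com", "carol@example.com"}),
    threshold=2,
)
print(policy_satisfied(policy, {"alice@example.com", "carol@example.com"}))    # True
print(policy_satisfied(policy, {"alice@example.com", "mallory@example.com"}))  # False
```

Committing a policy like this somewhere public — and guarding it with TUF metadata — is what lets consumers detect when the policy itself changes unexpectedly.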
Users shouldn't have to understand X.509 v3 or the actual Certificate Transparency scheme to understand what they're getting out of it. And of course, they also shouldn't have to understand PKI, or the ephemeral keys that we use for keyless signing, or any of that, to benefit from Sigstore. So thank you, that's all I have. And please ask me questions.