Hello, SupplyChainSecurityCon. I'm Dustin Ingram, and this is PyPI supply chain security. I'm a developer advocate at Google, where I focus on Python, developer tools and experience, and open source security. I'm also the maintainer of the Python Package Index, and I'm a director at the Python Software Foundation.
The PSF is a 501(c)(3) non-profit founded in 2001, with about $3.1 million in revenue per year. It's run by a mix of staff, a board of directors, and volunteers. We have nine paid staff, including our accountants and our event staff. We have an 11-member board of directors and a five-member steering council. And in addition to these groups, we have hundreds and hundreds of volunteers that help us work on our various Python projects.
Some of the key projects of the PSF include CPython, which is the Python you're probably most familiar with. This was first created in 1991. We also run PyCon US, which is the largest global Python conference. This started in 2003 and generates about 65% of the PSF's income. And we also run the Python Package Index, which was started in about 2002. This generates 0% of the PSF's income but about 95% of the PSF's liabilities.
The Python Package Index serves about 900 terabytes a day and more than 2 billion requests per day. It's the canonical repository for Python software and it's extremely popular. Our estimated cost would be about $2.3 million a month to run the index, and I go into a lot more detail about this in the blog post linked here. Luckily, a lot of these costs are borne by our in-kind sponsors, including Fastly, AWS, and Google Cloud. So we don't actually pay that much a month, but the service takes quite a bit to run.
I want to highlight some of the recent projects that we've done with PyPI and the packaging space in Python in general, and how they relate to supply chain security. I'll also talk a little bit about some in-progress projects later, and about some of the stuff we have planned as well.
So one thing that we did in 2018 was a full-stack rewrite of PyPI. PyPI had been around and had basically been the same service since the 2000s, and we needed to modernize the web application that sits behind it. So with $170,000 of funding from the Mozilla Open Source Support grant, we hired a team of contractors and did a full-stack rewrite of PyPI from the ground up. This was successful, and it allowed us to add a lot more features to PyPI, built on this new platform, which I'll talk about in a second.
The first thing we did afterwards was increase accessibility and access. We added internationalization and localization. PyPI is now translated into more than 10 different languages. We also introduced API tokens for the first time. Before that, authentication was just basic HTTP, username and password. And we also added two-factor authentication, both TOTP and WebAuthn. So now you can use security keys to authenticate with PyPI, which is great. That was with funding from the Open Technology Fund.
We got about $100,000 of funding from Facebook, and one of the things that we did with that in the past was add some prototype malware detection. PyPI is an open index, which means that anyone can publish anything to it, and we've actually spent a lot of time making PyPI easy to use and easy to publish to. And while that's been great for our community, it's also meant that we've had a lot of spam and malware, and folks just trying to publish generally not great stuff to PyPI. So Facebook helped us build this prototype malware detection system, and it's something that we'll build on in the future.
Another thing that we've done recently, with help from Mozilla and the Chan Zuckerberg Initiative, is introduce a new and improved dependency resolver for pip.
So pip is the canonical Python package installer. It's the command-line tool that reaches out to PyPI and determines which artifacts should be installed for a given request to install some software. We improved the dependency resolver. We basically gave it a proper dependency resolver, which allows it to actually do the right thing when you ask to install some packages. And we also did some UI and UX studies and testing around how people use pip and how people use the resolver.
We have a couple of in-progress projects as well that I wanted to highlight. One of these is that, with support from Bloomberg, we were just able to hire a packaging project manager for the next two years at the very least. This is a really critical role for us, because a lot of Python packaging and the projects that I just mentioned were run either by volunteers or by hired contractors. The PSF staff doesn't spend a lot of time working on PyPI and the various features around Python packaging in general. These are mostly community projects. So we now have a packaging project manager and sort of community manager who's going to oversee a lot of the future projects that I'll talk about, and help drive direction for Python packaging and the security projects we have coming up.
Another thing that's in progress, which isn't directly related to supply chain security but generally improves the posture of Python as a language in the ecosystem, is that we were able to introduce a developer-in-residence role for CPython. So with support from Google, the PSF has hired a full-time staff member to be a developer-in-residence and work on whatever that developer finds to be most impactful for Python as a language. Łukasz Langa is our first developer-in-residence.
He's about halfway through his term as developer-in-residence. He's made a lot of updates, and you can follow the work that he's done: triaging PRs, making new PRs, digging into old issues, that kind of thing. A lot of important stuff for just making Python more maintainable in general.
Another thing we've done recently, with support from Google, is integrate the Open Source Vulnerabilities (OSV) project into PyPI. So PyPI is now aware of vulnerabilities that exist in packages on PyPI, which is something that didn't exist before. Part of this was also creating a community-maintained advisory database of known vulnerabilities in Python packages, not in Python itself. So this work is now integrated into PyPI and will be used downstream.
Another project in progress right now is introducing The Update Framework (TUF) into PyPI. This will allow us to secure PyPI downloads with signed repository metadata. This is based on a PEP, a Python Enhancement Proposal, that was written a few years ago by folks at NYU and elsewhere. It will introduce TUF into PyPI and allow PyPI to sign the package artifacts, and metadata about those artifacts, so that people can verify them on download.
The last thing is more Open Source Vulnerabilities work, with support from Google. This advisory database is also being used for an in-progress auditing tool. So we'll now have a pip-audit tool that is able to inspect the local file system or environment where you're using some Python packages and tell you if any of the packages there are affected by a known vulnerability.
The really interesting thing here, I think, is some future potential projects that we have coming down the pipeline. Like I said, malware is a problem on PyPI, and spam is a problem on PyPI. We had some recent press about this, and a project that we're hoping to fund in the future is improved malware detection.
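As a rough illustration of the auditing idea described above (comparing the packages in an environment against an advisory database), here is a minimal sketch. It is not how pip-audit itself is implemented: the advisory record, the package names, and the helper functions are all hypothetical, and the version parsing only handles simple dotted versions rather than the full Python version specification.

```python
# Toy sketch of an environment audit against advisory records.
# Everything here (package names, advisory ID, helpers) is made up for
# illustration; real tools use proper version parsing and real advisory data.

def parse_version(text):
    """Turn a simple dotted version like '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

# Hypothetical advisories: package name -> list of (id, introduced, fixed).
ADVISORIES = {
    "examplepkg": [("EXAMPLE-2021-0001", "1.0.0", "1.4.2")],
}

def audit(installed, advisories=ADVISORIES):
    """Return (name, version, advisory_id) for every affected package."""
    findings = []
    for name, version in sorted(installed.items()):
        for advisory_id, introduced, fixed in advisories.get(name, []):
            current = parse_version(version)
            # Affected if introduced <= installed version < fixed version.
            if parse_version(introduced) <= current < parse_version(fixed):
                findings.append((name, version, advisory_id))
    return findings

# A hypothetical environment: one affected package, one unaffected.
environment = {"examplepkg": "1.3.0", "otherpkg": "2.0.0"}
for name, version, advisory_id in audit(environment):
    print(f"{name} {version} is affected by {advisory_id}")
```

The point of the sketch is only that an audit is a join between "what is installed" and "which version ranges are known to be vulnerable"; the value of a shared advisory database is that every downstream tool can perform that join against the same data.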
Basically, we do zero introspection of a package when it's published. Every malware report that we get is just someone going and manually looking at it, or some third-party researcher finding problems. We probably don't want to do detection in-band, but we can do out-of-band detection and run analysis on packages on PyPI, possibly with tools from OpenSSF, including the package analysis tool.
Another thing that landed PyPI in the press recently was these dependency confusion attacks. Basically, this is companies using private indices and people squatting the same package names on public indices, where these companies' installers were configured in a way that didn't prefer the private index over the public one. So one way PyPI can improve here is namespaces. Right now, all packages on PyPI exist in one wide global namespace. It'd be like if GitHub didn't have organizations. Some packages do use an implicit namespace prefix, so some packages will be prefixed with things like google-cloud, for example, but this isn't enforced in any way. It's basically up to the publisher to do. Introducing namespaces would remove or reduce the potential for these kinds of dependency confusion or typosquatting attacks, because it would allow PyPI to block a private namespace. A company could say, "This is the namespace for our private packages, just prevent anyone from publishing to it on PyPI."
Another thing, which is a pet project of mine, comes from a quote I found on Hacker News the other day: "Everything on PyPI is just random artifacts uploaded by whoever ran a build script on the machine. It's nuts." And yeah, that's absolutely true. The way that distributions are published to PyPI is a really mixed bag. Some folks do this in GitHub Actions or Travis CI or Google Cloud Build, those kinds of things.
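To make the dependency confusion scenario described above concrete, here is a small toy model. It does not reproduce how pip or any real installer selects packages; the index contents, the package name, and both resolver functions are hypothetical. It only contrasts "pick the highest version across all indices" with "prefer the private index when it has the name".

```python
# Toy model of dependency confusion. All names and data are hypothetical.

def parse_version(text):
    """Turn a simple dotted version like '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

def candidates(name, indexes):
    """Collect (parsed_version, version, index_name) for every index with name."""
    return [
        (parse_version(version), version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]

def resolve_naive(name, indexes):
    """Treat all indices as equal and take the highest version anywhere."""
    found = candidates(name, indexes)
    if not found:
        return None
    _, version, index_name = max(found)
    return version, index_name

def resolve_private_first(name, indexes, private="private"):
    """If the private index has the name at all, never look at other indices."""
    if name in indexes.get(private, {}):
        return resolve_naive(name, {private: indexes[private]})
    return resolve_naive(name, indexes)

INDEXES = {
    "private": {"acme-internal": ["1.2.0"]},
    # An attacker squats the same name publicly with a very high version.
    "public": {"acme-internal": ["99.0.0"]},
}

print(resolve_naive("acme-internal", INDEXES))          # attacker's upload wins
print(resolve_private_first("acme-internal", INDEXES))  # private package wins
```

Reserved namespaces attack the same problem from the other side: if the public index refuses uploads under a company's prefix, the squatted public entry in this model could never exist in the first place.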
Sometimes they're just running these scripts on their own machines, building the artifacts there and uploading them, and there's not a good way to derive the provenance of a built artifact from a source repository or something like that. We don't have an easy way to directly link source to a built distribution.
So one project that we're considering for the future is introducing a canonical build and publish service. Not just tools to do this, but an actual hosted service that takes away the need for the user to do this stuff manually. Basically, give us a source file and we'll build the distributions and put them on PyPI for you, sign them, and do all the best practices there. This also helps us get around one of the problems of introducing new features like TUF, which is actually getting maintainers to use them. Usually, support for these things lands and then it takes a long time for them to be adopted. If we own a canonical publication pipeline, then we could introduce these tools and everyone could benefit from them immediately.
And lots more. We have lots of other ideas for projects. We actually maintain a list of fundables here in this repo.
So that's all. I wanted to thank you for listening. If you want to sponsor our work, you can go to pypi.org/sponsor, or you can email me directly at di@python.org, and you can follow me on Twitter, @di_codes. Feel free to reach out. Thanks, and take care.