Hi, everyone. Thank you for attending this talk about securing the Python project supply chain. A few words about ourselves: my name is Maya, I'm a software engineer at Red Hat in the Emerging Technologies security team. And hi, my name is Fridolin. I used to work at Red Hat, I used to work at Datadog, and I'm an entrepreneur now. You can find more information about us on Twitter, Mastodon, or GitHub. OK, so let us start this talk with a simple question: why does protecting your supply chain actually matter? If you follow open source security, or maybe Python news, you might have seen that not too long ago, PyPI maintainers decided to temporarily suspend new user registration on the index and also the upload of new projects, because they got overwhelmed with the volume of malicious packages being uploaded to PyPI. They couldn't handle it; everything was very overwhelming for them, so they had to suspend those uploads. And a bit less recently, in January, you might have seen that PyTorch, the famous machine learning library, was compromised: the nightly builds of PyTorch were the victim of a dependency confusion attack. We'll see what a dependency confusion attack is later in this talk, but this too is a supply chain attack. And it's not a coincidence that you've seen all those articles recently, because a study showed that year over year, for the past three years, supply chain attacks have increased by more than 700%, which is a very high growth rate. Supply chain attacks can cause a lot of damage to an organization or a project, both financial and reputational, and if you have a weak supply chain, that gets you into a lot of legal and compliance issues as well. So some new regulations were put in place to try to secure software supply chains, and one of the most famous ones was issued in 2021 by President Biden: Executive Order 14028.
It was called Improving the Nation's Cybersecurity, and it was issued after the pretty infamous SolarWinds attack, which affected a lot of big organizations and some branches of the United States federal government. This executive order basically tells corporations that collaborate with the US government that they should be stricter about the supply chain standards they adopt. For example, it pushes organizations to adopt secure authentication to servers, strict protocols, and these kinds of things if they want to sell software to the government. And this includes things like software bills of materials, which we'll cover later; they are basically a list of the ingredients that compose your software. OK, so now let's take a look at supply chain threats and vulnerabilities. Maya already mentioned the infamous SolarWinds attack. It was an attack on the SolarWinds Orion platform, a network performance monitoring platform that is quite widely used. What was done here is that attackers uploaded a malicious DLL file that was subsequently pulled by the build system, which produced software artifacts that were consumed by customers. These software artifacts were properly signed, so customers were not aware that there was malicious behavior in them. The effect of this attack was quite large: more than 18,000 customers were affected, including more than 400 of the US Fortune 500 companies. The White House, the Pentagon, the State Department, and the National Security Agency were all affected by this attack. What happened? SolarWinds' stock price went down; that's not the worst thing, but attackers were also able to access confidential information belonging to customers. What could SolarWinds have done better? They could have followed the SLSA framework. SLSA is quite recent; it recently went to version 1.0. SLSA, pronounced "salsa", stands for Supply-chain Levels for Software Artifacts. So it's not a sauce, and it's not a dance.
And SLSA defines four levels, starting from level 0, where there are basically no requirements on the build platform, up to level 3, which is a properly hardened build platform. SLSA introduces this image: you can see there's a producer that produces source code, and the source code is stored in a source repository. Then the build platform pulls sources, pulls dependencies, and creates a package, the resulting artifact, which is subsequently consumed by consumers. SLSA defines the threats at each step, so what can go wrong, but also defines how to protect against these threats: how to, for example, prevent submitting unauthorized code, or make sure that the source repository is not compromised. Now let's take a look at a toolbox to protect your Python projects. The very first tool we will talk about is TUF, The Update Framework. The name fits, because it solves a tough problem: securing software updates and preventing attacks such as freeze attacks, rollback attacks, or key compromise attacks. The design is based on Thandy, an updater that was used in Tor. TUF is, let's say, more generic, but it borrows many ideas from Thandy. There's also Uptane, a project similar to TUF that is used in the automotive industry, by companies that deliver updates to cars. The reference implementation of TUF is in Python; you can find it under python-tuf. And one company, Datadog, used TUF to secure Agent integrations, the software that is shipped to customers: it uses TUF, together with in-toto, which we will talk about later, to securely ship software to customers. There are also efforts like PEP 458 and PEP 480, PEP meaning Python Enhancement Proposal, to secure PyPI itself, and we will talk about those as well. TUF is also used in Sigstore, to securely distribute the public keys for Sigstore instances. I already mentioned in-toto. So in-toto is a framework to secure supply chains. What it does is basically define what each step in a pipeline should do.
So if a step in the pipeline should, for example, write something or package something, an attestation is created, so you are sure that each step in the pipeline performed the desired task, and it is properly signed. Users who consume the resulting artifacts can then verify that each step in the pipeline, each step in the chain, did its job properly. OK. So now let's go to an important part of every supply chain, which is code signing. For this part, I would like to introduce a quite new project in the code signing space, which is Sigstore. Sigstore was started a few years ago by different institutions and companies, like Google, Red Hat, and Purdue University, to make software signing more accessible and simpler. It provides a very secure and simple interface to sign any kind of code, and containers as well. And to use it, you don't need any specific cryptography knowledge, which is an improvement if you compare it to other signing standards like PGP, where the configuration can sometimes be a bit complex and you might need to know about the underlying cryptographic protocols the tool uses to sign your software; that is not the case here. One nice feature of Sigstore is that it uses OpenID Connect to sign software, instead of self-managed private keys. OpenID Connect is an authentication protocol, and what it allows you to do is bind your email address, or any kind of identity, like for example a GitHub workflow run, to your signature. So instead of having a permanent public key bound to your artifact, you can have something more identifiable and more personal for your end users, like your email address. Sigstore has a client implementation in Python; it's called sigstore-python and you can check it out on GitHub. So it's a pretty good tool.
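To make the identity-binding idea concrete, here is a toy sketch in Python. This is not the real sigstore-python API: the bundle layout and field names below are invented for illustration, and real verification also checks the signature itself, the certificate chain, and the transparency log entry.

```python
import json

# Toy illustration only -- this is NOT the real sigstore-python API, and the
# bundle layout below is invented. Conceptually, the verify step checks that
# the signing certificate was issued to the expected identity (for example an
# email address) by the expected OpenID Connect provider.
def verify_identity(bundle: dict, expected_identity: str,
                    expected_issuer: str) -> bool:
    cert = bundle.get("certificate", {})
    return (cert.get("identity") == expected_identity
            and cert.get("oidc_issuer") == expected_issuer)

# A made-up bundle, standing in for the verification materials Sigstore emits.
toy_bundle = json.loads(
    '{"certificate": {"identity": "maintainer@example.com",'
    ' "oidc_issuer": "https://accounts.google.com"}}'
)

print(verify_identity(toy_bundle, "maintainer@example.com",
                      "https://accounts.google.com"))  # True
print(verify_identity(toy_bundle, "attacker@example.com",
                      "https://accounts.google.com"))  # False
```

The point of the sketch is simply that the thing being checked is a human-meaningful identity plus the issuer that vouched for it, not a bare public key.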
It has a lot of integrations: for example, with GitHub CI runs you can use it as a GitHub Action, and you can use it as a CLI as well. I put an example on this slide of what it looks like to sign with sigstore-python; it's very simple, as you can see. If you want to sign a package, let's say a Python package in this case, the only thing you need to do is run sigstore sign on your package, and it will redirect you to an OIDC session. Basically, a web browser page opens up and you need to enter your credentials for an identity provider, for example Google or GitHub, which are currently supported. Once you enter your password to authenticate, it validates your identity and binds it to the signature of your artifact, here the package. So you sign your artifact, and the second step is for your end users to verify the signature. Here again, it's quite simple; this is the second command. You just need to run sigstore verify identity, and you need to pass the email address of the signer, which can be found on the signer's certificate provided by Sigstore, and you need to pass the URL of the OIDC provider as well. So here, for example, the signer identified with a GitHub account, so you would need to pass the corresponding URL, plus what we call a bundle file, which is basically a verification file that contains all the materials you need to verify a signature with Sigstore. Okay, so before we skip to the next part, I would like to make a quick reminder about the difference between malicious and vulnerable. A vulnerability in software is some kind of flaw in a computer system that can weaken the overall security of the system. But the thing with vulnerabilities is that while they can in principle be exploited, they are not always exploitable in practice: one study found that less than 10% of vulnerabilities are actually exploitable, and less than 1% of them are actually exploited.
On the other hand, malicious software, or malware, is any kind of software that is intentionally designed to cause disruption in your system. That includes, for instance, ransomware, Trojan horses, or viruses. To find out about vulnerabilities that exist in software libraries, you can use vulnerability databases. Here we chose two examples. The first one is OSV. It's a distributed vulnerability database for open source projects: it aggregates vulnerability databases from different ecosystems, like Go, Rust, or of course Python with PyPI data, and it makes them available in a common format, the Open Source Vulnerability (OSV) format. The second example we picked is GUAC, which stands for Graph for Understanding Artifact Composition. GUAC is a graph database that aggregates all kinds of software security metadata, like artifacts, identities, or the SBOMs we'll talk about later, and it stores the relationships between those artifacts and metadata in the edges of the graph. GUAC is quite useful if you want to prevent supply chain attacks, because it allows you to better understand the relationships between the different components in your software and how they are used together. If you try to apply this to the Python ecosystem, there is no direct support for vulnerability checking in pip, the Python package installer. Nevertheless, there is a tool called pip-audit. It uses the OSV database that Maya mentioned, and it audits already-installed Python environments: you run pip-audit and it shows you vulnerabilities, but also the packages that introduce those vulnerabilities into your environment. There was also an experiment called pip-cuddle. It basically resolves application dependencies without vulnerabilities, or only with vulnerabilities that are acceptable. So it accepts a configuration file.
In this file you state which vulnerabilities you are fine to have in your application, and then pip-cuddle resolves the application dependencies so you can install all of them, including transitive ones. There's also a project called security-constraints. It consumes security advisories from GitHub, so you need to provide a GitHub token, and it generates constraints for your application, so that the resolution process then checks these constraints and resolves application dependencies without vulnerabilities. PyPI is, let's say, in a bad position when it comes to the number of malicious packages published each day: PyPI maintainers claim that roughly 40 malware packages are introduced each day, and they need to be taken down manually. There's a dataset called the Malicious Software Packages Dataset, which aggregates packages that were published on PyPI but were taken down because they were malicious. So if you want to experiment with malicious code, you can do so; just be careful. There is also an open source tool called GuardDog. It scans Python source code and tries to find patterns that can be malicious. GuardDog uses Semgrep rules to statically analyze the source code and tell you whether a given package looks malicious or not. GuardDog is not used on PyPI itself, but you can use it on your own or plug it into your system. Maya already mentioned SBOMs; they were also mentioned in the executive order issued by the US president. SBOM stands for Software Bill of Materials. It basically lists all the software that was used to create or assemble an application, so in this listing you can find all the dependencies and their versions. There are two formats mainly used in the industry, CycloneDX and SPDX; there are more, but these are, let's say, the most used ones.
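To give a feel for what such a listing looks like, here is a minimal hand-written sketch of a CycloneDX-style SBOM built in Python. The component list is invented for illustration; real SBOMs are generated by tooling and carry many more fields.

```python
import json

# A minimal, hand-written sketch of a CycloneDX-style SBOM document.
# The component list is invented; real SBOMs are generated by tooling
# and include much more (serial number, metadata, dependency graph,
# licenses, ...).
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "components": [
        {"type": "library", "name": "flask", "version": "2.3.2",
         "purl": "pkg:pypi/flask@2.3.2"},
        {"type": "library", "name": "requests", "version": "2.31.0",
         "purl": "pkg:pypi/requests@2.31.0"},
    ],
}

# Serialize it the way an SBOM file would be shipped alongside an artifact.
document = json.dumps(sbom, indent=2)
print(document)
```

Each component carries a purl (package URL) identifying the exact ecosystem, name, and version, which is what lets consumers match components against vulnerability databases.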
If you have a software bill of materials for your application, you can also use VEX, which stands for Vulnerability Exploitability eXchange. VEX states whether a given vulnerability that is present in your application is actually exploitable. Because having a vulnerability in your application doesn't mean an attacker can exploit it: for example, the vulnerable code might not be on the call path, the application configuration might prevent exploiting the given vulnerability, or you might deploy your application into an environment in which that vulnerability is not exploitable. There were two efforts here. One came from OSV.dev, an effort to, let's say, standardize VEX in the industry: they introduced a file that you can maintain in your Git repository that states information about the libraries you have, about vulnerabilities, and whether a given vulnerability is exploitable or not. OSV.dev also proposed a way to manage multiple VEX files across repositories, so you can check multiple files when you are consuming multiple libraries. There's also the OpenVEX standard, which was pushed by Chainguard, the company, and it proposes an industry standard for describing VEX for your application. Here is an example: it basically states vulnerabilities, their status, and what introduced the given vulnerability. If you want to run your Python applications, you can use Python container images. Red Hat produces some Python container images, UBI- or Fedora-based source-to-image (s2i) images. The main benefit of these images is the large RPM ecosystem, with vetted and very well-maintained software, and you can use micropipenv in these container images. On the other hand, there's also Chainguard's Python image. It's based on Wolfi, they maintain their own package ecosystem, and it uses multi-stage builds.
So you have one container image that is used for building your application, and then another very minimal one, with just the Python runtime, to actually run your application. They try to minimize the number of CVEs present in the containerized environment. Another thing you might want to do to check for potential vulnerabilities in your source code is use static source code analysis, and we picked an example of such a tool, which is called Bandit. Bandit was started by the OpenStack security team at Red Hat. It scans the files in your Python project and generates ASTs, abstract syntax trees, from them, then uses plugins to analyze the risk of potential vulnerabilities. You can choose which plugins you use: for example, you can choose whether to detect things like hard-coded passwords, shell injections, or weak cryptography, for instance. Okay, so now let's go over some initiatives the Python community has taken recently to secure the ecosystem's supply chain. There are a bunch of them, but we chose a few important ones. The first one is mandatory 2FA for maintainers of critical packages. There is a list of packages that are widely used by the Python community and by developers, and PyPI maintainers chose to give away sponsored hardware security keys for free, so that the maintainers of those critical packages can securely authenticate to PyPI and upload packages in a more secure way. And more recently, they announced that in 2023, so this year, 2FA will become mandatory for every package maintainer on PyPI. They also have another initiative, quite recent as well, which is Trusted Publishers.
Trusted Publishers uses the OpenID Connect protocol again: maintainers of Python packages can use an OpenID Connect identity to get a temporary identity token, which gives them a temporary access key to PyPI, instead of a long-lived API key that you would store permanently and reuse in CI workflows to publish packages, which is a bit less secure. And one last measure is pretty recent: they chose to drop support for PGP signatures on PyPI. A Python community member made an audit of how PGP signatures were generated and used on PyPI, and found that they weren't that useful and were actually quite hard to maintain, so they chose to just drop support for them. So now we'll go over more initiatives to come from the Python community. Okay, so let's take a look at improvements. We will talk about PEPs; PEP stands for Python Enhancement Proposal. That's basically a way to describe what you want to do in the Python ecosystem, and then the community decides whether it's good or not. The first one is PEP 458. It's about securing PyPI downloads with signed repository metadata. This one was accepted, and it uses TUF, the framework we discussed before: it basically secures downloads of Python distributions, so you can be sure that you are downloading the right software from PyPI if you're a PyPI consumer. It's still work in progress. Then there is PEP 480. It's about surviving a compromise of PyPI: imagine someone compromises PyPI and tampers with the packages there. How do you detect that? This PEP describes a way to do it. It's based on PEP 458, and it adds developer keys to Warehouse, or PyPI. Currently it's in a draft state. Also, there might be a new PEP in a few days, so stay tuned. Now let's talk about dependency confusion attacks. At the beginning of the presentation we described one dependency confusion attack that happened in the PyTorch ecosystem.
So say you are a user and you install Flask and Torch, or PyTorch, from two indices, let's say PyPI and the PyTorch index; you want to consume PyTorch from the PyTorch index because there are, let's say, special builds that you would like to use. In the Python ecosystem these indices are treated as mirrors, so it doesn't really matter for pip, or potentially other installers, which index is used to consume a package. In our example, you would like to install Flask and Torch, and also the transitive dependencies of these libraries, but which index should be used? So imagine that in this case you are consuming Torch from the PyTorch index, and a dependency of Torch called torchtriton also from the PyTorch index. But if an attacker uploads a package to PyPI with the same name as the one on the PyTorch index, it can cause trouble, because these indices are mirrors, right? The torchtriton you get can be malicious, and you can end up consuming a malicious package. If you would like to detect possible dependency confusion in your Python applications, you can use a tool called Yorkshire; a cute name, right? Then there is another PEP, for extending the repository API to mitigate dependency confusion attacks (PEP 708). That's the PEP addressing these dependency confusion attacks. It's still in draft state, but it introduces a way to create a contract between indices. So imagine PyPI says that the project torchtriton is trusted on the PyTorch index, and the PyTorch index says torchtriton is trusted on PyPI. There is a contract between these indices, and consumers or installers of packages can verify that the indices trust each other for a given package, so they can pull it from either PyPI or PyTorch. If that tracked information is missing, the installer can fail and notify you about a possible dependency confusion. Another PEP, called recording the provenance of installed packages, is PEP 710. It's based on PEP 610, recording the direct URL of installed distributions.
So if you install a Python package using a URL, let's say you use GitHub to download an archive of pip, then pip and other installers create a special file called direct_url.json in the dist-info metadata directory, and track the information that you installed pip from GitHub: this is the URL, this is the hash of the file. Nevertheless, there was no way to find out what you actually installed if you issued just pip install pip, or flask, or whatever library, using its name. So PEP 710 introduces a new file called provenance_url.json that states what file was downloaded and what the hashes were when you installed packages using their name and, optionally, their version. It also tracks information about indices. It's still in draft state, but if you are interested in it, feel free to check it out. And now we will have an opportunity to win something. So for those who are listening to us, there can be something good. The rules are: we will ask a question, I will try to check who raises their hand first, and then we will give something away. Does it sound okay? Yes? Okay, let's do it. So the first question: which project mentioned in this presentation does this photo relate to? Yeah? Sorry? SLSA, yes. So we have a winner. And that's right, you get a mild salsa dip together with chips. Okay, so another one. Which project mentioned in this presentation does this photo relate to? Wow, yes. Yes. Can you guess the prize as well? We actually borrowed this idea from the GUAC people, the developers behind GUAC. Yes, so GUAC, Graph for Understanding Artifact Composition. Okay, which project mentioned in this presentation does this photo relate to? Anyone? Yes. In-toto? In-toto? Wait. Okay, so the prize is, yes, SLSA is the correct answer, and you get a ticket to salsa lessons, so you can choose between Maya or me. Okay, okay, there is also plan B. Yes. So hot salsa it is. Okay, which project mentioned in this presentation?
Yes, we have. Yes, it's Yorkshire. It could also be GuardDog. So which one do you want? It's a lollipop. Okay, and now the tough one: which project mentioned in this presentation does this photo relate to? No. Yes. No, no, that's not it. Okay, a hint. You win chocolate. We wanted to ask the Sigstore folks for signatures, but maybe next time. Stickers, oh, that's a good idea. And I think we have some space for questions, so if you have any questions, feel free to ask. So I'll repeat the question. I mentioned that there will probably be some new PEP. I collaborated with one PyPI maintainer, Donald Stufft, and there is something written, so let's see whether it becomes public or not. There are also other engineers involved, for example Trishank Kuppusamy, who is behind TUF, and other people, so let's see if it goes public. Yes? Okay, so if you want a complete answer, William Woodruff published an article about it, which is pretty explicit, and he gives details about the whole audit he did of PyPI's GPG signatures. I think it's called something like "GPG signatures: worse than useless", so it's pretty explicit as well. I encourage you to check it if you want a really complete answer on why GPG signatures were not worth maintaining anymore. Yes? Oh yeah, sorry, I'll just repeat the question: the question was why exactly GPG signatures are no longer considered worth maintaining by PyPI. Sorry, yes? Okay, so the question was, if I understood correctly, whether we considered ChatGPT as a tool to prevent malware, and there is a tool called Package Hunter that can help detect vulnerabilities, is that correct? Okay, runtime vulnerabilities, okay. I have not considered ChatGPT personally for this talk; at least we didn't provide any example. I'm sure some tools use it now, but honestly, I didn't look into it. I don't know if you did. Okay, no, we haven't considered it yet. Okay, thank you.