So thank you. We'll talk about how PEP 458 can protect PyPI, but not only PyPI; that's our idea here. I'm Kairo, and here is Martin. So let's get started. Before talking about the PEP itself, we need a bit of context. The previous talk was about malware in PyPI, so what is PyPI? PyPI is a web platform built with software called Warehouse, and it has two use cases, basically. The first is developers who want to share their software, their libraries. They make it public on PyPI, and we install it using pip; most of you have done this. So on the client side we have two kinds of users: developers and consumers. To show how important PyPI is, I will give some numbers. PyPI has more than 550,000 projects and 700,000 users, and it serves 900 terabytes and 2 billion requests daily. Whoever was in the previous talk can appreciate how big that is. So when we talk about PyPI, PyPI is a distribution platform, but in the end it is a repository, right, serving things to download. And why should we protect repositories? Martin will continue. We protect repositories because they are high-value targets. As Kairo already said, repositories are something we all rely on for our daily jobs, and our users expect that we are going to build packages, products, and so on. The question we should ask ourselves is: what happens if an artifact is tampered with, if somewhere, somehow, it is modified? And logically speaking, who are the vulnerable ones? At the first level, the people directly vulnerable to a tampered package are, of course, those who directly rely on our projects. But what is even more scary is that it doesn't end there. Many times the vulnerability shows up in transitive dependencies as well. As we saw in the previous talk, when you rely on a vulnerable package, you can have the problem yourself, too.
So what is the traditional way to protect a repository? Of course: sign all the artifacts. Given how hard that is, it's a big cost you take on. But when you sign all the artifacts, what is really good is that you get a checksum for each artifact. So if your users are careful, they can notice if a package was tampered with. That's good, and it was the solution for many repositories. But sadly, there are many problems with it, and Kairo will share more about them. So yes, signing artifacts is not enough; the picture is an illustration of that. But why is signing every artifact not enough? I will give you some cases. Let's say you have a key that signs your packages, and some people in the organization hold that key to do the signing. What happens when someone leaves the company? That key becomes a bit vulnerable. And let's think about the other case: what if the key gets compromised? It could be because someone who left the company still has access to it, or the key was simply leaked. This is not uncommon; you can see in the news here that a key was leaked, about two months ago, and all updates were compromised in that case. But we have a solution for that: generate a new key and re-sign everything. That looks OK, but let's think at scale. What if you have millions of packages, like PyPI? Eight million artifacts. You need to sign everything again. And remember, you usually have CDNs and mirrors; everything needs to be synchronized across the world, and sometimes offline mirrors take days to get synced. This is what happened with Debian: there are a lot of mirrors, mirrors at universities that nobody is really taking care of. Of course, someone is taking care of them, but not like a company would. And how do you notify the users about the new key? Because you have a new key, but what happens to the users? They just fail to download.
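To make the checksum idea concrete, here is a minimal Python sketch of what a careful user's manual verification amounts to. This is purely illustrative, not PyPI's actual mechanism:

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()

def checksum_matches(data: bytes, published_hex: str) -> bool:
    """Compare the downloaded artifact against the published checksum.
    hmac.compare_digest avoids timing side channels in the comparison."""
    return hmac.compare_digest(sha256_hex(data), published_hex)
```

If the artifact is modified anywhere between signing and download, the digest no longer matches and the download should be rejected; the weakness discussed next is not this check itself, but managing the keys that vouch for the published checksums.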
Because then they need to pick up the new key, sometimes update their system or do something. All right. So then we come to TUF. Before explaining PEP 458, let's understand TUF, The Update Framework. It's a framework with a good solution to the signature problem: sign everything. That's a good solution, right? And of course, well, not of course, but Kairo is wrong here, specifically in this situation, because TUF actually doesn't sign everything, and that's the big difference compared to the traditional solution. TUF signs only metadata files, and there is only a small number of metadata files you need for that. What is really good is that TUF provides you with a clear process for key revocation, and users can easily learn which new keys they should trust. And that's not something they have to do by themselves; TUF does it for them automatically. It's also really nice that TUF provides freshness, consistency, and integrity. One can ask, how does TUF do that? I'll stay at a high level because we don't want to go into too much detail about TUF, but TUF has a verification policy: for each signature on an artifact, it has a mechanism to check whether the key is signed by a trusted authority, and whether the key is still valid, because there is an expiration for each of the signatures. We have discussed TUF; now let's say a little about PEP 458. PEP 458 is an already accepted proposal, an old one, to be honest. It proposes a minimal design of TUF and how it can be integrated into Warehouse; Warehouse is the software that powers PyPI. It gives you rollback and freeze protection, and also, as we said already, built-in explicit revocation, which comes from TUF again. And the good part is that there won't be any particular changes for the user. You just install your package as before.
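The verification policy described above, trusted keys, a signature threshold, and expiration, can be sketched roughly like this. This is a simplified illustration of the idea, not python-tuf's real implementation, and it deliberately omits the actual cryptographic signature verification:

```python
from datetime import datetime, timezone

def metadata_accepted(metadata: dict, trusted_keyids: set,
                      threshold: int, now: datetime = None) -> bool:
    """TUF-style policy check: the metadata must not be expired, and it must
    carry signatures from at least `threshold` distinct trusted keys."""
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromisoformat(metadata["signed"]["expires"])
    if now >= expires:  # freshness: expired metadata is rejected
        return False
    signing_keys = {sig["keyid"] for sig in metadata["signatures"]}
    return len(signing_keys & trusted_keyids) >= threshold
```

Key revocation falls out naturally: once a keyid is removed from the trusted set distributed by a higher-level role, its signatures simply stop counting toward the threshold.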
But the problem is that we have been dealing with this for a while and have tried to implement it. There was a PR that already did that, but as we saw, it's really hard to implement a TUF repository, and we'll hear about it from Kairo. Yeah. TUF is tough. In my first contact with TUF, I was amazed by what it does. But when I started to work on it for PEP 458, to be honest, it was very complicated, and I said, oh, this is really difficult. And it's not better now; I work on that project and I still struggle with it. But OK, we need to do it right. The problem is that the TUF specification is very complex to understand, and the specification mostly covers the TUF client side, not the repository side. So you need to understand some behaviors of the client in order to build the repository. This implementation has a really high cost, because you need to do a good design based on the view of the client. You need developers working on it, maybe a product manager, because this is not an implementation you finish in the short term; it's a long-term implementation. And because it's long-term, it adds a lot to your code base, which means maintaining it, fixing bugs, and keeping it working. Besides that, we have another problem, which is the TUF process itself: you need to generate the metadata, sign it, and update it, and you need to do it in the correct way, as designed by the specification, with the client's view in mind. It can lead to a lot of inconsistency in the metadata. So we came up with a solution for PEP 458: we didn't implement it in PyPI. That was the solution. I'm kidding; we tried to do better than that. What we did originated in VMware as an internal open-source project that, from the very beginning, was open and public. And as soon as possible, we donated the project; it's now under the OpenSSF, which is under the Linux Foundation, so it's in a neutral organization.
And of course, we now have a draft PR that implements PEP 458 with RSTUF. It's still a draft, but we are working on it. So what was the motivation for the project we want to talk about now, Repository Service for TUF (RSTUF)? Of course, it was PEP 458. When I was struggling with PEP 458, implementing a TUF repository in the really big environment that is Warehouse, I said: OK, what if this effort could be used by others? Organizations, companies, other repositories like RubyGems or Debian. While I was thinking about this, Jussi, who is from Google and a TUF maintainer, came to the last Open Source Summit here in Europe and said that repositories are more alike than they are different. That was really the reason for this project. What we try to do in the project is abstract away the complexity of the TUF specification and make it easy to adopt TUF by using RSTUF. As I said, I always struggled with understanding TUF; luckily, we have some TUF experts helping us. Lukas, from New York University, is a TUF maintainer and an RSTUF maintainer as well. We have Joshua, who is a TUF maintainer, and Jussi, whom I mentioned; they are really working as contributors, maintainers, and advisors for us. And we also have our friend Kosta here, say hi, Kosta, who is also a maintainer of the project. So what about the design of RSTUF? We want to make it easy to deploy, so we deliver it as a container image and a command-line tool. We also want it to scale while keeping the metadata consistent; we put these two words together, scalability and consistency, because you cannot scale in a way that makes the metadata inconsistent. And we want to make it easy to integrate: all integration with your repository, distribution system, or platform is done through a REST API. To explain how the API integration works, Martin will take over.
So RSTUF can be put in many places, because it can be used and integrated through API calls. I will first give you an example with a distribution platform. Warehouse can be considered a distribution platform, and the usual workflow is that a typical Python developer pushes a package to Warehouse, Warehouse does some parsing, and then integrates it into the public repository. To integrate RSTUF into this workflow, you don't need to change any of these steps. You just need one additional process that makes an API call with the package information so it is added to the TUF metadata. That's all. This is the server side; later we will speak about the client side. Let me give you another example: integrating RSTUF into CI/CD. The typical workflow here is that you build your project, run the CI checks, and then run the CD to deploy to your public repository. And where do you integrate RSTUF? In the same way: you have a job, an action, which calls the API and sends the information to an existing RSTUF deployment. Well, that's great, but let me show you a real example. This is from a demo repository we built and tested. It took something like 30 to 40 lines in a GitHub Action to send the metadata to our existing RSTUF deployment, and I can give you a link to the repository if you want to have a look, of course. But let's move on and think a bit more. We have said that it's easy to deploy RSTUF, but it's not only about deploying it. Let's talk about how you manage RSTUF operations.
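As a sketch of what such an API call could look like, the snippet below posts an artifact's length and hash to a running RSTUF deployment. The endpoint path, payload field names, and base URL are assumptions for illustration; check the RSTUF API documentation for the exact schema of your deployment:

```python
import hashlib
import json
import urllib.request

def artifact_info(data: bytes, repo_path: str) -> dict:
    """Build the payload describing an artifact for the TUF metadata:
    its repository-relative path, length, and hash."""
    return {
        "artifacts": [{
            "path": repo_path,  # e.g. "my-pkg/my_pkg-1.0.tar.gz"
            "info": {
                "length": len(data),
                "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
            },
        }]
    }

def register_artifact(base_url: str, data: bytes, repo_path: str):
    """POST the artifact info to a hypothetical RSTUF endpoint.
    This performs a network call, so it needs a live deployment."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/artifacts",  # endpoint name is an assumption
        data=json.dumps(artifact_info(data, repo_path)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

The point of the design is that this one extra call is the whole integration: the upload pipeline itself (twine, a CI job, a registry hook) stays unchanged.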
By RSTUF operations, we mean TUF management: all the processes like bootstrapping, creating the first metadata, signing it, updating it, and doing a key rotation if someone leaves the organization and you need a new key, transparently to the user, without affecting the clients. We also have delegations. All of this is very complex, so we are trying to abstract it with a very nice CLI, which Martin will show. The idea is that we have the API and you can do most of the operations through it. But if you are new to TUF, you don't understand the framework and haven't used it, you can use the CLI; there is no requirement for previous TUF knowledge. What is also nice is that you get explanations: what is a bootstrap? Why do I need keys, and what are those keys used for? Then, as you understand better, it's nice that this is a step-by-step guided process; at each step we ask what you actually want to do. And when you have decided, at the end we give you a summary. It's a little hard to see as shown right now, but the idea is that you get a summary of what you just created, the public value of each key, and how many keys you're using, so you can better understand what's happening. We can say that RSTUF was designed with flexibility in mind and with as few requirements as possible. That's why we can say RSTUF is artifact-agnostic. What does that mean? It means you can use RSTUF in your system to deploy, I don't know, legal documents for your users to download, maybe even movies, maybe whatever; you can use it for everything. Also, we don't care what language you use to call the RSTUF API; you can call it from all of these and, of course, others. And as we mentioned, we also don't care how you upload your artifacts to the public repository. There are many ways to do it.
Choose one, but please just call the RSTUF API at the correct moment. And finally, as we have already said, we deploy RSTUF as a set of containers. That way it can scale, and you can use it with your orchestration tool, on-premises, or in a private or public cloud. And now let's hear about the main RSTUF features. OK, yeah. What we have as features: you can bootstrap, starting your repository with the whole signing process. And of course, if you already have existing target artifacts, you can import them; for PyPI, for example, you can import all the existing packages. You have an API for integration that does add and remove, and you also have the key revocation process. And if you need to generate keys, there is a tool to generate them. Now I will show you how it goes with PyPI. The first part is a developer uploading a new version of a package. As you see here, it uses twine, which developers usually use to upload a new version, and the new version becomes available on PyPI. The user can see it there: the new beta version is available. Later on, the user can download it from PyPI using pip. It looks a bit different from normal logs because I'm demonstrating it now: you see the calls to the metadata, so it's validating the signatures. Now let's see what happens if things get compromised, as we discussed before. What if someone goes there and tampers with a package? Let's say I create a fake package and add it to the repository. What happens when the user tries to install it? The user downloads it and gets an error. Here the error is ugly, it's a raw exception, but what we want to show is the TUF exception saying that the metadata is not valid: the size does not match what is recorded in the metadata.
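The check that fires in that demo, comparing the downloaded bytes against the already verified metadata, boils down to a length and hash comparison. This is a simplified stdlib sketch of what a TUF client does for you automatically; `targetinfo` is an assumed plain-dict stand-in for the artifact's entry in the targets metadata:

```python
import hashlib

class TamperedArtifactError(Exception):
    """Raised when a download does not match the trusted TUF metadata."""

def verify_download(data: bytes, targetinfo: dict) -> bytes:
    """Check the artifact's length and every recorded hash against the
    signed metadata; reject the download on any mismatch."""
    if len(data) != targetinfo["length"]:
        raise TamperedArtifactError(
            f"length mismatch: got {len(data)}, expected {targetinfo['length']}")
    for algo, expected in targetinfo["hashes"].items():
        if hashlib.new(algo, data).hexdigest() != expected:
            raise TamperedArtifactError(f"{algo} digest mismatch")
    return data
```

Because the metadata itself is signed and verified first, an attacker who swaps the artifact on a mirror or CDN cannot also adjust the expected length and hashes, so the tampered download fails exactly as in the demo.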
Of course, PyPI also has protections today, but here we are showing it at the TUF level. All right. Implementing the client side is quite easy; that's what we show here. It's more or less like using requests, and the nice thing is to have everything built in. You can do this not only with Python clients; there are TUF client libraries in Go, JavaScript, and other languages, so you could even protect a web portal with this. Martin will now explain how the client flow works. The flow from a client perspective consists of four steps. The first, as you would expect, is that the user wants to download an artifact from PyPI. Then, before actually downloading the artifact, the client downloads the TUF metadata. The reason is that it will use that metadata to gather the information about this particular artifact: what it needs and what it expects. And because this is signed metadata, if it has been changed, the process stops at step two; so we are sure the metadata is trustworthy. Then, with this trustworthy information about what to expect, we download the artifact and compare it against what we know. And now we want to share our future plans. We have discussed what we have so far, and we want to be even more flexible: first, by allowing more key vault and storage solutions, AWS, Azure, also Google Cloud, and so on; we're open to suggestions, of course. We also want to allow custom role delegations; the idea is to give organizations better control over how they delegate trust. We also want to improve metadata management. And finally, there is an exciting feature we are working on: distributed signing. This is really important for big, international organizations, because currently, if you have an important signing ceremony, you need to do it in-house, in one place, physically. We're working on a process that lets you do it wherever you are, in multiple steps.
And we're excited to share with you that the RSTUF beta is out. It can be tried, and we hope you will try it. This is a link to our documentation. We're open to questions, we're open to chats. And yeah, go ahead, Kairo. Thank you for attending this talk. It's a new product, but we see a lot of benefit not only for open-source projects, but also for organizations and companies that want to provide secure downloads of their data. Thank you. Thank you very much. We have five minutes for questions, so please raise your hand if you have any, and I'll come over so you can ask. Anyone? Thank you for the talk. With the progress of the PEP and applying this to PyPI, how close are we to seeing RSTUF added to PyPI? Sorry, can you ask again? You've been doing a lot of work on RSTUF and on the PEP. How close is that to completion? Are we going to see this in PyPI anytime soon? That's what we hope. To be honest, this PEP started being created more than 10 years ago. But now I see many organizations putting effort into it, not only VMware, but also New York University, and Datadog is helping us; everybody is trying to contribute somehow. I think we will have more results soon. Please follow the PR and try to help there as well, that would be good. Thank you. Thanks for the talk, and for the tremendous amount of work this seems to have been. Maybe it's a very naive question, but you said RSTUF is artifact-agnostic. Do you see RSTUF working with something like a Docker registry as well? Sorry, can you repeat the last part? Do you see RSTUF working with things like internal Docker registries? Internal registries, for Docker images? Yes. As we said, it can be deployed on-premises, so you don't have any limitations on how you deploy RSTUF. So you can do it, yes.
And we can help you with advice or design or whatever; we'll gladly help you with this. I will also complement your question. I saw one use case for RSTUF where someone uses Docker container images from a registry, which could be a private or a public one. Your CI/CD, instead of pulling the image directly, could use the metadata to fetch the image rather than fetching it straight from the repository. So you could use it internally or in the cloud as well. Thanks. Thank you. Hey, thank you for your talk. My question is regarding private repositories. As you probably know, there are options like JFrog Artifactory and so on that are available for companies that want to self-host their own Python package index. Do you have any information regarding integrating RSTUF with such products? Yeah, it's also feasible, because you tell the RSTUF configuration, for example, what the base URL is where your artifacts are. So that would be the address of your JFrog Artifactory, for example, and through the metadata the client will know where to fetch it, even if it's JFrog. You can integrate it with JFrog as well. Thank you. Any more questions? I see this working for organizations, because essentially you have a package that is signed by VMware and I trust VMware. Or no, it's not like this? How does it affect the user? Me as a user, will I contribute to a package and sign it? How will it really work? Maybe I don't really get the connection there. OK, yeah, the big difference is about the signature: usually you sign the package, but here you no longer sign the package, you sign the metadata. As a final user, you shouldn't see any change, because no matter which client you use, the client should first go to the metadata and then fetch the artifact. So there is no change for you.
And also no change for whoever is uploading a package, because for now, with PEP 458, PyPI will do the signing for you. Then there is PEP 480, which was shown before; that one goes further for projects: you will have a delegation so that maintainers are also able to sign. A project can define a quorum, let's say you have three maintainers and at least two of them need to sign. Just one more thing: the metadata contains the hash and length of each of the artifacts. That's how it protects you, because if you have a good algorithm to calculate your hash, and of course a good key, you're good to go. So thank you very much. This is all the time we have. If you have any more questions, please talk to Kairo and Martin in the hallways or on Discord. Have a nice day, folks. Bye.