Good afternoon, everyone. Thanks for joining us today. My name is Joshua Lock. I'm a senior principal engineer in Verizon's open source program office. My upstream work is focused on software supply chain integrity, where I work in several CNCF and OpenSSF projects, but I'm here to talk to you today about the Update Framework, TUF. In a prior life, I worked on open source Linux distribution build tools, so I'm a big fan of rigor and determinism in software engineering. That's what brought me into this line of work. I'm here with my friend and colleague Lukas. Hi there. My name is Lukas. I'm a research engineer at the New York University Secure Systems Lab, where, oh, thanks, where I've been working on software supply chain security projects for the past couple of years. That includes in-toto and TUF, both projects in the CNCF. Back to Joshua. Thanks. So today we're going to take a look behind the scenes of the TUF project, talk about how it's organized, some of the things we're doing, and some avenues for contribution, hoping to attract some new contributors to the project today. So this is our agenda. We'll start with a really high-level overview of software supply chain security and how TUF fits into that model. We'll look at what the TUF project is made up of, talk about some of the recent work we've been doing and how that's enabling TUF to be used in more places, and point out where that work is happening. So I'll hand back to Lukas for the 101. Yeah, so TUF, short for The Update Framework, is a software supply chain security project. We won't go into details about how it works, but we'll briefly talk about the software supply chain and how TUF protects it, just to get everyone on the same page. You'll hear many different definitions of the software supply chain.
Also here at the event. One definition I like is that the software supply chain is the series of steps from the inception of a software project to the delivery of the resulting product. It's a very generic definition. What the software supply chain also is, is an attractive target for attackers, for one simple reason: attackers can maximize their impact. A single compromise affects millions of end users or clients. It's also attractive to attack the supply chain because it has such a big attack surface. Modern software products have many, many dependencies and are really supply chains of supply chains. And one more reason why there have been so many software supply chain attacks is that the security of the supply chain has been neglected for a long time. But this is getting better. TUF is located at the last mile of the supply chain, where content ends up in a repository or a container registry and is then fetched by a client for installation or deployment. TUF protects that process. More specifically, TUF protects the integrity, consistency and freshness of content, and it does so by using cryptographic signatures. This is nothing special; most tools that protect something use cryptographic signatures to do so. But TUF can do more than that. It allows you to delegate trust at scale. As I mentioned before, modern software is composed of many components owned by different entities, so you really want to be able to delegate trust at scale, so that a client who trusts the owner of foo doesn't also have to trust the owner of bar, and vice versa. This separation of trust also allows you to reduce the impact of a compromise: if the cryptographic keys for foo have been stolen, bar may still be fine. Another defining feature of TUF is that it allows in-band key revocation and recovery.
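The foo/bar trust delegation described above can be sketched as a toy. The role names and path patterns below are made up for illustration; a real TUF client resolves delegations by walking signed targets metadata, not a Python list:

```python
# Toy sketch of TUF-style trust delegation (illustration only, not a
# real TUF client): each delegated role declares the target paths it
# is trusted for, so trusting "foo" does not imply trusting "bar".
import fnmatch

# Hypothetical delegations: (role name, path patterns it may sign for)
DELEGATIONS = [
    ("foo-role", ["foo/*"]),
    ("bar-role", ["bar/*"]),
]

def resolve_signer(target_path):
    """Return the delegated role trusted to sign target_path, if any."""
    for role, patterns in DELEGATIONS:
        if any(fnmatch.fnmatch(target_path, p) for p in patterns):
            return role
    return None  # no delegated role is trusted for this path

print(resolve_signer("foo/pkg-1.0.tar.gz"))   # foo-role
print(resolve_signer("baz/unknown.tar.gz"))   # None
```

If foo-role's keys are compromised, only targets matching `foo/*` are at risk; bar-role's targets remain trustworthy, which is the compromise containment described above.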
And that in-band recovery just works, without the user having to look at revocation lists or anything. It works transparently. As promised, this was a very overview-style primer on how TUF works. Today, we want to focus more on what TUF is. Back to Joshua. Thank you. So TUF is a community project, an open source project. It originated in peer-reviewed academic research identifying flaws in popular Linux package managers. The core techniques were developed in collaboration with Tor, The Onion Router, to provide an update system for that piece of software, which is focused on privacy, and security is paramount within that ecosystem. So that was some of the early work that led to TUF. TUF was effectively a generalization of the techniques developed for Tor, so that any software system with a repository and clients could integrate this functionality. Over the years, it was enhanced through several research papers to adapt to community-style content repositories. But the key point here is that from the very early days this was a collaboration between academia and practitioners, with a very strong bias towards open source ecosystems, with Tor and PyPI being prominent examples. TUF and its concepts have been very widely adopted and adapted for multiple use cases. It's in automobiles via the Uptane project, on IoT devices via Foundries.io, on smart devices through Google's Fuchsia OS, in cloud infrastructure through AWS Bottlerocket, in ecosystem package managers, and so on. And the core of the project is three symbiotic things. They're each distinct Git repositories, but we have a specification which describes the processes and procedures for achieving this secure client-repository update process, with a particular focus on the detailed client workflow, which is very well defined.
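The in-band recovery mentioned a moment ago is the first step of that client workflow: a client that trusts root version N fetches root N+1 and accepts it only if it is signed by keys from both the old and the new root. A toy sketch follows; HMAC stands in for the asymmetric signatures a real TUF deployment uses, the key IDs and payloads are made up, and the real workflow requires a threshold of keys rather than single keys:

```python
# Toy sketch of TUF's in-band root key rotation. HMAC stands in for
# real asymmetric signatures; key IDs and payloads are made up.
import hashlib
import hmac
import json

def sign(key, payload):
    canonical = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).digest()

old_key, new_key = b"old-root-key", b"new-root-key"

# Repository side: root v2 rotates to new_key, and is signed with BOTH
# the old and the new key so existing clients can follow the rotation.
root_v2 = {"version": 2, "keyid": "new"}
signatures = {"old": sign(old_key, root_v2), "new": sign(new_key, root_v2)}

# Client side: a client that trusts old_key (from root v1) verifies v2
# with it, then also with the keys v2 itself declares, and only then
# starts trusting new_key. No out-of-band revocation list is needed.
assert hmac.compare_digest(signatures["old"], sign(old_key, root_v2))
assert hmac.compare_digest(signatures["new"], sign(new_key, root_v2))
trusted_key = new_key
print("rotated in-band to the new root key")
```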
We have a KEP- or PEP-style augmentation process, TUF Augmentation Proposals or TAPs, where we can propose and peer-review possible additions to the specification. And then we have a reference implementation, which is an exemplary implementation demonstrating how to implement TUF, the primitives and procedures within the specification, in practice. We're going to talk about that a little more later, but these three projects have a very symbiotic relationship. They all feed into each other. There are other implementations of the TUF specification, several of which are open source, but not all of them have this same interactive nature. And the other thing that TUF is, is a small group of friendly, diverse, professional and considerate folks from academia and industry, very welcoming to new contributors. So that's TUF the community, and I'm going to hand back over to Lukas to describe some of the things the community is working on. Thanks. So Joshua has shown us that TUF is widely used and adopted, which suggests some maturity. Another data point for this is that TUF became the first security project to reach graduated status in the CNCF. So TUF has been around for a long time and is in good shape, but that doesn't mean that TUF is done. TUF is very much a living project. There are plenty of avenues where TUF continues to improve, or in other words, where you, or interested people, can help improve TUF. And I'm going to walk you through some of those areas. So one big part of TUF is its specification, and everyone who has had the pleasure of reading the specification will agree with me that it is not immediately obvious how everything works. It's fairly complex. That means that clarifying the specification remains an ongoing effort for us. We're trying to revise the terminology, revise the routines.
Yeah, I have listed a couple of GitHub issues there where we are trying to make this easier to grasp. One reason why the TUF specification is challenging is that it leaves many complex decisions up to the adopter. This is actually a deliberate design decision, and it is what makes TUF so versatile and useful for the many setups that try to protect content. On our wish list is something like secondary literature, with detailed deployment recommendations for different use cases, and where we can explain the rationale behind certain properties of TUF in more detail. Other efforts include defining a mathematical model for the server and client parts of TUF and using formal methods to prove the security properties of TUF. This is something that no one has ever done, but a lot of people would be interested in. In addition to those general improvements there is something called TAP development. Joshua already talked about this TAP process. It includes design discussions, technical writing, implementing proofs of concept for the ideas in the TAPs, conducting security reviews, integrating them into the specification once the TAPs have been accepted, and then adopting them in the TUF implementations. Some ongoing TAPs that we've been working on recently include signing with Sigstore identities, optimizing metadata sizes in TUF, and using different mechanisms for key revocation, for instance. Next, so as I said, TUF is not done yet. That includes the TUF implementations. I'm talking about the reference implementation mostly, because that's what I maintain. We're constantly improving user experience and developer experience. We're trying to add new abstraction layers to make building TUF applications easier. We're adding new features, providing new signing mechanisms, and so on. We're adopting the TAPs that I talked about earlier. And adopting TAPs in the implementation also feeds back into the TAP process, into the designs.
And then, Joshua has already mentioned this, there are other implementations besides the Python TUF reference implementation. We have recently added tuf-js to our organization, there is a new TUF implementation, and they all have their own roadmaps where people can engage and help. Having all those different implementations also calls for interoperability and conformance testing, which is something we could really use help with. So there are plenty of things everyone can do to help make the TUF implementations even stronger. And I haven't even talked about things that are built on top of those implementations. The implementations, or in my specific case the reference implementation, provide very strong foundations to build applications on top of. Some of the applications that people from the TUF team have been working on are listed here. They all deal more with the repository side of TUF. We will talk about that later. But first, we want to take a step back and see how we even got to a place where we can very easily, and with a lot of fun, build applications on top of TUF. Yeah, so I'm going to talk a bit now about python-tuf, which is the thing you've heard us refer to as the reference implementation. Last year, we completed a significant overhaul of this reference implementation, and I think that effort was a really good example of the considerate and rigorous work the TUF community engages in. We spent probably around 18 months, from scratch, creating a new architecture and an implementation of that architecture for TUF. And why would we do that? Why would we throw away a bunch of code that's been around for a long time? We had long had a sense that the code needed a major overhaul. The API had several functions that were triggering pylint warnings for the default number of arguments. The API was frankly very sprawling and unnecessarily branchy, with lots of deep call stacks.
The repository library implementation we had required all of the metadata to be loaded into an in-memory database, so for large repositories this was a significant slowdown. We were prototyping PyPI integration and seeing tens of minutes for operations to complete. And then the decision to sunset the legacy code was really reinforced when a security vulnerability was reported and it took us around three weeks, I think, to verify the issue and develop tests to have confidence in the fix we had created. So what we did is we moved slowly and considerately, and we really tried to make things better. And we ended up with a really nice architecture. There was a lot of contemplation and collaboration. There was some heavy lifting done mostly by Jussi, who's in the front here, and Lukas, on stage with me. They ended up with a design that, instead of taking the specification and mapping it directly to code, came up with these neat abstractions. So instead of just mapping prose to code, they modeled the implicit abstractions as software classes. And they also did a bunch of old-fashioned software engineering: modernized the code to use modern Python with standard coding styles, standard test suites, and static typing. This made the code much nicer to maintain and work with. And there are some neat tricks in there, like the repository simulator, which enables much more dynamic testing. So we've moved away from having static test files living in the Git repository to code that can simulate the different files. And we've got this quite nice to use, very ergonomic Python API. A bunch of the folks working on this wrote some blog posts about the details, and I can strongly recommend those to anyone who's remotely interested. I always enjoy reading about good software engineering practices, and there are some good software engineering practices in those posts.
And I can say that without too much bravado because I didn't write most of that code; it's other people's work. And then we made this 1.0 release and we had this much leaner code base. We went from around 5,000 lines of code to less than 1,500 lines of code. So that made us feel better; we could reason about things much more easily. But we were all aware of the fact that this was brand-new code that we were asking people to trust for a security project. And so the CNCF very generously funded a security audit through the Open Source Technology Improvement Fund, OSTIF, which was executed by X41. And the results were really great, a significant confidence boost. There were four issues, which you can see on the slide, that were deemed security relevant, and we quickly addressed those. Most significantly, we added a parameter to a shared library we use that restricts the permissions on files when they're written. And yeah, we were elated with this, basically. We were expecting some really negative results, and they couldn't find much. One of these issues I think is interesting because it's actually an issue in Python itself that they reported against our project. So that should give us all a pat on the back, I think. And we're continuing development, taking a similarly considerate and rigorous approach to our ongoing work, both in the library itself and using that library as a building block. I'm going to hand back to Lukas to talk about that. Yeah, one of the key realizations during the redesign of the reference implementation was that it is quite impossible to design a universal TUF application. And that's because TUF is a building block. And when I say TUF is a building block, I don't mean the client. The client has very clearly defined workflows in the specification, which look pretty much the same for any TUF setup, modulo some minimal custom configuration.
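That fixed client workflow can be sketched as a toy. Only two of its checks are shown here, rollback and freeze protection plus target verification, and signature verification against the delegation tree is elided entirely, so this is an illustration of the shape of the workflow, not a TUF client. The metadata values are made up:

```python
# Toy sketch of checks a TUF client performs on every update.
# Signature verification is elided; metadata values are made up.
import hashlib
from datetime import datetime, timezone

trusted = {"version": 3, "expires": "2030-01-01T00:00:00Z"}

def refresh(new_meta):
    """Accept new metadata only if it is not older and not expired."""
    if new_meta["version"] < trusted["version"]:
        raise ValueError("rollback attack: version decreased")
    expires = datetime.fromisoformat(new_meta["expires"].replace("Z", "+00:00"))
    if expires <= datetime.now(timezone.utc):
        raise ValueError("freeze attack: metadata expired")
    return new_meta

def download(blob, length, sha256):
    """Accept a target only if it matches the trusted length and hash."""
    if len(blob) != length or hashlib.sha256(blob).hexdigest() != sha256:
        raise ValueError("target does not match trusted metadata")
    return blob

payload = b"hello"
meta = refresh({"version": 4, "expires": "2030-06-01T00:00:00Z"})
pkg = download(payload, len(payload), hashlib.sha256(payload).hexdigest())
print("verified", len(pkg), "bytes")  # verified 5 bytes
```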
So as a consequence, off-the-shelf client implementations, as included in python-tuf and other TUF implementations, can be used pretty much anywhere. It's so easy to write a client that I'll show the snippet again that Joshua already showed. Yeah, those 20 lines of code do TUF for you on the client side. They work with any repository setup. The repository, on the other hand, can vary a lot. The specification doesn't really define one repository setup; instead it defines primitives. It defines roles responsible for different tasks and different content in a repository. It defines metadata formats that represent those roles. And it defines how roles delegate trust about content to each other, how they interrelate. Only a very minimal set of those roles is mandatory in any TUF setup, but beyond that, there can be arbitrarily complex delegation trees with many, many more roles. And TUF setups can also vary a lot with regard to key requirements, the availability of cryptographic keys. For instance, in an environment where content changes rarely and always in a very controlled manner, it might be reasonable to sign the content, or the related metadata, with offline keys, like YubiKeys plugged into your computer, even with a quorum of those. But this is definitely not feasible in a community repository like PyPI, where new content gets uploaded every couple of seconds by many, many different actors. And regardless, both of these setups can use TUF and benefit from its security properties to varying degrees. So the nature of the specification, which provides only building blocks on the repository side, makes it non-trivial to implement a generic repository application. At the same time, not providing a generic repository application is not sustainable, because adopters really need expert knowledge in order to adopt TUF in their setup. Also, the right setup definitely determines the security guarantees.
So you can have a correct TUF setup but still not achieve all the security guarantees that you want. So there is definitely potential for human error. And acknowledging the fact that TUF is a building block allowed us to shift our mindset to a kind of application layering. At the very bottom, we have the low-level Metadata API that is, for instance, provided by python-tuf. It still requires expert TUF knowledge to use it to set up a correct TUF repository, but the API makes it very ergonomic to do so. On top of that, we have a minimal repository abstraction, which is sort of the glue between the low-level Metadata API and an application. Its usage still requires expert knowledge to some extent, but it already encourages correct usage more and makes things even more ergonomic. And then on top of that, the bottom two layers have made it easier to quickly write new applications that represent one specific use case of TUF, but are generic enough to be used by multiple adopters, who then don't need to build a TUF repository setup themselves. And Joshua will talk a little bit about some of those applications. Thank you. Yeah, so we're going to talk about a few of the ways we've built on TUF primitives and abstracted some of the repository designs we've seen in the wild, to make it easier for folks to adopt TUF without having to understand quite as much of TUF as they may have had to in the past. So the first one I want to talk about briefly is a project called Repository Service for TUF. This started out several years ago; it was proposed to integrate TUF into PyPI to provide repository signing. This went through the Python Enhancement Proposal process as PEP 458. There was a flurry of work, and then it kind of stalled for various reasons.
So, actually, one of the motivating factors for our python-tuf rewrite was to support this integration. When that refactor was completed, we began the work again in earnest. One of the folks on the team, Kairo, had implemented the changes and submitted a PR, and we were really struggling to get any kind of review on it. We realized, as we looked at it, that these were large and fairly invasive changes, and they're really difficult to review without TUF expertise; and asking an overworked open source maintainer to learn a complex system in order to review changes to their project is, I guess, fairly offensive. So we had a challenge: how do we make these changes easier to adopt? The design we came up with effectively abstracts over the complexities of the TUF repository design, tries to simplify the integration, in this case through a REST API, and consolidates some of the operational best practices for managing TUF repositories at the large scale that would be required by PyPI. And the result is this Repository Service for TUF application. It sprung up in a relatively short amount of time and was recently contributed to the Open Source Security Foundation, the OpenSSF, by VMware. It's operating under the Securing Software Repositories working group, which brings together a lot of the people operating these repositories, like PyPI, RubyGems, npm and so on. And so we're hoping to see more interest from the working group members. We have some future plans to implement related TUF repository styles to enable things like developer signing, so that an arbitrary PyPI maintainer can sign their package before they upload it. That makes it resilient to repository compromise as well as to on-path attackers. And then another neat project, a little bit newer, is called TUF-on-CI. And this has its origin story in the Sigstore project.
So if you haven't heard of the Sigstore project: Sigstore is a relatively new, about one year old, I think, effort to make it much easier for developers to do software signing. It uses existing identities, like your email account or your GitHub account or whatever, and issues short-lived certificates. It's a fairly complex software system, as you can see from the diagram, and at the root of these services are keys which all of the clients need a copy of in order to verify the chain of security claims in a Sigstore system. And of course, any client with a copy of these keys needs to know that they've got the latest copy of the keys, that the keys are not tampered with, and that they have a consistent view of the keys, which all starts to sound like some of the properties that TUF provides. And the folks working on Sigstore made that connection. They came up with this neat orchestration mechanism using GitHub Actions, so that people contributing to the Sigstore project from around the world could all do the kind of quorum or threshold signing that TUF supports, through GitHub pull requests and automation via GitHub Actions. And this is really neat. But it's also very hard-coded for that situation, and it turns out that people want to do things like privately deploy Sigstore in their corporations, and trying to replicate that setup for their own use is tricky. And so Jussi, who you see here in the front, realized that this could be generalized much more and implemented in a much more templated way, so that it could be used by other projects of different kinds and other instances of Sigstore. He started this TUF-on-CI project to provide the root signing mechanism of Sigstore for more general use cases. Yeah, that's another neat instance of a TUF application.
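The quorum or threshold signing mentioned above can be sketched as a toy: metadata becomes valid once at least `threshold` of the authorized signers have signed it, so no single maintainer's key, or pull request, is enough on its own. HMAC stands in for real signatures, and the names, keys and threshold below are made up:

```python
# Toy sketch of threshold signing: metadata is valid once at least
# `threshold` authorized signers have signed. HMAC stands in for the
# real asymmetric signatures a TUF deployment would use.
import hashlib
import hmac

def sign(key, payload):
    return hmac.new(key, payload, hashlib.sha256).digest()

authorized = {"alice": b"key-1", "bob": b"key-2", "carol": b"key-3"}
threshold = 2
payload = b"root metadata, version 7"

# Two maintainers sign via separate pull requests; carol is offline.
signatures = {name: sign(authorized[name], payload) for name in ("alice", "bob")}

# Count signatures that verify against an authorized signer's key.
valid = sum(
    hmac.compare_digest(sig, sign(authorized[name], payload))
    for name, sig in signatures.items()
    if name in authorized
)
print("quorum reached:", valid >= threshold)  # quorum reached: True
```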
These two systems are very different in how they apply TUF, in the number of things they're trying to sign and the number of people doing the signing operations, which exemplifies the versatile nature of the TUF specification. So having some high-quality first-party implementations of TUF repository models is really helping the project, and the users of the project, understand where this thing can be used and how it can be put to use. Okay, and back to Lukas to conclude, I guess. Yeah, this pretty much concludes our talk. To recap: we heard about TUF providing building blocks for securing the last mile of the software supply chain today, and about what maintaining TUF involves. That's a combination of security engineering, collaboration, project management, technical writing, software engineering; all sorts of skills are required and appreciated if donated to our group. Yeah, the TUF community is small but very passionate, very engaged and definitely welcoming. And I can say from my own experience that it's very rewarding working on those projects with very skilled people. I have learned a lot there. And yeah, I hope to see more people. We have a couple of coordinates here on the slide. Most activity happens on GitHub under the theupdateframework umbrella. We have a home page, theupdateframework.io, where you'll find more information about the whole project. Communication mostly happens on our CNCF Slack channel, #tuf. And we have a fairly low-volume mailing list where we also announce our monthly community meetings. We haven't uploaded the slides yet, but they will be available later. So yeah, hope to see you somewhere on these platforms or at our next community meeting. Thank you. And if you do have any questions, we've got about four minutes. Yeah, and we will also hang out here a little bit, so maybe we have more than four minutes. Thanks. Maybe I'll ask you a question.
Is anyone hearing about TUF for the first time today? Yeah. Yeah. Yeah. So the first user of our stuff was effectively integrating it into their CI/CD pipeline to make sure that the artifacts they were producing through that process were securely delivered to the orchestration system. When you're pulling in a bunch of the internet as well, it does get a little bit more complicated, but you could certainly use our stuff so that, once you pull a thing in, you get the protection that the thing isn't tampered with internally, or in transit from an internal mirror, effectively. Absolutely. Yeah. Sure. I mean, there are trade-offs there. But this is one of the reasons why we contributed this project to the OpenSSF. And we are talking to the folks, so some of the people working on Maven, folks from Sonatype who work on Maven Central, are part of this Securing Software Repositories working group that is sponsoring this project at the OpenSSF. And our goal is certainly to have groups like that adopt either this software or an implementation of it; some people are quite particular: "I don't want to take a Python dependency in my Java ecosystem," for example. But this project is very well documented, and we hope that we have both an implementation that folks can use and also an architecture that folks can replicate if they don't want to take the software itself. So it's our goal to get this integrated into repositories like that. Well, yeah, we'll hang around if anyone does want to ask a question without the microphones; we'll be just out there. But otherwise, yeah, thanks for attending.