All right, hello everybody. I'm Marina Moore. I'm a PhD candidate at NYU, and I've been a maintainer of the TUF project for the past five years or so. And this is my colleague Trishank.

Hello. Trishank Karthik Kuppusamy, staff security engineer at Datadog. I've been involved with TUF in various capacities since my research on it at NYU, and I'm now using it at Datadog to secure some of our own products. Marina, please.

Yeah, and we're going to talk today about secure transport for your software supply chain with TUF. We'll talk a bit about what TUF is, and then go through some case studies about how it can be used to secure the transport of different elements of the software supply chain.

First, a quick definition of a supply chain, just so that we're all on the same page. A software supply chain, by this definition, is a collection of systems, devices, and people which produce a final software product. There's an example here of the different steps that happen in a software supply chain: everything from source code, to testing, to building, to deployment of that code. And there can be other steps in the chain as well.

There have been a number of attacks on supply chains in recent years, unfortunately. Just to motivate the problem, here are a few examples: everything from dependencies, to repositories themselves, to source code being attacked. These are just from the past couple of years, and I won't go into detail about all of them. At the bottom of this slide there's a link to the TAG Security catalog of supply chain compromises, where these came from, which has a lot more detail for folks who are curious. The main point is that this is an area attackers have discovered, so we want to make sure we're protecting it. And in this talk we're not going to focus on the whole chain; we're going to focus on this last step.
Once you already have an image that is built, you have a packaged piece of software. How do you then distribute it securely to the person who's going to run it?

The threat model for this talk is attackers who can access a few different pieces of the system. They can perform a man-in-the-middle attack on the network, which means they can read all of the traffic between the repository and the user. They can compromise keys used to sign updates: keys can get lost, computers can get stolen, all that kind of thing. Or they can compromise the repositories or servers themselves. Given this environment, we want to protect the freshness, consistency, and integrity of our software. To do this, we use the concept of compromise resilience, which means that even if repositories, keys, or developer accounts are compromised, we want to reduce the impact of the compromise and allow secure recovery if and when these compromises do occur.

How does TUF do this? I'm going to give a very quick overview of the architecture of TUF, just the key pieces to understand; for more detail, I encourage folks to look at our specification or other, more detailed intro talks. So, a quick intro so that we can get into some fun case studies.

For this design, we start with a particular package, call it Foo, that someone wants to distribute securely. This doesn't have to be a piece of software; we'll get into this more in the case studies, but other things can also be securely distributed this way, which should become clear as we go. The first thing we want to do is protect the integrity of this content. One way to do that is a cryptographic signature on the content itself. But attaching the signature to the content has two big problems.
First, if you bundle the signature with the content and then want to sign it a second time, or change the signature, changing the signature changes the whole contents, so the signatures have to stack in a funny way. What we do instead is have a separate object that includes the cryptographic hash of the thing we're protecting, and the signature goes on that instead. So we have this foo.json, which the Foo developer has, and they say: okay, we're releasing Foo 1.0, we're going to list it in this metadata file and sign it with key A.

The next problem is that we can now distribute Foo securely by distributing the signature, but how do we actually know the trusted key A? How do we know who is supposed to be signing this Foo metadata file? What we do is add something called targets metadata in TUF, which points to the targets, the different packages or the metadata about the packages, and lists the trusted key for each package. In this very simple example we have just one package, so it's a very small file, but you can imagine it listing tens or hundreds of different projects along with the keys used to sign them. That, of course, is also signed, and the chain has to bottom out somewhere, right? If you keep signing things that say who else signs things, we eventually have to establish where this all comes from. The way we do this securely in TUF is with multi-signature trust. The root of trust, our root metadata file in TUF, is signed with a collection of root keys, and these root keys can be YubiKeys or other offline keys, or HSMs, stored in different physical locations and protected that way to give a maximal root of trust. And this particular metadata file can change less often, right?
If anything needs to change about who's supposed to sign what, this root metadata itself doesn't need to change, because it only indicates key B, which then signs for A, and so on.

Okay, so what happens if the Foo developer wants to release another version of Foo? Foo 1.0 is still valid and people can still use it, but they also want to say there's a new version, Foo 2.0. So they can add it to the metadata file, and so on. The problem is: how does a user know they're getting the most recent collection? If they want the most recent version of Foo, how can they guarantee they're getting a foo.json that includes the most recent version they want to download? To ensure this, we have what we call repository consistency. We put version numbers on each of these metadata files, included in the signed metadata. targets.json hasn't changed since we created it in this presentation, so that's v1. We made one change to foo.json, so that's v2. We then list these versions in what's called the snapshot, and sign that with a key listed in the root. This is similarly protected, and if we check this snapshot and all of our versions are listed in it, we know we have the current state of the repository at this time.

We also want to make sure this is current, right? So finally, our last element is freshness, which is just another metadata file that has the current time and a hash of the current snapshot, to make sure this is all up to date. That's our very fast overview of TUF, and hopefully it's enough to get started; we can learn more about the details as we go. The TUF project is one specification that describes all the stuff I just talked about in a lot more detail, as well as implementations in a variety of languages, including Python, Go, Rust, etc.
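To tie that overview together, here is a minimal sketch in Python of the moving parts just described. This is not real TUF metadata: actual TUF uses canonical JSON, asymmetric role keys such as Ed25519, and signature thresholds. A keyed HMAC stands in for the signatures so the sketch stays self-contained, and all names and version numbers are illustrative.

```python
import hashlib
import hmac
import json

def sign(key: bytes, obj: dict) -> dict:
    """Wrap a metadata body with a stand-in signature (HMAC, not real TUF)."""
    body = json.dumps(obj, sort_keys=True).encode()
    return {"signed": obj, "sig": hmac.new(key, body, hashlib.sha256).hexdigest()}

def check(key: bytes, bundle: dict) -> dict:
    """Verify the stand-in signature and return the metadata body."""
    body = json.dumps(bundle["signed"], sort_keys=True).encode()
    assert hmac.compare_digest(
        hmac.new(key, body, hashlib.sha256).hexdigest(), bundle["sig"])
    return bundle["signed"]

key_a, key_b, key_snap, key_time = b"A", b"B", b"S", b"T"

# foo.json: detached metadata for the content, signed by the developer (key A).
foo_content = b"foo package, version 1.0"
foo = sign(key_a, {
    "version": 2,
    "targets": {"foo-1.0.tar.gz": {
        "length": len(foo_content),
        "hashes": {"sha256": hashlib.sha256(foo_content).hexdigest()}}},
})

# targets.json: names the trusted key for foo.json; signed with key B, which
# the root metadata would in turn vouch for.
targets = sign(key_b, {"version": 1, "delegations": {"foo.json": "A"}})

# snapshot: the version of every metadata file, for repository consistency.
snapshot = sign(key_snap, {"version": 3,
                           "meta": {"targets.json": 1, "foo.json": 2}})

# timestamp-style freshness metadata: a hash of the current snapshot.
snap_bytes = json.dumps(snapshot, sort_keys=True).encode()
timestamp = sign(key_time, {
    "version": 7,
    "snapshot_sha256": hashlib.sha256(snap_bytes).hexdigest()})

def client_verify(content: bytes) -> str:
    """The order of checks a client performs (root-key handling elided)."""
    ts = check(key_time, timestamp)                       # freshness
    assert ts["snapshot_sha256"] == hashlib.sha256(snap_bytes).hexdigest()
    snap = check(key_snap, snapshot)["meta"]              # consistency
    assert check(key_b, targets)["version"] == snap["targets.json"]
    signed_foo = check(key_a, foo)
    assert signed_foo["version"] == snap["foo.json"]
    info = signed_foo["targets"]["foo-1.0.tar.gz"]        # integrity
    assert len(content) == info["length"]
    assert hashlib.sha256(content).hexdigest() == info["hashes"]["sha256"]
    return "verified"

print(client_verify(foo_content))
```

Any tampering along the way, with the content, a version number, or a metadata body, trips one of the assertions, which is the compromise-resilience idea in miniature.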
And more than 20 deployments by companies and open-source organizations; a few of you are on this slide here.

I'll do some very quick project updates, in case you've heard of the project before: what's new, what to look for. In the past five years or so, there's been a lot of work on the client side of TUF. There are basically two sides to TUF: there's the repository that creates all the metadata we talked about, and there's a client that actually verifies all that metadata and makes sure the correct things are signed by the correct people. There's been a lot of great work on the client side, and in the past year or so we've shifted to doing some great work on the repository side, to make this super easy to deploy in different cases.

Two projects I'll mention are RSTUF and TUF-on-CI. RSTUF, the Repository Service for TUF, really focuses on the high-volume case; it has a lot of great features for scalability, inspired by the Python Package Index use case, and it's used there as well. TUF-on-CI is a different kind of implementation that focuses more on the high-security, low-volume case, which is more about using TUF as a root of trust: distributing other keys, smaller-scale repositories. For example, the Sigstore project is looking to transition to TUF-on-CI as an easier way to run that.

Next we have go-tuf-metadata, which is a rewrite of go-tuf, inspired by some of the work that happened in python-tuf before, to make the implementation a lot more readable, maintainable, and usable. So if anyone here has been using go-tuf, this is a great call-out to look into that, give us some feedback, and let us know how it's going. We're going to start transitioning to supporting this full-time rather than the old go-tuf.
And finally in this list, I'll call out gittuf, which is not exactly a TUF implementation, but an implementation inspired by TUF that uses TUF-like concepts to secure a Git repository and do things like key management and trust management for Git repositories. I'll also highlight that RSTUF, TUF-on-CI, and gittuf are all incubating OpenSSF projects now, which is pretty exciting for all of them.

And finally, I'll talk a bit about some new proposals for enhancements to TUF. We have a process called TAPs, TUF Augmentation Proposals, where we propose, discuss, and then implement new features in TUF. This is a quick list of all the ones that are currently active, and I'll go into a couple of the new ones just to show some of the things we're thinking about as a project.

First, TAP 16, which is snapshot Merkle trees. The idea here is to make the snapshot metadata in TUF more efficient at scale. If you remember from the overview a couple of slides ago, snapshot metadata in TUF lists the name and version number of every metadata file in a TUF repository. When there are ten metadata files, that's totally fine, but if you start having millions of metadata files in a single repository, this linear scaling clearly starts to get in the way. The idea is to use a Merkle tree to still have a single snapshot, a single tree that encompasses all of the different versions, but make it much more efficient for people to download and verify just the information they need. Basically, it's logarithmic instead of linear; that's the short version. We have a quick graph from some experiments showing that this is true: there's extra information there, but basically the line at the top is without this, and the line at the bottom is with it. It makes a smaller snapshot. Next I'll talk a bit about TAP 18.
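Before that, here's a quick sketch of the TAP 16 idea, assuming the simplest possible Merkle construction (the actual TAP 16 metadata format differs in detail): each metadata file's name and version becomes a leaf, the signed snapshot shrinks to a single root hash, and a client verifies its one leaf with an inclusion proof whose size is logarithmic in the number of files.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves, then fold each level pairwise until one root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_leaf(leaf, proof, root):
    node = h(leaf)
    for sibling, node_is_left in proof:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root

# One leaf per metadata file, e.g. "name:version".
leaves = [f"pkg-{i}.json:1".encode() for i in range(1_000)]
root = merkle_root(leaves)                 # this is all the snapshot signs
proof = inclusion_proof(leaves, 42)
print(verify_leaf(leaves[42], proof, root))   # True
print(len(proof))                             # ~log2(n) hashes, not n
```

A client that cares about one package downloads one leaf plus ten hashes here, instead of a thousand-entry snapshot.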
Its official name is "Ephemeral identity verification using Sigstore's Fulcio for TUF developer key management", which is a very long way to say that we're going to simplify developer signatures. Right now, or before this TAP I guess, each developer in TUF who signs one of those metadata files listing all those images has to have an actual cryptographic key: either a YubiKey, or a key stored on their computer. They have to keep it secure over time, which can be tricky for developers, especially open-source developers who don't want that extra overhead. The idea with this TAP, and with Fulcio, is that instead you use an ephemeral key tied to an existing identity, something like a Gmail address or your GitHub identity that you as a developer already have, and use that to sign your updates. There's a lot more information about how that works in the TAP.

And finally, TAP 19, which is about content-addressable systems and TUF: using the TUF model in systems like Git, IPFS, and OSTree. The interesting thing here is that certain security guarantees already exist in a content-addressable system, and this TAP looks at how you can combine those with TUF in a way that's secure. You get the union of the security features, while simplifying some of the TUF elements, because you don't need the same security property twice. There are a lot more details in the TAP about how these things interact.

Next I'll hand over to my colleague to talk about some case studies of where TUF has been used in practice.

Cool, thank you. That was great. Yeah, so as promised, let's talk about some case studies of where TUF is being used, especially in what we hope are interesting, different use cases and contexts. One of the first ones is Uptane, one of my personal favorites, if I may: you can actually make TUF work for software updates for cars.
I don't have time to go into details, but let's just say that installing and updating software in your vehicles is not the same as installing software on, say, your containers. One key difference, for example, is that with your containers you have some control: even if it's through your package manager, you do the so-called dependency resolution yourself. Not so for the automotive use case. To give you an example, you and I could have the exact same make and model of vehicle, but because you paid for a premium package, let's say full self-driving, same vehicle, same hardware, you get different software than I do. The key idea here is that you can use two repositories to do this, because if you think about it, what chooses which software gets installed on which vehicle? As I like to say, it's a robot sitting in a cloud in the sky, and you don't want to give the robot the power to make up what software gets installed. The short of it is that the robot only has the power to choose software that has been signed off by human beings. It has the power to choose what software is installed, but it is constrained to using only software that has been authorized by humans. Anyway, there are some interesting details here if you're interested; the idea generalizes to anything that requires fleet management, not just vehicles. So that's Uptane.

Another interesting use case is securing US legal documents. Just as Uptane is a version of TUF whose technology gets upstreamed back into TUF, there's another version of TUF called the Archive Framework, or TAF for short, partly led by Justin Cappos, who also happens to be one of the creators of TUF. The idea here is interesting. As Marina mentioned earlier with TAP 19, TUF can be used to sign off on Git commit trees.
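To make that concrete, here's a toy sketch of why content-addressable systems like Git pair so well with TUF, the theme TAP 19 formalizes: because every object is named by its own hash, signing just the current head transitively protects the whole tree beneath it. The "store" below is a stand-in, not real Git, and all names are illustrative.

```python
import hashlib

store = {}  # a toy content-addressable object store, as in Git

def put(content: bytes) -> str:
    """Store an object under its own hash and return that address."""
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address

def get(address: str) -> bytes:
    content = store[address]
    # Integrity comes for free: the name IS the hash of the content.
    assert hashlib.sha256(content).hexdigest() == address
    return content

# A tiny "commit tree": a blob, a tree naming the blob's address, and a
# commit naming the tree's address. Each object embeds its child addresses.
blob = put(b"SECTION 1. Be it enacted.")
tree = put(b"tree: " + blob.encode())
head = put(b"commit: " + tree.encode())

# TUF-style metadata for an authentication repository then only needs to
# pin the current head of each jurisdiction's repository.
auth_repo = {"jurisdiction-a": head}

def fetch_repo(head_address: str) -> list:
    """Walking down from a trusted head verifies every object on the way."""
    commit = get(head_address)
    tree_obj = get(commit.split(b": ")[1].decode())
    blob_obj = get(tree_obj.split(b": ")[1].decode())
    return [commit, tree_obj, blob_obj]

print(fetch_repo(auth_repo["jurisdiction-a"])[-1])
```

One signed pointer per repository is enough: tamper with any object underneath the head and the hash chain breaks on fetch.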
So the idea here is that different jurisdictions manage their own Git repositories for legal documents, and clearly each commit tree evolves over time. What you're using TUF for here is to secure yet another Git repository, called the authentication repository, which points to the latest heads of each of these jurisdiction repositories. And that is secured by TUF: different pointers to different Git repositories over time. So that's TAF in short. And if you think about it, you can generalize this to Git repositories in general, not just US legal documents, but anything you like.

In that vein, as Marina alluded to earlier, there's a project on OpenSSF called gittuf, and the lead researcher and developer happens to be here in the room, Aditya Sirish. It's based partly on solving a problem that a sister project, in-toto, was also looking at: its lead researcher and developer, Santiago Torres-Arias, wrote an entire paper about all the attacks against Git repositories, and gittuf is designed to solve some of those problems. Suffice it to say that just signing a Git commit is not going to solve all of them.

If we look here, what Aditya has done is sign off on a policy. It uses the idea of delegations from TUF to say that any changes to your main branch have to be signed off by keys belonging to Aditya. And in this case, he's actually using ephemeral, short-lived keys. What you're actually binding to here is the identity: the keys could change as often as every 10 minutes, but they have to belong to Aditya. So that's an interesting application of TUF, or inspired by TUF, shall we say.

And speaking of short-lived keys, there are many ways to do this, one of which is OpenPubkey. We don't have time to go into details, but it's an interesting way to distribute short-lived keys, which, remember, could change that often.
In this case, let's say the key changes every 24 hours. The idea is that you use OIDC; let's say you're using Google as your identity provider. When Aditya, from earlier, authenticates himself, whether to Google or GitHub, he uses the same protocol to bind his ephemeral key at that time. So when he gets the ID token back, the IdP has also, indirectly, signed off saying: not only is this Aditya signing in, but here is also his ephemeral public key at that time. So that's OpenPubkey.

For our use case, the interesting bit is that Docker is planning to use OpenPubkey to realize short-lived keys for signing their official images. But then how do you solve the problem of knowing which OIDC identity should be trusted to sign which images? This is where TUF also comes into play: they're planning to use TUF to sign off on a so-called trust policy, where you can say that different images must be signed off by different OIDC identities, and from there you can find out which ephemeral keys to use.

Speaking of ephemeral keys, many of you might have heard of Sigstore, which is yet another sister project. Sigstore is a collection of a few different tools. You might have heard of Certificate Transparency for TLS. What's the idea there? You have a transparency log, also known as a tamper-evident log. Just think of it as an append-only log: a data structure where you can only add things, and that is verifiable; if someone tampers with it, they get caught, because it's being independently audited all the time. What can you do with such a data structure? It's being used to record the history of every TLS certificate ever issued, and it's worked very well for Google and friends. Sigstore takes that same idea and extends it to software artifacts in general. Okay, how is that relevant to us?
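Before answering that, here's a sketch of the OpenPubkey-style key binding described a moment ago. It is illustrative only: a dict stands in for the OIDC ID token and an HMAC for the identity provider's signature (real tokens are RS256/ES256-signed JWTs), and committing to the ephemeral public key through the nonce is the general idea, not OpenPubkey's exact wire format.

```python
import hashlib
import hmac
import json
import secrets

# Stand-in for the IdP's signing key; real OIDC tokens are signed JWTs.
idp_key = b"identity provider signing key (illustrative)"

def issue_id_token(email: str, nonce: str) -> dict:
    """The IdP signs the claims, nonce included, as in a normal OIDC flow."""
    claims = {"email": email, "nonce": nonce}
    mac = hmac.new(idp_key, json.dumps(claims, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return {"claims": claims, "idp_sig": mac}

# The developer generates an ephemeral keypair, then authenticates with the
# nonce set to a commitment over the ephemeral public key.
ephemeral_pub = secrets.token_bytes(32)          # placeholder public key
commitment = hashlib.sha256(ephemeral_pub).hexdigest()
token = issue_id_token("dev@example.com", nonce=commitment)

def verify_binding(token: dict, pub: bytes) -> bool:
    """Check the IdP signature, then that the token commits to this key."""
    body = json.dumps(token["claims"], sort_keys=True).encode()
    if not hmac.compare_digest(
            hmac.new(idp_key, body, hashlib.sha256).hexdigest(),
            token["idp_sig"]):
        return False
    # The IdP has now indirectly vouched that this identity holds this key.
    return token["claims"]["nonce"] == hashlib.sha256(pub).hexdigest()

print(verify_binding(token, ephemeral_pub))            # True
print(verify_binding(token, secrets.token_bytes(32)))  # False: wrong key
```

The trust policy signed with TUF then only needs to name identities like dev@example.com, never the short-lived keys themselves.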
Sigstore, as I mentioned earlier, uses a few different technologies behind the scenes. There are at least two different transparency logs; we don't need to go into details there, the point is they have at least two. They have the Fulcio CA, which is another way to do ephemeral keys. The point is that each of these has its own key, and remember what Marina talked about earlier: if you have so many different keys, how do you distribute them securely? How do you rotate them? How do you revoke them? As you may have guessed, the answer is TUF. Behind the scenes, that's what Sigstore has been doing. They have their own TUF metadata, where through a single set of root keys you can revoke and transparently rotate the keys to the entire system, even 10 years from now, using the root keys of today; your users don't have to be aware of how any of this works. You could even change the root keys themselves. So there's a key for each transparency log, for Fulcio, and so on.

And they had this tricky problem of running the TUF root key signing ceremony, and not just for the root keys, but also for distributing all of the TUF metadata; it's been through several iterations. So if you think you have a similar problem, where you need to use TUF to manage and distribute keys for your own system, dozens or maybe even hundreds of keys, you can take a look at how Sigstore did it. There are five root key holders for Sigstore, distributed all over the planet geographically, in different time zones. Marina happens to be one of the current key holders, so you can also talk to her about how they've run the ceremony. And as Marina mentioned earlier, the plan is to use the sister project TUF-on-CI to make some of these things easier.

Moving on, let's talk about how TUF plays well with different sister technologies.
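As an aside, the rotation rule that makes this possible can be sketched very simply: each new version of the root metadata must be signed by a threshold of the previous root's keys as well as its own, so a client can walk from today's root to a future one. Key names stand in for real signatures below; actual TUF verifies cryptographic signatures and checks more fields.

```python
# Illustrative root-rotation check: version N+1 of the root metadata must
# carry signatures from a threshold of the keys in version N AND a
# threshold of the keys in version N+1. "Signatures" are bare key names.

def rotation_valid(old_root: dict, new_root: dict, signatures: set) -> bool:
    old_ok = len(signatures & old_root["keys"]) >= old_root["threshold"]
    new_ok = len(signatures & new_root["keys"]) >= new_root["threshold"]
    return old_ok and new_ok

root_v1 = {"version": 1, "keys": {"k1", "k2", "k3", "k4", "k5"}, "threshold": 3}
root_v2 = {"version": 2, "keys": {"k4", "k5", "k6", "k7", "k8"}, "threshold": 3}

# Three v1 keys (k3, k4, k5) and three v2 keys (k4, k5, k6) signed: valid.
print(rotation_valid(root_v1, root_v2, {"k3", "k4", "k5", "k6"}))  # True

# Only two old keys (k4, k5) signed: clients reject the rotation and keep
# trusting v1, which is exactly how a stolen new key stays harmless.
print(rotation_valid(root_v1, root_v2, {"k4", "k5", "k6", "k7"}))  # False
```

This is why the five geographically distributed key holders matter: no single compromised holder can push a rotation on their own.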
So, Datadog, which some of you may have used, or may be using right now, is a monitoring and observability platform: you can use it to monitor your infrastructure, applications, and services end to end. As part of it, you install the Datadog Agent to collect metrics, logs, and so on. Now, this agent has hundreds of integrations out of the box to talk to well-known applications and services, databases, and so on, and the way we install and update those integrations is secured by TUF. It actually uses three different technologies; no time to go into details. We use a sister project called in-toto to solve the supply chain security problem, where every integration we build has to have been signed off by the developers themselves. In this way we trust, but verify, how our CI behaves in packaging these integrations. And we use TUF as the secure transport protocol, the theme of this talk: TUF as the secure transport protocol for software artifacts. We use it to securely bind in-toto policies, which evolve over time. And the same trick Sigstore does, we use TUF for too: we've done this several times now, where we rotate the keys to the entire system and no one notices. We gave a talk about this at another KubeCon five years ago in Seattle. More recently, we talked about how, on top of in-toto and TUF, we added Sigstore, remember transparency logs, so that we record in an immutable way the entire end-to-end history of how every single Datadog Agent integration was ever produced.

So now we can say: why don't we take the same technology, the same tech stack, and apply it to open-source registries in general? And this is what we're beginning to do with registries. Marina and I happen to be co-authors of a Python Enhancement Proposal, PEP 458 for short. The idea is: how do we begin securing open-source registries? This is the beginning of it.
It proposes using TUF, where in the beginning, registries such as PyPI sign all packages on behalf of developers. This is better than TLS alone in the sense that you can securely recover from a compromise: users who have not installed packages that were tampered with can securely rotate to a new set of keys and move on. The takeaway point here, though, is that previously, integrating TUF would take at least a few thousand lines of code. You not only had to be an expert in how the open-source registry, such as PyPI, works; you also had to be an expert in TUF. Not ideal. Thanks to our friends (remember, Marina talked earlier about RSTUF, the Repository Service for TUF), we can abstract that away, and it becomes a simple matter of integrating with a few hundred lines of code, which is much more manageable. You can abstract it away completely as a bunch of configuration, where you choose the security model, and you don't even have to worry about how TUF works; it takes care of things like scaling and key management for you. So: from a few thousand lines of code to a few hundred.

And we're pleased to say that people at RubyGems have also experimented and achieved similar results. There was a previous PR from our friends at Square, actually, who helped us try to integrate TUF with RubyGems a few years ago, and again you can see similar results: a few thousand lines of code down to a few hundred. So a particular shout-out to Kairo and Josef Šimánek and friends, who have shown very impressive results.

And to end with the overarching theme of the talk, TUF as the secure transport protocol for the supply chain: how would that work? We propose an idea that we call Robusto. Think about it: we have mRNA vaccines for COVID and so on today.
If you were to take a vaccine but you didn't know who made it, didn't know whether anyone had tampered with the bottle, opened it and closed it back up, and didn't even know the ingredients in it, would you take it? Presumably not. But we do this with open-source software every single time. So we can do better. The proposal here is to use three sibling technologies in supply chain security. One is in-toto, which you can think of as the vaccine manufacturer telling you how they made the vaccine, what ingredients they used, how they composed it, with their own seal of integrity and authenticity. TUF, then, is the secure transport protocol on top of it that handles things like: yesterday we trusted three different vaccine manufacturers, but today, unfortunately, we have to revoke one of them, and only two are good. So you can update that; TUF basically becomes this intermediary of trust. And finally, last but not least, using transparency logs such as Sigstore (and you can even use short-lived keys to do it), you can record the entire end-to-end history of how every open-source software artifact was ever made. I don't have time to go into details, but we presented a talk about this, called Robusto, at PackagingCon recently.

If you use TUF in a repository such as RubyGems, the key idea is as follows. You basically divide the packages up. In the beginning, remember, with things like PEP 458, to make things as simple as possible, there's a trade-off: you don't require developers to sign their own packages; the repositories do it on their behalf. But the trade-off then is that repositories have to do it in a scalable, online manner. Machines have to be able to sign new packages at any time, so if you compromise the repository, all of these packages are at risk. But here's the good news: we did studies a few years ago where we found that you can slowly but surely move packages over to developer signing.
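A toy illustration of why that migration pays off so quickly: package downloads are heavily skewed toward a small set of popular packages. The numbers below are a synthetic Zipf-like distribution, not real registry data, and the exact coverage it prints is an artifact of that assumption; the studies mentioned in the talk used real download statistics.

```python
# Synthetic registry: 100,000 packages whose download counts fall off
# Zipf-style (the i-th most popular gets roughly 1/i of the top one).
downloads = {f"pkg-{i}": 1_000_000 // (i + 1) for i in range(100_000)}
total = sum(downloads.values())

# Policy: move only the few hundred most-downloaded packages to
# developer (offline) signing; the long tail stays repository-signed.
top = sorted(downloads, key=downloads.get, reverse=True)[:300]
protected = sum(downloads[p] for p in top) / total

print(f"developer-signing the top {len(top)} packages "
      f"covers {protected:.0%} of all downloads")
```

Even under these made-up numbers, a few hundred packages out of a hundred thousand cover the majority of download traffic, which is the "bang for the buck" argument in miniature.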
And if you think about it, there are simple policies that give you a lot of bang for the buck. Take the top few hundred critical open-source packages. (Five minutes left? Okay, good, thanks.) If you take the few hundred most critical open-source dependencies, out of tens of thousands, if not hundreds of thousands, of packages, and move just those over and secure them, at least 80% of your downloads are protected. So again, no time to go into details, but note something interesting here: each supply chain security technology is good at some things but not everything, and through a union of them you get this very strong property that your users might be looking for. So come talk to us after the talk about it.

Anyway, let's go into a quick demo; I don't have too much time. This is the RSTUF demo we talked about earlier, where Kairo is using RSTUF to upload a new version of RSTUF itself (it's a bit meta) to a staging instance of PyPI. He's using twine; you can see twine is the tool that Python developers use, and you can see there's no change for them. It's as if there's no difference with or without TUF, so the user experience for developers and users remains the same. So he just published a new beta version of RSTUF. And now he's going to use pip, the package manager for Python packages, to install it. There's no difference in the user experience there either, but users are being protected by TUF and RSTUF behind the scenes; they're just not aware of it. So both developers and end users are automatically secured without noticing. I'm just going to speed it up here. This is where things are good: if you go into verbose mode, you can see there's TUF metadata here. But let's move on to an attack. Say a CDN that PyPI uses has been compromised.
So someone has hacked the CDN, and they're tampering with one of these Python wheels; they're trying to pull a fast one. Will RSTUF catch it? Let's see. I hope it works. Yes, yes, it does. The length wouldn't match, the hash wouldn't match, the signature wouldn't match; you get the idea. Anyway, I'm going to move on from the demo because we're running out of time.

So thanks, thanks very much for your time; we really appreciate it. We have two minutes left. Before we move to Q&A: TUF wouldn't have worked without the hard work of many, many people, whom we don't have time to acknowledge individually. But thanks to all of you. And if you have questions, we're happy to answer them.

I don't know how loud this is going to be. Okay. Yeah, thanks for the talk. I saw that you mentioned there's going to be an update to the Go API for TUF. I also know there's a go-tuf CLI associated with the old repo. Do you know if that's going to get updated as well?

Yeah, we're definitely talking about that. I think we want there to be a CLI. We might move it to a separate project, so you can maintain the core functionality and core API separately from the CLI. But there's definitely interest, both from the maintainers and the community, in having the CLI. So yes is the short answer.

Thanks. So, great talk. I just want to semi-correct one tiny thing, which is that the TAF work is really a lot of work by David Greisen, Renata, Radana, and others out of the Open Law Library. BJ, whose picture was up there along with mine, and I just get the credit because our universities have better PR departments than the startup or nonprofit that does this amazing work. So we're really blessed to work with them, and really proud to be protecting a lot of different local community governments that are using this.

Absolutely, thanks for the correction, and that's one of the creators of TAF, by the way. Cool. I think we're on time. Thank you very much again. Thank you.