Hello everyone, my name is Marina, and I'm a PhD student at NYU. Today I'm going to talk about some of the work I've been doing on secure software distribution. So why do we care about secure software distribution? This is an attack vector that has been used a lot, especially recently, to distribute malware to large numbers of systems at once. This includes the recent SolarWinds attack, which relied on software updates as part of a wider software supply chain attack to distribute malware. In addition, NotPetya, which caused all kinds of outages in hospitals and major companies, also spread through software updates. Closer to this community, the Docker Hub password database compromise was part of a wider Docker Hub compromise, and many images were at risk there, although this was very much mitigated by an earlier version of this work. All of these attacks cause huge economic damage, including one example where malware distributed over software updates in South Korea caused almost $800 million in damages. So what are we going to do about this? In this space I've been working on the Notary v2 effort, which aims to create a more broadly used secure distribution system for registries by building on the work done in Notary and addressing some of that system's limitations. These include avoiding trust-on-first-use for keys, consolidating key management to make it much easier to use, and to use securely, and putting metadata directly on repositories so that users don't have to run additional services in order to use secure distribution.
This space is especially interesting in the ways it differs from traditional software repository systems. First, we also have to secure ephemeral clients, which don't have any state on them that we can use to store information as part of the update or distribution process. Second, we have shared repositories with private data, and we don't want this private data to leak to other users through the metadata. And finally, there is scalability: some registries have millions of images, and we want to make sure that our security solution scales to registries even when they're that large. Next, the threat model, which as you know is the place to start when designing a security solution. In that model, looking at the previous compromises I talked about, we make the assumption that it's not a question of if your registry will be compromised; it's a question of what you will do when that happens and how protected you will be when it does, because even companies with huge security budgets and people focused on this problem still get compromised. So we want to make sure that even when this happens, users are protected. We assume that the attacker can compromise some, but not all, signing keys, and that they can control the registry for some period of time. In addition, the attacker is able to watch traffic to and from the registry and alter that traffic. The goal of the system is to prevent users from installing a package that doesn't have a currently valid signature.
This includes, of course, arbitrary software installation, where the attacker can install arbitrary malware onto users' machines, but it also includes things like rollback attacks, where the attacker convinces users to install a previous version, or a version that no longer has a valid signature, or one whose signature was made with a revoked key. We want to avoid all of these situations, as well as a few more subtle attacks in which the attacker tries to convince the user to install a non-optimal version of the software. So the first approach I'll talk about for securing software distribution in general is cryptographic signatures. Signatures are a piece of probably any secure distribution system you'll see, but here I'm looking at systems that focus on the cryptographic signature aspect alone. What these signatures do is attest that the person who signed an image has access to the corresponding private key, and they attest to the contents of the image. In some systems, this is done with keys that are stored on a repository or server somewhere; images are uploaded there, and the user can make sure that the image they're downloading is the same one hosted on the repository. This can also include keys controlled by individual developers, or by other machines that build code, like CI/CD systems. These individual developers, and I'll just call them all developers for simplicity, can sign an image or artifact locally and then upload the artifact with the signature attached to a registry. The problem with this approach is that whoever has control of the keys has full control over the system, and can sign arbitrary software, or arbitrary malware as the case may be, and give it to users to install. So what happens if these keys are on a repository or a registry?
I'm using those terms a little interchangeably; for the purposes of this talk, both mean whatever server is hosting your code. If an attacker compromises this repository and there's an online key on it, they're able to sign any image and serve it to clients. If the developer controls the signatures and an attacker compromises a developer key, they're also able to sign arbitrary images. This is in part because users don't have a good way to know which signatures to trust, or which signatures to trust for which particular images. So they often just have a key ring, and if any key in that ring is compromised, it can be used to sign arbitrary malware, which users will then trust and install. In these systems, there's often no good way to revoke keys and ensure timely revocation. Here's a quick summary of those attacks. If a developer key is compromised, the malicious key is trusted by all users, including to sign malware. If a repository or registry is compromised, the attacker can alter content and show older versions of images that were previously signed by developers, even if those developer signatures are no longer valid. If a mirror is compromised, it has pretty much the same ability as a registry to alter content or show old versions. And if the attacker is acting as a machine-in-the-middle on the network, they can save and replay old signatures even after a vulnerability is discovered or the software is otherwise no longer signed. So next I'll introduce The Update Framework, or TUF as we like to call it, a framework for secure software updates whose research and development I participate in, and which was designed with compromise resilience in mind to address some of these issues.
As part of that, key revocation and delegation are first-class primitives. They're built in from the bottom up to make sure that they're always used, and always easy to use, in any of these compromise situations. And despite all the additional focus on security, in TUF we also focus on this idea of invisible usability, which means that even though there's a lot of security going on behind the scenes, users don't need to know exactly what's happening unless something goes wrong. For the most part, all they know is that they're downloading software and it's being verified; they don't have to perform a lot of steps themselves, because it's all part of an automated process. This is meant to make things easier for both developers and users of software. I'll go into a lot more detail about how TUF works in a minute, but first I want to revisit those same attacks. If a developer key is compromised in a system using TUF, only the artifacts specifically assigned to that developer could be compromised using that key, and only if a threshold of developer keys is compromised (I'll explain what that means in a minute). In addition, developer keys can be revoked by more trusted roles at any time. If a repository or registry is compromised in a system using TUF, the attacker would be able, to a limited extent, to show old metadata. But this is mitigated by each client's verification that any metadata it sees is newer than the metadata already on its system, and it is recoverable using delegations from more trusted roles. Mirrors, meanwhile, don't hold any trust in a system using TUF, so mirrors aren't able to change any content without detection. Somewhat similarly, a machine-in-the-middle attacker on the network is not able to alter anything, and any old signatures will be detected by clients.
So they can't be used to install old versions of the software. How does TUF do this? It uses a few principles to achieve this level of security. The first principle is separation of duties, where any one person or signing key is only trusted to do a certain set of things within the system. The way this works is that it starts with a root role, which serves as the root of trust for the system. This role then delegates to the other top-level roles by providing their public keys. These are the timestamp role, which provides a notion of timeliness, ensuring that, for example, revocations and metadata are always timely and consistent with what's currently on the registry or repository; the snapshot role, which ensures consistency of images; and the top-level targets role, which is where you start to provide actual information about the artifacts and images on the registry. This targets role is also able to delegate further, to other individual developers or teams, to prevent any key sharing even at that level and to make sure that each key is only trusted for the specific piece of the system it is attesting to. Next we have threshold signatures. The idea here is that, especially for the higher-security roles, a role's metadata isn't trusted unless a threshold of keys have all signed the same piece of metadata. In this example, the targets metadata isn't trusted until three out of the four keys for that role have all signed the same metadata. This makes sure that the different people trusted for the role have all signed it, and it would take, in this example, three key compromises for an attacker to be able to sign arbitrary targets metadata. Next we have explicit and implicit revocation of keys. Implicit revocation means that all keys in the system carry an expiration timestamp, so they all expire after a certain period of time.
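As a rough sketch of the threshold rule just described: a role's metadata counts as trusted only once signatures from a threshold of distinct trusted keys all cover the same serialized metadata. The names below are hypothetical, and the HMAC is only a stand-in for real asymmetric signature verification (TUF itself uses public-key signatures), but the counting logic is the point.

```python
import hashlib
import hmac
import json

def verify_threshold(metadata: dict, signatures: dict,
                     trusted_keys: dict, threshold: int) -> bool:
    """Count distinct trusted keys with a valid signature over the same
    serialized metadata; accept only if the count reaches the threshold."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    valid = set()
    for keyid, sig in signatures.items():
        key = trusted_keys.get(keyid)
        if key is None:
            continue  # a signature from an untrusted key is ignored, not fatal
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, sig):
            valid.add(keyid)
    return len(valid) >= threshold

# Four keys are trusted for the role, with a threshold of three.
keys = {f"key{i}": f"secret{i}".encode() for i in range(4)}
meta = {"targets": {"app:v1": {"sha256": "ab12..."}}}
payload = json.dumps(meta, sort_keys=True).encode()
sigs = {kid: hmac.new(k, payload, hashlib.sha256).hexdigest()
        for kid, k in list(keys.items())[:3]}           # only three keys sign
assert verify_threshold(meta, sigs, keys, threshold=3)  # 3 of 4: accepted
assert not verify_threshold(meta, dict(list(sigs.items())[:2]), keys, threshold=3)
```

Note that an attacker who steals two of the four keys still cannot produce metadata the client will accept; only a third compromise crosses the threshold.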
Explicit revocation means that any higher-level role in the system can sign new metadata that removes the public key of a lower-level role, so any key in the system can be explicitly revoked at any time. And because of the notion of timeliness, users know right away when a key has been revoked, and they'll be prevented from using a revoked key during verification. Finally, TUF minimizes risk using offline keys. For the higher-security roles, especially the root role and also the top-level targets role, TUF encourages users of the system to use offline keys, because, as I've mentioned, one of the assumptions in the system is that your servers will be compromised at one point or another. If these keys are not on any server, and just exist physically in some lockbox somewhere, an attacker using just the internet can't possibly compromise them. They would need to pull off some kind of Ocean's Eleven-style physical heist to actually get access to these keys, which really increases the security of your system. Together, when there's a compromise of your system, TUF protects it using a combination of all of these properties. The timestamp and snapshot roles are both on the registry or repository, so if the registry is compromised, these two roles would probably also be compromised. But in this situation, the actual targets roles and the actual keys used to sign images aren't compromised, so the attacker isn't able to change any of that information. And the root role can be used to revoke the timestamp and snapshot keys and reestablish trust in the system once you get back control of your registry, without any manual intervention on the client systems; trust is reestablished automatically after the attack.
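The implicit revocation and timeliness checks described above can be sketched as a simple client-side validation step. This is a hypothetical simplification of TUF's metadata (real TUF metadata carries more fields and signature checks); it just shows the two rejections: expired metadata, and metadata older than what the client already trusts (the rollback protection mentioned earlier).

```python
from datetime import datetime, timedelta, timezone

def is_metadata_valid(metadata: dict, now: datetime, local_version: int) -> bool:
    """Reject expired metadata (implicit revocation) and metadata older
    than the copy the client already trusts (rollback protection)."""
    expires = datetime.fromisoformat(metadata["expires"])
    if now >= expires:
        return False  # expired: the signing key is implicitly revoked
    if metadata["version"] < local_version:
        return False  # rollback: older than the trusted local copy
    return True

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
fresh = {"version": 5, "expires": (now + timedelta(days=7)).isoformat()}
stale = {"version": 5, "expires": (now - timedelta(days=1)).isoformat()}
old   = {"version": 3, "expires": (now + timedelta(days=7)).isoformat()}
assert is_metadata_valid(fresh, now, local_version=4)
assert not is_metadata_valid(stale, now, local_version=4)  # expired
assert not is_metadata_valid(old, now, local_version=4)    # replayed
```

Because the timestamp role re-signs frequently with a short expiration, a replayed signature goes stale quickly even if an attacker on the network captures it.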
And if any single developer key is compromised, only the packages that developer was trusted to sign could be compromised. Again, that key can be revoked by any of the higher-level targets roles, all the way up to the root role, which could revoke all of them if need be, although a lower-level role should probably do the revocation first, to avoid that overhead. In addition to these existing features of TUF and this whole philosophy around secure software distribution, in order to adapt TUF for the Notary v2 effort and container-registry-specific scenarios, we have a couple of new features that I'm going to talk about today. The first of these is client pinning of targets keys. The idea here is to reduce trust in the registry by allowing the client to define the public keys that they would like to trust to sign specific targets files. This means that even the root role on the registry itself won't be able to override this without the client knowing about it; the client is informed whenever a new developer starts signing, or whenever something like that changes. This can be especially useful in open source projects, where you want to keep track of who is currently signing each release, and where you just want more control over the process. It's also good for unlisted packages: private packages that might not be covered by the registry's top-level targets metadata but are still hosted on the registry. We provide the client a way to list public keys for those files as well, working with the security measures of TUF's top-level roles (root, timestamp, and snapshot) while providing a separate chain of trust for specific targets.
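To make the pinning idea concrete, here is a minimal sketch of what a client-side pinned-keys configuration might look like. The format, names, and keyids are all hypothetical (the Notary v2 design was still in progress at the time of this talk); the point is that for pinned targets the client consults its own list of acceptable signing keys, and for everything else it defers to the registry's metadata.

```python
from typing import Optional, Set

# Hypothetical client configuration: for each pinned target prefix, the
# client declares exactly which public keys (by keyid) it will accept.
PINNED = {
    "myorg/app": {"keyids": {"deadbeef", "cafef00d"}, "threshold": 1},
}

def allowed_signers(target: str) -> Optional[Set[str]]:
    """Return the pinned keyids for a target, or None to indicate the
    client should fall back to the registry's top-level targets metadata."""
    for prefix, rule in PINNED.items():
        if target == prefix or target.startswith(prefix + "/"):
            return rule["keyids"]
    return None

assert allowed_signers("myorg/app") == {"deadbeef", "cafef00d"}
assert allowed_signers("other/app") is None  # not pinned: registry decides
```

Since this mapping lives on the client, a compromised registry, even one controlling its own root role, cannot silently swap in different signing keys for a pinned target.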
Another new feature for Notary v2 that we've been discussing is succinct hashed bin delegations, which is a way to reduce the size of delegations. This is especially useful for larger public registries where a lot of the packages are signed by the registry rather than offline by developers. In this case, the registry can automatically sign these images using online keys by separating the packages into bins based on a hash. This reduces the size of the metadata when you're doing a lot of online signatures, for those really large public registry use cases. There's even more about TUF on our website, as well as in the specification, which goes into a lot more detail about how all of the aspects of the system work. There's also a reference implementation of TUF, which you can find from the website, or you can contact me and I can get it to you. We're also available on the CNCF Slack, in the TUF channel as well as the Python TUF channel, where we talk about the reference implementation specifically. For Notary v2, this is an ongoing design process, so if you have any interest in secure distribution, I think this is a great place to get involved. We're on the CNCF Slack, and a lot of the work I presented here is included in a TUF prototype design, which is linked here. We'd love to work more with folks, see how we can solve all of our use cases, and get everyone's packages signed on registries, so feel free to email me or contact me on the CNCF Slack. I will be available live for questions. Thank you everyone.
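The bin assignment described above can be sketched in a few lines: each target path hashes deterministically into one of a fixed number of bins, and each bin is a single delegated role signed by an online registry key, so the delegation metadata stays a constant size no matter how many images the registry holds. The bin count and naming here are illustrative, not the actual Notary v2 values.

```python
import hashlib

NUM_BINS = 16  # a power of two; real registries would use far more bins

def bin_for_target(target_path: str) -> str:
    """Map a target to a bin using the leading hex digit of its hashed
    path, so every image lands deterministically in exactly one bin."""
    digest = hashlib.sha256(target_path.encode()).hexdigest()
    index = int(digest[:1], 16)  # first hex digit: 0..15 for 16 bins
    return f"bin-{index:x}"

# A thousand images spread across (at most) the 16 delegated bin roles.
bins = {bin_for_target(f"library/image{i}") for i in range(1000)}
assert bins <= {f"bin-{i:x}" for i in range(NUM_BINS)}
```

The "succinct" part of the proposal is that, because the bins are uniform, the whole delegation can be described compactly (bin count plus keys) instead of enumerating every bin role explicitly.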