Yo, hello, I'm Trishank Karthik Kuppusamy from Datadog. And I'm Santiago Torres-Arias from Purdue University. And I'm Asra Ali from Google. And we're all here today to talk about the state of the art of supply chain security. But first, let's set a little background for the problem. Why are we all here talking about this? So at Datadog, we have the Agent, which some of you may have installed. It monitors your infrastructure, your apps, your logs, and so on. And there are hundreds of integrations that come out of the box. They give the Agent extra superpowers, as it were, to measure even more stuff. And the problem we had was that we needed to decouple releasing the integrations from releasing the Agent itself, because we wanted customers to be able to try new versions of the integrations, try bug fixes, and so on. And so how do you normally do this? Well, you use CI/CD, as many of us here do, I suspect. And there are a lot of benefits to using CI/CD. It's much less error-prone than human beings at building, packaging, and signing your software reliably. And so you trust this robot, as I like to call it, sitting in a cloud to do this for you. Well, there are downsides. 99.999-whatever percent of the time, nothing goes wrong. Everything is good. Life is well. The problem is that a million little different things can go wrong. For example, a developer could have a key compromise. Or your VCS, the source control system, could get compromised. Or your CI/CD itself could get compromised. Or the image registry you use to run your CI/CD jobs. Or the key and file service that you use to store your artifacts and your signing keys. I think you get the idea. The point is that, like I said, a million little different things can go wrong. And the problem with good old-fashioned CI/CD by itself is that there's no compromise resilience. One single attack and the whole game is over. It's an all-or-nothing proposition.
So what we want is compromise resilience. It's probably impossible to build an impenetrable fortress, but what you can do is build layers of defense around it: defense in depth, as some of us call it. So, for example, you could think of building your medieval castle here with layers of moats surrounded by alligators, and guards thrown in just for extra measure. So attackers look at your infrastructure and say, you know what, that's not worth my time, I'm going to go after this other, easier target. And so we have this idea that we would like to tell you about today, something that we call transparent compromise resilience. It uses these three pieces of open source technology. The first piece of technology is called in-toto. That's the bit that secures your software supply chain from your developers all the way down to your CI/CD. The second piece of technology is called TUF, The Update Framework, and it gives you compromise-resilient distribution of your software supply chains as well as the associated artifacts. And the third piece of technology is Sigstore, which then gives you transparent compromise resilience, so you can see the history of every single little thing, how every single artifact you ever released was produced from end to end. And people can even audit this stuff for you. And so the metaphor that we like to use, and I hear vaccines are very popular these days, to try to put this in more concrete terms, is to say that in-toto is the part that gives you supply chain security. In-toto is like Pfizer or Moderna saying, hey, here's my signature saying here's how much mRNA is in my vaccine, how many lipids I used, what sort of adjuvant I used and who I bought it from, and how it was all composed together. And so you know how this vaccine was produced, right?
And so TUF is a bit like the FDA saying why you should trust Pfizer or Moderna or J&J, or whoever else you have tomorrow, in the first place. And finally, Sigstore is kind of like the CDC, the record registry keeping the history of how every single little vaccine was produced: what lot number, what expiration date, how it was produced from end to end, from the moment the ingredients were put together down to when it was recorded on Sigstore. And now Santiago will take it away to talk about how in-toto works. Thank you, Trishank. So as Trishank was saying, there are three components in total: we're able to track everything with in-toto, we're able to distribute what we're tracking, as well as the artifacts that we're protecting, with TUF, and finally, with Sigstore, we're able to transparently collect historical information about everything that was produced in an ecosystem. So to take it from the top, imagine that you're maintaining a project and you're developing your application at home, and you need to put it somewhere so other people can collaborate. You would want to put the source code, but also information about the source code, how it was written and who wrote it, into a place that's discoverable by people. With this, you're able to prevent attackers from breaking in and backdooring the source code, or targeting particular pieces of your distribution to introduce malicious code. Now imagine that, beyond that, you also want to include a CI/CD system to continuously build all of the source code that you're writing. So what in-toto takes care of is creating evidence that tightly links together every single operation in your pipeline, so that when consumers are about to consume your software, they know that nothing was performed outside of the specification, that everything was performed to the letter by the right party, and that there hasn't been any tampering in between. So going back to the metaphor of vaccine distribution.
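As a rough illustration of the kind of evidence just described, here is a minimal sketch, not the real in-toto API (the step names and fields are hypothetical), of per-step link metadata that hashes the materials a step consumed and the products it produced, so a verifier can detect tampering between steps:

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest, the same hash family in-toto link metadata uses."""
    return hashlib.sha256(data).hexdigest()

def make_link(step_name, materials, products):
    """Record what a supply chain step consumed and produced.

    materials/products: dict of filename -> file contents (bytes).
    A real in-toto link is additionally signed by the party running the step.
    """
    return {
        "step": step_name,
        "materials": {name: digest(data) for name, data in materials.items()},
        "products": {name: digest(data) for name, data in products.items()},
    }

def verify_handoff(prev_link, next_link):
    """Check that the next step consumed exactly what the previous one produced."""
    return prev_link["products"] == next_link["materials"]

# The "write-code" step produces a source file...
src = b"package main\n"
write = make_link("write-code", {}, {"mRNA.go": src})

# ...and the "build" step consumes it to produce a binary.
build = make_link("build", {"mRNA.go": src}, {"app": b"\x7fELF..."})
assert verify_handoff(write, build)

# If the source is tampered with between the two steps, verification fails.
evil = make_link("build", {"mRNA.go": b"package main // backdoor\n"}, {"app": b"..."})
assert not verify_handoff(write, evil)
```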
Now imagine a pipeline that we may be more familiar with, a physical pipeline that's producing, I don't know, a little Go file, mRNA.go, that is taken to a manufacturer, a CI/CD system in the cloud, maybe in GitLab. It will be constantly updated and manufactured as new ingredients become available. That eventually gets into a vaccine rollout plan, like the ones running in many different countries, to be put into a vaccine site for everybody to consume, and then eventually you put a selfie on Twitter to confirm to everybody that you actually took the vaccine to combat this pandemic. So in-toto, in this way, is a way to record all of the operations that took place, from the very first moment somebody wrote the source code for mRNA.go, all the way to the selfie on Twitter. That said, what in-toto ends up providing is this property of software supply chain security. And I'll hand it back to Trishank to talk about how, with in-toto and TUF, you're able to communicate this information in a discoverable and resilient way. Thank you, Santiago. Yeah, so let's talk about the second piece of the puzzle, which is compromise-resilient distribution of the software supply chains that Santiago was talking about, the in-toto supply chains. You can think of TUF as a key distribution and transport protocol that distributes all of the supply chains as well as the artifacts. And going back to the metaphor of vaccines again, TUF is like the FDA telling you, hey, here's why you should trust Pfizer or Moderna or J&J in the first place. And then you can go off and figure out their separate rules for the supply chain and so on. But you have this one central root of trust, and from there you can bootstrap and figure out your way through the rest of the system. We don't have time to go into all the details, but to give you a rough idea of how TUF works:
Basically, we listened to grandma. We told grandma we have this very tough problem of trying to protect software updates from nation-state attackers. And so we asked grandma for advice, and she gave us a few different design principles that we then put together and called The Update Framework. One of them is separation of duties: basically, don't put all your eggs in one basket. That's it. Other principles include things like consistent snapshots, so that attackers can't do mix-and-match attacks and try to give you different package versions that don't belong together. Timeliness: are you getting the latest information? You have one key ring to rule them all, so to speak. You have this one root key, and from it you can figure out the keys for the rest of the system. You can even slash and burn them; it doesn't matter, you can always figure out the latest keys from there. Cold storage keys, I think that's what the crypto kids call it these days. So you can do things like keeping your keys in a nuclear-proof bunker somewhere in the Swiss underground, so that attackers can't get to them when they get into your infrastructure. Use two-man rules. Grandma said: always use separation of keys. In fact, it's very interesting what they do with missile launches: the keys are physically situated far enough apart that one person cannot turn both at the same time to launch the nuke. So we use the same principle here: you need at least two different keys. Cryptographic agility: use different hashing and signing algorithms at the same time, so you hedge your bets. Even if one of them is broken, you're still okay, hopefully, because the rest have a different design. With apologies to the Nintendo corporation, that's basically it in a nutshell. It's a bit like playing Pokemon: you gotta catch them all, right? Okay, so yeah, that's basically how TUF works. And now Asra will talk to us about Sigstore. So Sigstore plays a couple of different roles in this whole ecosystem.
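The two-man rule above boils down to threshold verification. Here is a toy model, not the real TUF implementation (real TUF uses asymmetric signatures like Ed25519 and role metadata pinning public keys; the HMAC "signatures" here are a stand-in for illustration): a role lists its authorized keys and a threshold, and metadata only counts as valid when at least that many distinct authorized keys have signed it.

```python
import hashlib
import hmac

# Toy stand-in for real asymmetric signatures: in real TUF these would be
# Ed25519/ECDSA/RSA signatures verified against the role's public keys.
def sign(key: bytes, metadata: bytes) -> str:
    return hmac.new(key, metadata, hashlib.sha256).hexdigest()

def verify_role(metadata: bytes, signatures, role):
    """Count distinct authorized keys with valid signatures over the metadata."""
    valid = {
        keyid
        for keyid, sig in signatures.items()
        if keyid in role["keys"]
        and hmac.compare_digest(sig, sign(role["keys"][keyid], metadata))
    }
    return len(valid) >= role["threshold"]

role = {"keys": {"alice": b"k1", "bob": b"k2", "carol": b"k3"}, "threshold": 2}
meta = b'{"signed": {"version": 3000}}'

# One compromised or missing key is not enough...
assert not verify_role(meta, {"alice": sign(b"k1", meta)}, role)
# ...but two authorized signers meet the two-man rule.
assert verify_role(meta, {"alice": sign(b"k1", meta), "bob": sign(b"k2", meta)}, role)
# A signature from an unauthorized key doesn't count toward the threshold.
assert not verify_role(meta, {"mallory": sign(b"evil", meta), "bob": sign(b"k2", meta)}, role)
```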
So I'm going to talk a little bit about what it does in general, and then we'll talk about how it works in the Datadog pipeline. Sigstore's main goal is basically to provide a transparent and easy-to-use way of securing the supply chain. In the analogy that we laid out before with in-toto, TUF, and Sigstore, what it plays in this whole pipeline is that it provides a place of discoverability for the end user, for finding when those artifacts have been signed, and for providing an immutable history of signing events. So, for example, you might spin up a monitor and say, okay, I want to double check, just to keep myself up to date, that whoever I'm consuming artifacts from is actually signing when they need to. Without the Sigstore piece, you lose out on that history of records. And with those records, you get audit trails, you get transparency, you get a way of tracking down where things are going wrong. Because, as we know, no matter how secure you make things, there's always going to be another hole somewhere. And without that audit trail and that record keeping, when an exploit you know is coming finally happens, you have no way of finding out and tracing down what exactly went wrong. So this takes compromise resilience to another level. So, going back to our vaccine analogy, this is the role of the CDC. It's great, Pfizer is there, the FDA is there. But what really builds trust is a relationship over time, with records, with data that backs up the FDA's assertions and Pfizer's assertions. So this is the role it plays. And what I really find is that this brings things from, like, 90% to 100% and more. You're able to see that history, and it provides people with accountability.
And so that's the role that I see this playing, complementary to in-toto and TUF. So let's talk a little bit about what Sigstore does in general. There we go. What we're gonna do is start from basic principles, and I'll talk a little bit about how it can be used in different, more complex ways. There are three main components of the ecosystem that I'm going to walk through, starting from just an artifact, which is this notebook-like thing on the left here, and how you might distribute it in a way that the end user at the bottom over there can check for integrity. So let's start with that artifact. The first step is the actual signing of it. That happens with a tool called Cosign. And again, all of these components that I'm talking about are totally modular: pick and choose, do what you want. Cosign is the part of the Sigstore ecosystem that provides that signing and verification. It's a really easy signing tool that tries to make most of the process as automated as possible. Everything you're gonna hear about from here on happens under the hood; no one needs to know about it if they don't want to. You can use generated keys, you can bring your own keys, you can use hardware keys. And with that public and private key pair, you can sign a container. Really, Cosign is for container signing, but in the whole Sigstore ecosystem you could sign any kind of artifact. So, all right, that's good and well. You can generate that signature and give it to the end user, and the end user can use the public key and signature to do a verification. But now, building on top of that: part of the difficulty in signing artifacts is managing keys. Key management is a very, very hard problem.
We all know that keys get stolen, keys get leaked, keys get published on websites, keys get compromised. It's just a fact of nature. So, given that it's going to happen, one way Sigstore tries to mitigate this problem, and make it something developers don't even have to think about, is with an automated key management piece called Fulcio. Fulcio is a PKI system. It's a root certificate authority that also lives in the Sigstore ecosystem, and it's totally free and totally public. What it does is generate code signing certificates based on an authorized identity, provided through an OpenID Connect flow. So maintainers or distributors can generate a one-time, ephemeral key pair, use that public and private key pair to create a signature, and then send their public key up to Fulcio, which signs off on it, generates a certificate, and provides that back to you. After the code signing is done, the distributors or maintainers can just throw away their keys. There's no need for key management at all. And what this results in is that verifiers can simply verify based on that identity, as long as they check the Fulcio piece of this. So that's awesome: we don't have to deal with private keys. But there's one more piece that exists in conjunction with, and is kind of necessary for, the Fulcio piece, and that's called Rekor. Rekor is a transparency log. You may be familiar with this as a type of ledger, but it's also a timestamping service. What this log contains is basically storage for any type of signing artifact. Maybe this includes general container signatures that we have from Cosign, maybe this includes JARs, maybe this includes RFC 3161 timestamps, or maybe this includes in-toto and TUF metadata.
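The transparency log's immutability comes from it being an append-only Merkle tree, in the style of Certificate Transparency. As a rough sketch of the underlying mechanics (this is not Rekor's actual API or exact hash formatting, though real logs do domain-separate leaf and node hashes as in RFC 6962), a client can check that an entry is included by recomputing the root from an inclusion proof:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(entry: bytes) -> bytes:
    # Domain separation: leaves are hashed differently from interior nodes
    # so a leaf can never be confused with a subtree (as in RFC 6962).
    return h(b"\x00" + entry)

def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)

def verify_inclusion(entry, proof, root):
    """proof is a list of (sibling_hash, side) pairs going from leaf to root."""
    acc = leaf_hash(entry)
    for sibling, side in proof:
        acc = node_hash(sibling, acc) if side == "left" else node_hash(acc, sibling)
    return acc == root

# Build a tiny 4-leaf log by hand.
entries = [b"sig-a", b"sig-b", b"sig-c", b"sig-d"]
l0, l1, l2, l3 = map(leaf_hash, entries)
n01, n23 = node_hash(l0, l1), node_hash(l2, l3)
root = node_hash(n01, n23)

# Inclusion proof for entry "sig-c": its sibling leaf l3, then subtree n01.
proof = [(l3, "right"), (n01, "left")]
assert verify_inclusion(b"sig-c", proof, root)
# An entry that was never logged cannot produce the same root.
assert not verify_inclusion(b"sig-x", proof, root)
```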
So what this allows is that anyone can go and upload signing artifacts with a totally immutable history, by the nature of the transparency log. And it's searchable. So what you've basically provided is searchable and totally immutable record keeping of all the signing that's going on, which is awesome in case someone ever wants to monitor or trace down exactly what happened and when. And in the future, you can think of expanding this out to give you a whole global view of the ecosystem that you can use to build connections between different things. So this entire piece is how the ecosystem works in conjunction. But again, we can pick and choose components as we want. The great part is that everything can happen under the hood, starting from the Cosign flow, using all three of these components: keyless signing with Fulcio and verification on the transparency log. But you can also just use, for example, the transparency log to provide your end users with a transparent way of saying, yes, you did sign things when you wanted to. And that's exactly what Trishank is gonna talk about next with the Datadog integration: using Rekor as a way of hosting their TUF and in-toto metadata, as a way of saying, hey, we're definitely signing things transparently. So I'll hand this off to you, Trishank. Great, thank you, Asra. There we go. So yes, as Asra alluded to, let's see how this all actually looks in practice. How would you actually do this? We did this for the Datadog Agent integrations, as I mentioned in the motivating problem earlier. And I'm very proud and very pleased to say that we were the first in the industry, as far as I can tell, and still the only one, to build a compromise-resilient software supply chain. And so this is what our in-toto software supply chain looks like.
We don't need to get into the gritty details, but the basic idea is that we're packaging our integrations as Python wheels, which are basically just zip files containing our source code. Our developers sign the source code, checked into GitHub in this case, using YubiKeys, a hardware root of trust, so attackers can't easily exfiltrate the keys. And what happens is the CI/CD then comes in the middle, packages all that stuff, puts it in the zip file, the Python wheels as I said, and then signs everything, all this in-toto metadata and the Python wheels themselves, using TUF, and distributes it to our end users. And crucially, the important thing here is that when users install one of these, the Agent transparently calls TUF and in-toto in the background to verify the software supply chain. It checks, for example, that the Python wheel was produced by the CI/CD, but on top of that, it actually unzips the wheel and checks that the source code was literally signed by one of our developers using their YubiKeys. That's what gives us compromise resilience here: even if someone tampered with the CI/CD, they can't forge our developer signatures. And this is the TUF distribution model. Again, we don't need to get into the details. It looks complicated, but it's really just load-balancing metadata here. As I said, we've been doing this since 2018, before software supply chain security was cool, so we have three years' worth of data on releasing these Python wheels along with the associated in-toto software supply chain metadata. And now I'm also pleased to say we're the first in the industry, as far as I can tell, to do transparent compromise resilience, to take it to the next level as Asra was talking about. So we have our developers, some of them sitting in the New York Times building, releasing new Python wheels using their YubiKeys.
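The layered check just described, first that the wheel matches the CI/CD-signed TUF metadata, then that the source files inside the wheel match developer-signed digests, can be sketched roughly like this (the file names and metadata shapes are hypothetical, not Datadog's actual formats, and the signature checks on both digest maps are assumed to have already happened):

```python
import hashlib
import io
import zipfile

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_wheel(wheel_bytes, tuf_targets, dev_signed_digests, wheel_name):
    """Two independent roots of trust must both agree.

    tuf_targets: wheel name -> digest, published by CI/CD via TUF.
    dev_signed_digests: source file -> digest, signed by developers on YubiKeys.
    """
    # Layer 1: the wheel itself matches what the CI/CD published via TUF.
    if sha256(wheel_bytes) != tuf_targets.get(wheel_name):
        return False
    # Layer 2: unzip and check every file against the developer-signed digests,
    # so a compromised CI/CD still can't slip in unsigned code.
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        for name in wheel.namelist():
            if dev_signed_digests.get(name) != sha256(wheel.read(name)):
                return False
    return True

# Build a toy wheel in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("integration/check.py", "def check(): pass\n")
wheel = buf.getvalue()

tuf_targets = {"integration-1.0.whl": sha256(wheel)}
dev_digests = {"integration/check.py": sha256(b"def check(): pass\n")}
assert verify_wheel(wheel, tuf_targets, dev_digests, "integration-1.0.whl")

# A wheel rebuilt by a compromised CI/CD with altered source fails layer 2,
# even though the attacker controls the TUF targets digest.
buf2 = io.BytesIO()
with zipfile.ZipFile(buf2, "w") as z:
    z.writestr("integration/check.py", "def check(): backdoor()\n")
evil = buf2.getvalue()
tuf_targets["integration-1.0.whl"] = sha256(evil)
assert not verify_wheel(evil, tuf_targets, dev_digests, "integration-1.0.whl")
```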
So you can't exfiltrate the signing keys, even if you get into the laptops. And secondly, every signing operation literally requires touching the YubiKey. So our developers release new source code that needs to be sent out to our users, our CI/CD packages it, using in-toto to send attestations about it, and then everything gets packed together using TUF to securely deliver both to our end users. And finally, now we're beginning to record every single artifact, every new integration that was released, on Sigstore, on Rekor, so that everyone can query it and see exactly how every single little integration was produced, end to end, from our developers to you. And now Asra will show a very cool demo. All right, so yeah, the purpose of this demo is to address something: Trishank mentioned three years of history of publishing TUF metadata and publicly verifying it, but the question is, did he really do it for three years? Can we actually find a history of those three years of metadata? So what I'm gonna show you is a little demo of how verification on an end user's machine would work using the in-toto, TUF, and Sigstore components. So, to do, all right. What we have here is the Datadog downloader. I'm showing you that from the start we have a trusted root.json in our current metadata. Then we run a downloader that does TUF verification and in-toto verification, so that what we have in our resulting metadata files is now verified TUF and in-toto metadata. And you can see I've printed out the current timestamp that we've retrieved through this download process, and we see we're at version 3000, right over here.
So now, going to the Sigstore part of this, what we can do is run a search query using Rekor on all of the TUF metadata that's currently on there, signed by the current root.json that we just pulled from the downloader. We find a couple of entries from the last couple of days for this demo. We verify that they currently exist in the log, just because we don't necessarily trust the indexing service. And we actually pull that entry from the log, find out its metadata content, and we find a TUF-looking piece of metadata: a timestamp role with a version 3000, which matches what we saw before. And just to make sure that it really, truly does match, we can take the hash of that, F25C60 as you can see printed below, and it totally matches the one that we downloaded through the verifier. So that's kind of a speed run demo of how this entire process works, but the main idea is that we did it with speed because it should be easy to do. There's not too much going on for you to deal with, but all of these pieces are there for verification and are totally transparent for you to check and click through. So we hope that people start integrating this into their systems, making an immutable history for people to search, query, and monitor, and also for your own benefit, making sure that no one's signing things on your behalf that shouldn't be there. So, you know, I can make sure that no one's signing with my email ID token by checking my Rekor log. Rekor is a really cool tool to search and verify and pull things and investigate. So now I'm going to hand it off back again for some security analysis. Thank you. Thank you again, Asra. That was a really cool demo. I especially like how difficult it was to really do all this. It looks onerous, right? All this TUF and in-toto checking in the background and searching Sigstore, it looks totally unusable. Yep. I don't know. I bet anyone can do it.
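The core cross-check in that demo can be recapped in a few lines: the timestamp metadata fetched through the TUF client and the entry pulled from the Rekor log should be byte-for-byte identical, which is cheap to confirm by comparing digests and versions. This is a simplified sketch; the real metadata is a full TUF timestamp JSON document:

```python
import hashlib
import json

def cross_check(downloaded: bytes, log_entry: bytes) -> bool:
    """Confirm the TUF-verified metadata matches what the transparency log recorded."""
    if hashlib.sha256(downloaded).hexdigest() != hashlib.sha256(log_entry).hexdigest():
        return False
    # Belt and braces: the version number inside should also line up.
    return json.loads(downloaded)["signed"]["version"] == json.loads(log_entry)["signed"]["version"]

# Toy stand-in for the demo's timestamp metadata at version 3000.
timestamp = json.dumps({"signed": {"_type": "timestamp", "version": 3000}}).encode()
assert cross_check(timestamp, timestamp)
assert not cross_check(timestamp, b'{"signed": {"_type": "timestamp", "version": 2999}}')
```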
And if you want to, I mean, the nice part is that it's all scripting. So, you know, we all know how I write scripts, which is to look at other people's scripts. No, no, exactly. That was the whole point, right? It shows people how easy this really is for end users to do, because we've hidden away a lot of the complexity for you. But the other nice part is that even though this complexity is hidden, it's totally public. So the more skeptical you are, the better: if you want to get into the weeds, you can. Exactly. Cool. So, yes, let's next take a look at the security analysis. What's the bottom line? What do I get for my money here? Well, let's take a look. TL;DR: what can go wrong? Well, when nothing has gone wrong, life is good, obviously. But when you have a developer key compromise, in theory, yes, malicious source code could be built automatically and distributed. But there's an important thing here: we've significantly raised the bar for attack. The first thing you can do is increase the threshold of developers needed, so for example, you could easily require at least two developers to sign off on exactly the same source code before you even trust installing the integration. And the second thing, which we are already doing, as I mentioned before, is that our developers are using YubiKeys. One, the keys are generated on the YubiKey, so you can't exfiltrate the signing keys. And two, you literally need to touch the YubiKey in order to authorize every signing operation. This significantly raises the bar for attack. Now, what happens if our VCS, in this case GitHub, gets compromised? No problem, don't lose sleep. In fact, we've detected accidental denial-of-service incidents here, where it looked like an attack but wasn't; it was, for example, not a bug but a feature in Git rewriting newlines from Windows. And then there's CI/CD compromise.
Again, don't lose sleep, because we always, always, always double-check the CI/CD. We never blindly trust it; the root of trust is basically our developers. What happens if the image registry gets compromised? Again, no sleep lost, nor for the key and file service either. I think you get the idea by this point. And so, some conclusions. Too long, didn't read: just use in-toto, TUF, and Sigstore, okay? To get transparent compromise resilience. That's all you need to remember. You don't need to worry yet about how it works, but this is the basic idea. We want transparent compromise resilience for all, right? And this is how you do it, that's our claim. And again, going back to the metaphor of vaccines just to make things a bit more concrete: in-toto is like Pfizer or Moderna or J&J telling you exactly how every one of their vaccines was produced. TUF is like the FDA telling you why you should trust each of these vaccine makers in the first place, the bootstrapping of trust and the secure distribution of these vaccines. And finally, Sigstore is like the CDC keeping a permanent, transparent record for everybody who wants to walk in and check: how was my vaccine produced? I want to know from end to end. And that's part of the power that Sigstore gives you. And now Asra will talk about how this is coming to all of us. Yeah, so I'm gonna talk about this exciting new development, which might not be as new by the time everyone is actually watching this. What we are trying to do is bring the benefits of TUF trust roots to everyone: transparent compromise resilience for all. What we intend to do is integrate TUF roots that you can make yourself on GitHub with our trust root. By bootstrapping from our own trust root, you can build your own trust root through this GitHub approval process, and that will naturally integrate with the Sigstore tools.
So, for example, let's say you had a project, like Distroless, and they wanted to use TUF, or maybe use in-toto too, for their supply chain, publish metadata based on their current layout, and have people in Cosign actually verify the entire thing end to end. What they can simply do, or rather, if I were an end user, I would just use Cosign to do a verification, and then point at and pin a certain delegate: a delegation from our own Sigstore root to Distroless's root. That way, as an end user, I don't have to deal with finding the right metadata, I don't have to deal with pulling the current metadata. I just know that I'm gonna pin on this delegate that is trusted by Sigstore, which in turn I trust, and then I'm able to verify an image in a single line. So that's the idea there. We basically wanna bring trust roots to everyone and make it so that projects can transparently and openly utilize all of these tools in conjunction, without having to set everything up themselves. So now I'm gonna hand it off to Santiago for some other improvements. So lastly, on the research front, the North Star front, we want to make three major thrusts, three major pushes. One of them is to integrate projects such as Keylime to bring hardware root of trust semantics into the tracking of supply chain metadata, so that you can know every single thing that was used in the operation of a supply chain action. The second one is to actually start discovering the complex interrelationships between software as it is used to produce more software, so we can track how a software artifact partook in the creation of a new container, for example, and start understanding how this influences the trustworthiness of a particular software artifact.
And finally, with all of the metadata that we're collecting, we can start creating data-science-based solutions to identify the threat surface in the global software supply chain, and start taking more proactive actions to reduce our attack surface, or rather, our vulnerability surface. And of course, this is all ongoing work, and it's not only the three of us: people from multiple organizations are working very hard to make this a reality, and we wanted to give them a shout-out on this slide. And without further ado, I think we're ready for a Q&A, aren't we? Yes, I think so. Thank you, thank you everyone. Really appreciate your time. Thanks.