Great. Welcome to All Things in-toto. I am Santiago Torres. I work at Purdue University. I'm an assistant professor and I do research on software supply chain security. That is, I care about how software is made securely and how software is consumed securely. And here's my colleague, Marcela. Hi, I'm Marcela Melara. I'm a research scientist at Intel Labs. My current research focuses mostly on software supply chain security, but more generally on distributed systems and security, so this is right up my alley as well. Should we get started? We want to set the stage with a common language to talk about the software supply chain and software supply chain security, so we're going to start with some basics. The software supply chain, the way we see it, the way in-toto sees it, is the collection of all the systems, devices, organizations, and individual actors that are involved in producing a software product. A supply chain attack is then a compromise of any one of those components of the software supply chain, in particular when it takes advantage of a weakness. The common goal, of course, is some sort of alteration, mostly malicious, to the final product. And we always like to point out this very impressive statistic: over the three years between 2019 and 2022, Sonatype measured the rate of growth of software supply chain attacks, and over that span a 742% growth is astronomical. So this really is a crucial problem. We can point out a few examples, and some of you may be familiar with these. SolarWinds, or the SolarWinds hack to be very specific, is probably the one most folks here have at least heard of. SolarWinds is a company that, even though they had a lot of these security measures in place already,
still had a loophole, a weakness in their supply chain, that was then taken advantage of by attackers. So to address all of these issues, there's a growing ecosystem around this problem, and we bucket the solutions into three main areas: evidence gathering, information discovery, and policy or validation. We've highlighted a few of the CNCF and OpenSSF projects that are making headway in this space; you're probably familiar with at least some of these. in-toto in particular has also seen a lot of integrations and adoption in recent years. These are the projects specifically in the CNCF. Then in the OpenSSF, there are several standards emerging, and this is a more broadly scoped ecosystem. And these final ones that we're popping up are what we call in-toto friends: the adopters of the in-toto standards and tooling so far. And then finally, most recently, within the last year, npm and GitHub are ones we like to highlight especially, because they have really adopted and integrated in-toto at a much larger scale, npm being the largest package management service and GitHub being one of the largest CI/CD services. Working together has really expanded the reach of in-toto. So I'm going to hand it off to Santiago now. By now you may be convinced that in-toto is involved in all of these different initiatives that you may recognize, and that in-toto is a fundamental part of this software supply chain security problem. But you may be asking yourself, what is this in-toto thing anyway? Well, in-toto really is a way to talk about how a software product is made and to see essentially every single step that was carried out. The way in-toto works is that it allows you to wrap all of these different technologies that you're familiar with.
If you're familiar with SLSA, you can use in-toto to communicate SLSA predicates. The slide you're looking at right now looks a little daunting, but what it really means is that in-toto allows you to bind information from software supply chain predicates. At the bottom we have a subject, for example a container, and a predicate, for example an SBOM. We're saying: I am attaching this SBOM to this particular artifact in the software supply chain, so that you know this SBOM is associated with this particular container. It also allows you to do other things, such as authenticate that attachment, so that you know the person attaching that SBOM to that container is the person you trust to attach it. This allows us to do a very simple graph construction: we're building the supply chain through its evidence. In this case, for example, we allow all of the people producing a piece of software, say a Debian package, to produce attestations as things move forward. And eventually, using a policy language, whichever one you're more familiar with or inclined to use, say OPA, you can look at all of those attestations and ask: is my package packaged by the packager? Is my software written by the developer? Is the build system I'm using the one that should be building this binary to begin with? Is my Travis CI deployment actually running the tests on the same software that was put in my package, or is it running tests on something else? Through this, we're able to get a highly granular, very visible understanding of how the software is made. Now, all of these little pieces of evidence we were looking at just earlier are called attestations, and they are essentially little atoms of how the software supply chain is connected together.
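The subject/predicate binding described here can be sketched in a few lines of Python. This is a minimal, hypothetical example that assumes only the general shape of an in-toto Statement; the artifact bytes and SBOM contents are made up, and the signed DSSE envelope that real attestations are wrapped in is omitted:

```python
import hashlib
import json

def make_statement(artifact_bytes: bytes, predicate_type: str, predicate: dict) -> dict:
    """Bind a predicate (e.g. an SBOM) to a subject (e.g. a container image)."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": "my-container",  # illustrative name
            "digest": {"sha256": hashlib.sha256(artifact_bytes).hexdigest()},
        }],
        "predicateType": predicate_type,
        "predicate": predicate,
    }

# Hypothetical example: attach a tiny SPDX-style SBOM to a container blob.
stmt = make_statement(
    b"container image bytes",
    "https://spdx.dev/Document",
    {"spdxVersion": "SPDX-2.3", "packages": []},
)
print(json.dumps(stmt, indent=2))
```

The subject's digest is what pins the predicate to one specific artifact: anyone holding the container can recompute the digest and confirm the SBOM really is about those exact bytes.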
We also call these predicates. These predicates allow us to talk about different properties of a subject, and the subject is any piece of software, or intermediate piece of software, in our software supply chain. I don't know if you noticed, but this really does look like a grammar, and that is not a coincidence. We're trying to ask and answer questions about the software supply chain so that we can better determine how much we trust a piece of software. One question you can ask is how SolarWinds solved the SUNBURST compromise. This slide is blatantly stolen from their white paper, I hope they don't sue me, but it gives us some visibility into how they looked at this problem, the largest compromise in the history of the United States, and how they think they will solve it using in-toto. Essentially, they're using two pipelines. One of them is in a completely isolated environment, with a completely different stack, that is rebuilding the software. The claim is that if these attestations and the attestations coming from the production pipeline agree on the resulting binary, then the likelihood of a backdoor in the build system is very low. It means the attackers would have had to break not only into the production environment, but also into some other siloed, isolated verification pipeline in a different location. Another case is Datadog, who built a somewhat more linear pipeline to verify how software traverses their supply chain. At the very left-hand side, they hand a hardware security module to every developer and use it to automatically sign every single release they produce. When that release is pushed into the CI/CD system, the CI/CD system also produces a bunch of different attestations.
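The dual-pipeline idea boils down to a digest comparison: two independent builds each attest to the digest of the binary they produced, and the release is accepted only if both agree. This is a toy sketch of that comparison, not the actual SolarWinds or in-toto tooling, which does considerably more:

```python
import hashlib

def attest_build(binary: bytes) -> dict:
    # Each pipeline independently records the digest of what it built.
    return {"sha256": hashlib.sha256(binary).hexdigest()}

def builds_agree(prod_attestation: dict, verification_attestation: dict) -> bool:
    # Accept only if both isolated pipelines produced bit-identical output.
    return prod_attestation["sha256"] == verification_attestation["sha256"]

prod = attest_build(b"release binary")
verify = attest_build(b"release binary")              # independent rebuild
tampered = attest_build(b"release binary + backdoor")  # compromised pipeline

print(builds_agree(prod, verify))    # True
print(builds_agree(prod, tampered))  # False
```

For this check to mean anything, the builds must be reproducible; that is why this line of work connects so closely to the Reproducible Builds project mentioned later.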
You can think of these as SLSA attestations, though I don't remember if these are actually SLSA, but you can then bind these two things together: did the developer actually produce the software that is being built right now, or is it building something else? Finally, when you download the agent that runs in your infrastructure, it will download these two attestations and do a couple of checks. The first one is: did I download the thing that is a subject in the latest attestation? And the other, which is actually very interesting: can I go all the way back to the original source code, the Python files that I found, and match them to the developers building this software in the Datadog office in the New York Times building, using a hardware token? That way they know that not a single line of code was introduced, nothing was modified by the CI; the CI is really just running the tests and packaging the software. Another example I wanted to highlight, and something that's really exciting to me, is how you can combine Sigstore, SLSA, and in-toto to bind GitHub Actions and npm. The way they do it is, in their workflow, they have their npm GitHub Action authenticate the runner and then emit a SLSA attestation that is signed using Sigstore, to answer the question: is this GitHub Action the one that produced my piece of software, or did somebody try to replace the npm package on the registry by building it themselves on a different computer? These are not all the questions you can ask or answer with in-toto. You can ask very simple questions like: who did what? Who was the developer that made this particular change? Who changed which particular container? Which tools were they using? Where are they running these tools? And is this what they should be doing to begin with? Is their CI configured in the proper way? You may have also noticed other tools that can produce attestations today.
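The first check the agent performs, matching the downloaded bits against the subject of the latest attestation, is again a digest comparison. A sketch, assuming the Statement shape from earlier; the names and contents are illustrative:

```python
import hashlib

def matches_subject(downloaded: bytes, statement: dict) -> bool:
    # The download is trusted only if its digest appears among the subjects.
    digest = hashlib.sha256(downloaded).hexdigest()
    return any(s.get("digest", {}).get("sha256") == digest
               for s in statement.get("subject", []))

# Illustrative statement naming the expected agent bytes as its subject.
agent = b"agent release bytes"
statement = {"subject": [{"name": "agent",
                          "digest": {"sha256": hashlib.sha256(agent).hexdigest()}}]}

print(matches_subject(agent, statement))        # True
print(matches_subject(b"tampered", statement))  # False
```

The second, deeper check chains attestations backwards: the build attestation's materials must match the subjects of the developer-signed source attestation, link by link, all the way to the hardware token signature.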
We have a very rich integration environment. Trivy, for example, can emit both an SBOM and, I think, a vulnerability scan attestation that you can then attach and ask: did anybody run a vulnerability scan on this container before I admitted it into the cloud? When did they run it, and what was the result? SLSA is probably something you're very familiar with, and it answers the question: how was this particular software product built? A little more loosely, you can ask: did somebody run the tests? Did they even pass? Or is there a runtime trace, essentially showing me evidence of this container running in an isolated environment while it was building this piece of software? Really, anything you can collect throughout the software supply chain, the idea is that we want to push it forward and verify it, taking everything from the left-hand side to the right-hand side to verify properties of the software supply chain in a way that is both cryptographically strong and automatic and operationalized. Actually, something I'm very excited about is that Intel donated some code to the in-toto framework that allows us to answer even more complicated questions than the ones I just listed, and Marcela is going to show us. So this is an overview of the demo that I'm about to show you. The idea here is that when Mary the maintainer submits a pull request to a Git repo, the CI/CD pipeline, and specifically, if you're using GitHub, the GitHub Actions workflow, will trigger the build, generate the build output (your artifact or container, whatever it is that you're building), and in addition the metadata that is required to generate the in-toto attestations.
Thanks to work that I've done together with the in-toto community, as well as prior work from other communities and community members, we can then, again through the CI/CD workflow, sign and upload these attestations to the Sigstore transparency log, verify the inclusion proofs, get those artifacts and attestations on Camila the customer's side, and verify that the software artifact is actually compliant with our expectations, with our policies, using the attestations that were generated during the build and signed by Mary's organization. As Santiago was alluding to, part of what we want to spotlight in this demo is the SCAI predicate, the Supply Chain Attribute Integrity predicate, which essentially seeks to bind explicit attributes about any aspect of the software supply chain to evidence for those attributes. There are a few more fields in this predicate; I'm not going to walk through the JSON today, but hopefully you get the idea that we are able not only to generate data, but to start deriving meaning from this data for verification purposes. The questions we seek to answer in this demo include: was the build tampered with? Was the legitimate version of the source repo used? These are questions that SLSA Provenance can answer. Did the build produce an SBOM? Did the build produce SLSA Provenance? These are the types of questions that the SCAI predicate can answer in addition. So for this, I will... You can see my screen, yes. Break out of here. Do we have a drum roll? Yeah. There isn't anything actually running live right now; this build takes about 30 minutes, so I'm not going to have everyone sit and wait for that. Essentially, this is a fork of a project that my team maintains at Intel, Private Data Objects. We build several containers for a distributed-system-type, smart-contract-type application.
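The attribute assertions that the SCAI predicate carries can be sketched as a tiny mapping from a claimed attribute to the digest of the evidence file that backs it. The field names below are illustrative approximations of this idea, not the exact SCAI schema:

```python
import hashlib

def attribute_assertion(attribute: str, evidence_name: str, evidence_bytes: bytes) -> dict:
    # One claim (e.g. "HAS_SBOM") pinned to the digest of the file backing it.
    return {
        "attribute": attribute,
        "evidence": {
            "name": evidence_name,
            "digest": {"sha256": hashlib.sha256(evidence_bytes).hexdigest()},
        },
    }

sbom = b'{"spdxVersion": "SPDX-2.3"}'  # stand-in for the generated SBOM file
assertion = attribute_assertion("HAS_SBOM", "client.spdx.json", sbom)
print(assertion["attribute"])
```

The point is that the assertion is not a bare claim: a verifier can later fetch the named evidence file and check that its digest matches what the assertion recorded.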
And what I really want to showcase here is the SBOM generation for one of our containers, the client container in this case. We generate the various information that we need for the SLSA Provenance, in addition to a few extra logs and whatnot. After that, in the workflow, here are our builds, and here's the stock SLSA Provenance GitHub generator workflow that I was able to just plug into my workflow and generate SLSA Provenance with. As part of this tool, we are able to generate Sigstore-signed in-toto attestations that have the SLSA Provenance predicate in them. And if we go over here, yes, you can see this is the actual Rekor log entry that was generated for this build, and here's a SLSA Provenance attestation that you can verify and check against to obtain the signature. Then, similarly, we built a similar workflow for SCAI. These are available in our SCAI demos and tutorials repo. Here our idea is to generate what we call attribute assertions, which are really just individual mappings between attributes and evidence files. As you can see here, we're generating an assertion saying that, yes, we have SLSA Provenance, and an assertion saying that, yes, we generated an SBOM, and our workflow will sign off on this again and upload it to Sigstore, much in the same way that the SLSA GitHub generator does. So again, we can walk over here to our Rekor log and find, for example, the inclusion proof for our SCAI attestation. And if I pop this open, we can see that this is our client container commit over here. And here we have the has-SBOM and has-SLSA assertions, and I'm also asserting that, well, this container was not built hermetically. I'm not going to go into what that means, but we can talk about it later. Essentially, what's important to point out here is that our evidence is the JSON blob that we generated.
Yes, the JSON file that we generated; and similarly, for our SLSA assertion, we're saying: yep, here's the Provenance that we generated. If we go back to our workflow summary, we'll see all the artifacts, including our SPDX file, our signed SCAI attestation, and so forth. So that's the generation side. On the verification side, let me pop this back for a second. There are steps five and six that need to happen. I'm just going to point out that, for the sake of brevity and making sure the demo goes smoothly, I will not be showing steps five and six today, but these are important steps that do this pre-verification on the authenticity and integrity of the attestations coming from Sigstore against what you get from your GitHub workflow. So I'm going to show step seven next. I open up my shell, and I wish I had... Is this font size readable, or should we make it a little bigger? It's good? Okay. Thanks, yeah. If we look at our attestations, we have our build attestation and our evidence-collection attestation; these are essentially the SLSA and SCAI attestations. This is how the in-toto attestation verifier consumes them, so I need to rename the files. And then, similarly, if we look at our evidence files, one of the evidence files for one of our SCAI assertions is that same SLSA attestation, right? So I'm going to run the verifier for a particular layout. The layout is essentially the in-toto policy definition, and we can go over that in a moment. Essentially, this tool performs a whole host of verifications: we are checking the signatures on our attestations, we are evaluating policy rules on each predicate in the attestation once we verify the signature, and essentially just making sure that every single rule applies. In this case, yay, verification was successful. And if I... the layout...
Again, I'm not going to go through all of this, but over here we have our functionaries, these are the signers, and we have all of this key information. Here we have the build step that generated the SLSA Provenance, and down here we have the evidence collection, which generated the assertions about our build step. You might see our rules down here say, you know, we want to make sure that our attestation says we had SLSA Provenance or an SBOM generated. Then we run our SCAI generator (scai-gen) check-evidence tool to check the evidence referenced in one attestation against the particular actual evidence file. This is a slightly shorter check; the idea is to eventually integrate it into the attestation verifier as part of that flow. But what we're doing here is essentially checking our SCAI in-toto attestation against this particular policy and against the SLSA Provenance file. In this case, we just want to make sure that, yes, our SCAI predicate has the right attributes being asserted, plus a couple of additional fields and attributes inside our SLSA Provenance, which again is our evidence here. So that concludes the demo; hopefully that demonstrates clearly what is going on here. And, yeah, handing it back to you. Okay, so with this demo you saw the power of in-toto. You can check not only that SLSA Provenance exists, but that it is at a certain SLSA level, and that there's evidence to support that the SLSA level you actually recorded matches the SLSA level you expected. Now, in-toto is a community project, and we're not just doing a marketing pitch; we also want to talk about how you can participate, become part of the community, and help us grow this project, because to really secure the software supply chain, we need to cover the whole supply chain.
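The check-evidence step from the demo amounts to recomputing the digest of the evidence file on disk and comparing it against the digest recorded inside the assertion. A simplified sketch; the real scai-gen tooling checks more fields, and the assertion shape here is illustrative:

```python
import hashlib

def check_evidence(assertion: dict, evidence_bytes: bytes) -> bool:
    # The assertion is only as good as the evidence it points to:
    # recompute the evidence digest and compare with what was recorded.
    recorded = assertion["evidence"]["digest"]["sha256"]
    return recorded == hashlib.sha256(evidence_bytes).hexdigest()

provenance = b'{"buildType": "example"}'  # stand-in SLSA Provenance file
assertion = {"attribute": "HAS_SLSA",
             "evidence": {"name": "provenance.json",
                          "digest": {"sha256": hashlib.sha256(provenance).hexdigest()}}}

print(check_evidence(assertion, provenance))  # True
print(check_evidence(assertion, b"swapped"))  # False
```

If someone swaps the evidence file after the fact, the digests no longer match and verification fails, which is what makes the asserted attribute trustworthy rather than a bare claim.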
So in-toto is really a specification, with a series of implementations and a process to enhance the specification. These are called in-toto Enhancements, or ITEs. What I mean by an enhancement is: if you have an idea for a way to improve in-toto, you can talk to us and we can encode it. This is heavily inspired by pretty much anything that ends with "EP" in the open source world. The enhancements are then floated back into an implementation, and we try to keep this ecosystem steadily growing, both to sandbox certain experimental features and to support people who have been using in-toto for, I don't know, four years already. The major implementations that we have are Python, which is usually our stable implementation; this is the one where we fix security issues but don't add any sort of weird new features. Then we have the Go implementation, which is the opposite: it's pretty much bleeding edge, the one we use to test new features. We also have an implementation in Java, mostly used for the Jenkins plugin; if you're using Jenkins, you can install an in-toto provenance generator. And there is the Rust implementation, which is used by another project called rebuilderd, which is heavily connected with the Reproducible Builds project; a way to verify artifact reproducibility is actually to use in-toto to verify attestations about multiple builds of the same piece of software. ITEs, again, are the way that we grow the community, or grow the feature set, of in-toto. We're moving really fast; as a matter of fact, this slide is outdated. ITEs 9, 6, and 7, I think, are accepted, and we now have 10 and 11, which are the ones you saw in the SCAI demo. If you have any idea, reaching out to the ITE maintainers is the best way to get the conversation going. Whenever you hear "attestation," even if it's in the SLSA attestation format, people are probably talking about this document.
This is where attestations were first defined for in-toto, and that's what inspired the wrapper around SLSA. in-toto is a common language. Again, what we're trying to do is accommodate all of the communities, especially CNCF projects, so that they can talk about the software supply chain, or a particular part of the software supply chain, and fit it into the bigger picture of software supply chain security. Yeah. There are a lot of attestations, and the set is still growing. If you have an idea for an attestation, for example if you have a tool that you think could produce a piece of evidence that would help us assert new information about the software supply chain, well, reach out to us. We're really, really excited to work with new communities and figure things out. We're both scientists, so when somebody comes up with something we didn't think about, we laser-focus on it, and we really love to hear these perspectives. A lot of the predicates were actually not built in-house; rather, some tool was generating something useful for making a more secure software supply chain, so we brought its authors into the conversation and then added support in our implementations, or directed people to use that predicate as well. Something that does happen is that a lot of these attestations can also help define a common language between all of the tools. We have a lot of different vulnerability scanners; if we can define a predicate that accommodates all the different vulnerability scanners, then we are better able to write tooling that supports Trivy and Syft and really anything out there. To define your own attestation format, you reach out to the attestation maintainers. You can pretty much do what you saw on the slide before, which is open a pull request and have a couple of rounds of comments.
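The "common language" idea can be illustrated with a toy normalization layer: per-scanner adapters map each tool's native report into one shared predicate shape, so downstream policy tooling only has to understand that one shape. The report formats and field names below are invented for illustration, not the real Trivy or Grype schemas:

```python
def normalize_trivy(report: dict) -> dict:
    # Hypothetical Trivy-like report -> shared vulnerability predicate shape.
    return {"scanner": "trivy",
            "findings": sorted(v["id"] for v in report.get("vulnerabilities", []))}

def normalize_grype(report: dict) -> dict:
    # Hypothetical Grype-like report -> the same shared shape.
    return {"scanner": "grype",
            "findings": sorted(m["cve"] for m in report.get("matches", []))}

# Policy code can now compare results across scanners uniformly.
trivy_out = normalize_trivy({"vulnerabilities": [{"id": "CVE-2023-0001"}]})
grype_out = normalize_grype({"matches": [{"cve": "CVE-2023-0001"}]})
print(trivy_out["findings"] == grype_out["findings"])  # True
```

A shared predicate plays exactly this adapter role: each tool emits its own evidence, but verifiers write rules once against the common shape.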
We have multiple stakeholders looking at this particular subset of the in-toto community to help accommodate and grow the predicate offering and also define best practices for how to use them. If you want to use, say, SCAI, we have a whole SCAI demo repo that can help you verify things like: if I wanted to use SCAI to verify secure boot on a server, there's already a sketch of something you could start playing with. If I want to verify a SLSA level using a runtime trace or a SCAI attestation, there's also some code there that can help you visualize how it works. Well, to wrap up, and I think we probably want to leave some time for questions: we want to build more expressive policies. ITEs 10 and 11 are what you saw in the demo. And what we're trying to finally close is the loop of producing all of this evidence from all of these tools, all of the CNCF and OpenSSF projects, collecting it using all of the tools, and then verifying it at the time of deployment, or really at whatever time is possible. The in-toto community is a thriving community of people suggesting new ITEs, building tools, building demos, and maintaining the different implementations. Do reach out to us; we will definitely find a home for the type of work you want to do in the community. There's really a lot to do to accommodate this very expansive goal. We have an in-toto community meeting on the first Friday of every month; last Friday was the November one, but the December one should be in a couple of weeks. We're also in the CNCF Slack. We are also on IRC; it's not bridged, but if you want to contribute by hosting a bridge, that would be great. There's a mailing list and also, of course, the GitHub organization for you to reach out to. We also participate in Google Summer of Code, so keep an eye on that.
We usually have a lot of ways to onboard up-and-coming students into the organization. This started as an academic project, so a lot of the mentoring will be done by Purdue, NYU, and the New Jersey Institute of Technology. And I think that's it. We do have time for questions, right? We have about five minutes. So yeah, thank you. There has been, and that's a great question, so I'll repeat it for the sake of the mic. We show the happy path: you collect all of this, you verify it, it passes the policy, everybody's happy. But has anybody tried to look into ways to subvert this attestation process so as to introduce a supply chain vulnerability even though in-toto is in place? So yes, we have been looking at it. There's a formal security analysis; we published a paper back in 2019. But more importantly, we had a third-party security audit that concluded earlier this year. Am I right? Earlier this year, I think around March or so. I can't believe it's already November. So yes, they also found ways to do it, and really it boils down to implementation aspects. There are some places where you can introduce ways for the attester, or the attestation code, to attest to a lie. But we have been doing a lot of work to isolate these components: the thing that's actually carrying out the software supply chain action, and the thing that's observing and rubber-stamping the particular attestation. Yeah, I think we have time for a few more questions. Yeah, sure. So, one of the things I noticed in your demo was that it seemed like some of these assertions or attestations would help with SBOM discovery; they had pointers to download locations of SBOMs. In addition, I was recently looking at the Security Insights (SECURITY-INSIGHTS.yml) specification, which also has a way to discover where to download SBOMs. And I know that SBOM discovery in general is a pretty big problem.
Would you see those two specifications maybe working together, or how would that... Very good question, thank you. I think they would probably be complementary. I also think that SCAI could probably express some of that other specification. But in general, that is just one of the applications. SCAI is very deliberately designed to be as general as possible, and this just seemed like one of the most straightforward applications to aid with, like you're saying, SBOM discovery, and attestation discovery in general. I don't know if anyone here is familiar with GUAC, for example, but we're trying to show here that if we were querying any kind of attestation database or attestation log, where we are retrieving some of these attestations or signatures, we can actually generate an attestation about what this log is telling us. So in addition to discovery, we're trying to add an extra layer of integrity to that. I think that went beyond your question, but I think we have time for one more. And a quick shout-out to Archivista as well, which is another project to help with discovery of attestations; it's really optimized for asking anything about a piece of software and then telling you everything it knows. I think there's a booth on... Sorry, the name of the project is Archivista. Yeah, I think they have a booth; the company, TestifySec, is in the pavilion. We have time for one last question; I think it needs to be quick. Well, I don't know if this is a quick question, but I noticed that one of the steps we skipped was step five, getting the actual attestations for the resource. I'm curious which ecosystems have support for automating that, because I know OCI has recently added that, and npm, I think, has added some of it. But are there other ecosystems integrating that right now? Yes. The ecosystem that we've been exploring has been mostly the Sigstore Rekor ecosystem.
Unfortunately, they don't have strong support for actual attestation storage; they're primarily for signatures. So there are some quirks with the APIs and those kinds of challenges that we ran into. But that's more on the engineering side; I think it's more an engineering problem than an ecosystem problem, per se. And again, that's where a tool like Archivista could come in, which is tailored specifically toward storing in-toto attestations and SBOMs and those kinds of metadata. In lockstep with some of these other ecosystems, they could really work there and integrate. Thank you. Thank you, everybody. Thank you.