Good afternoon, and thanks for attending the most exciting talk at the OSS Summit today. I'm a little biased, I admit. But it is the end of the day, so I hope that no one falls asleep. It's going to be pretty action-packed. There's a lot of information and a lot of content, so let's jump right in. This is Securing Your Infrastructure as Code Pipeline, and I am Jesse Sanford. I'm a software architect at Autodesk. I focus on platform engineering and secure software supply chains, I speak at conferences like this, and I contribute to open source. When I'm not in front of a computer, I like to sail and get outdoors as much as possible. A quick agenda: first, we're going to talk about whether or not infrastructure counts as software in today's world. Next, we're going to level-set on some definitions. After that, I'll pose a supply chain security risk scenario, and we'll look at a mitigation technique that's a variation on a talk I did at KubeCon North America last year. And finally, we'll look at an incrementally better solution before we get into some questions. So we're here to talk about IaC supply chain security. But I have a question: is infrastructure software? Can I see a show of hands for anyone who thinks so? All right, a little mixed bag here. Well, I would posit that with the massive adoption of public cloud offerings and platform as a service, or as-a-service offerings generally, more and more of what we traditionally thought of as infrastructure is actually software, or at least software-defined. Even if you were to push back on networking and other plumbing, what about your databases? What about your file and blob services, or your caches? And even if you were to push back on those, increasingly these things are controlled by automation. And that automation is definitely software. I mean, we call that automation infrastructure as code for a reason, right? But what is infrastructure as code, really?
Well, first of all, it's code. It's codifying the intent of a developer, good or bad. As code, it's repeatable, diffable, automatable. Second, it's usually executed in highly privileged environments, creating everything from networks to user accounts. Common examples are Crossplane, Pulumi, and Terraform, and there are cloud-provider-specific tools like CloudFormation from AWS. Next, there's the software supply chain, which is very similar to supply chains elsewhere: it's all those upstream things that you build into and use in your product. In the software world, it's usually code: sources that get compiled, which you may or may not have had any hand in creating. There's a lot of code that runs in and around your production stacks that you didn't write: OSS packages, language packages, Kubernetes, the OS running Kubernetes, the hypervisors running your OS, developer tools, build tools, all that stuff. And increasingly, much of that software is open source. You don't necessarily know who wrote it, you don't necessarily know if you can trust them, and you don't know how to hold them accountable if they break that trust. All of that is actually a very good target for attackers. Now, with those definitions in mind, let's set up the problem space. Autodesk has been on a platform journey for quite some time now, and critical to that journey is our deployment platform, which we call CloudOS. CloudOS handles our application delivery, and it does so through the use of a declarative format which we call an ADF. It's essentially a YAML file that contains a manifest of infra and app artifacts to be deployed. That ADF starts its life out as YAML, as I said, then goes through some homegrown tooling built in Python and Go, and it's turned into Spinnaker pipelines and also Terraform, which in turn produces the cloud resources used by our products. What about that Terraform, though? Where does it come from?
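To make the "codified intent" idea concrete, here's a tiny illustrative example (a sketch; the provider and bucket name are made up for this demo):

```shell
# A few lines of HCL declaratively describe a cloud resource. The file is
# diffable, repeatable, and automatable -- and it runs with real privileges.
cat > main.tf <<'EOF'
terraform {
  required_providers {
    aws = { source = "hashicorp/aws" }
  }
}

resource "aws_s3_bucket" "example" {
  bucket = "example-iac-demo-bucket"
}
EOF

# terraform init && terraform plan   # would show the intended changes
```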
Well, frequently it's created by our internal teams, but they often rely on open source providers like those produced by HashiCorp and others. And how do we really know who produced those providers, or those modules that they find out in the wild? What if they're produced by bad actors? To be fair, it's not just Terraform that has this problem. Almost all IaC tooling comes with some pluggable way to extend its functionality, and usually there are numerous community providers or extensions. Here we see Crossplane in the hot seat. IaC is great, and it comes with a lot of benefits, as previously mentioned, but it does require a lot of permissions to do its job. It needs that power by design, but it's very important that it's secure. So what can we do about that? Here we see a familiar CI/CD flow that I borrowed from the SLSA project, and the steps that you see here should be familiar. I'll fill in some icons that are more infrastructure-as-code specific. You can see that we have GitHub storing our source, Jenkins building our modules, and a Terraform registry down below, which is hopefully visible, storing those modules. That's kind of like the public Terraform registry, but it could also be internal GitHub; you can source Terraform modules from GitHub as well. And you can see those modules getting pulled in as dependencies there in the middle, to be used in other modules before being deployed out to our cloud provider. So now let's take a look at where the threats are. This diagram is also from the SLSA project. There's a lot of red there, a lot of attack vectors. Pretty scary stuff. To help defend against this, at KubeCon North America last fall, Jason Hall and I proposed a way to create a walled garden for our IaC and its dependencies. We showed that our current solutions are more like this than this.
In that talk, we showed that by centralizing the execution of our IaC and layering on Kubernetes RBAC, we could choose who has access to install and manage that IaC, and also who can use it. The solution we proposed was to have Crossplane actually verify package signatures using Cosign, signing and verifying those packages. The OCI package format was handy because both Crossplane and Sigstore's Cosign are able to read and write it. So we showcased the work that we'd been doing with the Crossplane community to enable that. But what if we don't have the ability to use Crossplane? Can we still make use of these tools? Terraform has a lot of inertia in a lot of companies, and Autodesk is no different. And we absolutely can; that's what I hope to show you today. But to do this, Crossplane or not, we still need a funnel. You have to control what's being used to manipulate your environment, and you need a centralized choke point. The only way to do that is to centralize your reconciliation. For a funnel, we could use something like Spinnaker, which is what we have, or Jenkins. However, to achieve any security gains, you'd still need to verify that the Terraform modules being used in those builds are good, and Terraform doesn't have any built-in way to validate those modules. It does have weak trust of providers, based on storing hashes on first execution. If you don't know that mechanism, it's very similar to the SSH error message you get when you log into a server after an IP address has been recycled. But it doesn't do anything for the configuration composition of the modules themselves. Again, we talked about this at last KubeCon; if you're interested in the security gaps in that model, go ahead and watch that replay, where I demonstrate a dependency confusion and typosquatting issue that could lead to some very bad things. So what can we do? One thing we could do is introduce static analysis scans.
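That provider trust-on-first-use mechanism looks roughly like this on the command line (a sketch; the platforms shown are arbitrary, and this assumes a Terraform version with dependency lock file support):

```shell
# Terraform's provider trust is trust-on-first-use: `terraform init` records
# the hashes of downloaded providers in .terraform.lock.hcl, and later runs
# fail if a downloaded provider no longer matches those hashes.
terraform init

# Inspect the pinned provider versions and hashes.
cat .terraform.lock.hcl

# Pre-compute hashes for several platforms so every CI agent verifies against
# the same lock file.
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64

# Note: nothing equivalent exists for modules -- module sources are fetched
# without any hash or signature verification.
```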
We could check that all the modules being used for a particular execution are scanned for known misconfigurations and vulnerabilities. This would work, but it requires a lot of duplicate scanning. Even if the scanners are quick, just the idea of adding additional scans to every deploy is controversial, and our developers really bemoan any additional time spent in CD pipelines. Each module is not just one module, either: it also pulls in some unknown number of submodules, and any of them could be good or bad. We can do better than that, and thankfully there is a better way. By using the TF controller from Flux to centrally execute our Terraform, we can build a pipeline that largely mimics what we built for Crossplane. This is because the TF controller can utilize Flux's built-in OCI registry source type to manage Terraform modules, basically wrapping Terraform modules in OCI images. We can also require that a passing vulnerability scan is done before sending that OCI repo definition off to Flux. Additionally, when Flux reconciles that definition, we can require that the module image be signed by Sigstore's Cosign before pulling it. And finally, at deploy time, we can know that only signed and attested modules are being run against our AWS environment. This is our walled garden. I've mentioned it a couple of times now, so let's quickly review what Cosign and the rest of the Sigstore tooling is. There are a number of different component parts, but the three critical pieces are Cosign, which does the cryptographic signing of the images and knows how to interact with OCI registries; Fulcio, which is a special-purpose certificate authority that can issue short-lived X.509 certs in exchange for OpenID Connect tokens; and Rekor, which is a transparency log that records that exchange and also assists in the later verification of those short-lived certificates when doing signature validation. So, a demo.
I had a bit of an infrastructure failure earlier, so I'm going to resort to slideware, my apologies. I think the images are high-contrast enough that we'll get the gist of it. Here you'll see a Jenkinsfile that's going through a bunch of shell steps; I hope some of it is visible. At the top, we clone a repository. Below that, we move a TF file within that repository to a main.tf, and then we run Checkov on that main.tf, doing the static analysis with output in the CycloneDX JSON format, which is a format for SBOMs. Then we take that SBOM and cat it out so that we can see it, and we also run the Flux push-artifact step, which builds an OCI image with Flux and pushes it to a GitHub OCI registry, which is just under my name. It's basically taking the contents of that Terraform module, wrapping it in an image, and pushing it to the OCI registry. After that, we're down here now. Sorry, guys. After that, we tag the artifact so that we can find it later, and then we run Cosign to sign the image. We again cat the short-lived cert that Cosign uses so that we can see its contents. Then we take the SBOM that was produced earlier and attach it as an attestation to that image, again with Cosign. Finally, we validate that the attestation is there inside the OCI registry and output its contents so we can take a look at it. So that's basically what we're going to walk through. Here you can see Checkov doing its static analysis and outputting that CycloneDX SBOM format. If any of you are familiar with SBOMs, an SBOM usually lists the contents of the software; since this is a static analysis vulnerability scanning tool, it also lists any vulnerabilities it finds. You can see the components that are contained in the module on the right-hand side, as well as the hashes of the contained files.
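Condensed into plain shell, those Jenkinsfile steps look roughly like this (a sketch: the registry path under my GitHub name is replaced with a placeholder, and exact flags vary by tool version):

```shell
# 1. Static analysis of the module, emitting a CycloneDX SBOM.
checkov -d . -o cyclonedx_json > sbom.checkov.json
cat sbom.checkov.json

# 2. Wrap the module contents in an OCI artifact and push it with Flux.
flux push artifact "oci://ghcr.io/EXAMPLE_USER/tf-module:$(git rev-parse --short HEAD)" \
  --path=. \
  --source="$(git config --get remote.origin.url)" \
  --revision="$(git rev-parse HEAD)"

# 3. Tag the artifact so it can be referenced later.
flux tag artifact "oci://ghcr.io/EXAMPLE_USER/tf-module:$(git rev-parse --short HEAD)" \
  --tag main

# 4. Keyless-sign the artifact with Cosign; Fulcio issues a short-lived cert
#    against the CI job's OIDC identity.
cosign sign --yes ghcr.io/EXAMPLE_USER/tf-module:main
```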
Finally, at the bottom, you see the vulnerabilities I was just mentioning, which Checkov found with its default built-in policies. Of course, you can write your own policies and run Checkov against those, and those results will also be output. This is very useful info; already we have something valuable. Now, for Flux, we create that OCI image like I was mentioning before, and we push it with the push-artifact command to that GitHub repo. We can tag it with the main tag so we can find it later and reference it. Now we see Cosign signing that image. It's essentially saying, hey, this image was built on our Jenkins, and we know that this TF module was composed and built by a known-good Autodesk Jenkins process. We can see the short-lived public certificate that was output starting to get catted down below; that's the certificate issued by Fulcio, and we output it to the console. Inside that certificate, if we inspect it, we see that the subject alternative name is the SPIFFE ID that we expected for this Jenkins node, and we can customize that to meet the needs of our workloads. It also shows the OIDC discovery URL that we expected, so we know that this particular certificate was issued correctly. Now we can see Cosign signing and attaching the SBOM as an attestation, in that CycloneDX format. This is actually stored on the image inside the OCI registry, so the attestation and the contents of that SBOM live side by side with the image inside the registry. This is very handy, because anything that can pull the image can now query the contents of those attestations. We can actually see the contents of that SBOM by using the Cosign verification command I was talking about before: it will pull that OCI image down, along with the metadata of the blobs associated with it.
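The attach-and-verify round trip might look like this (a sketch; the image name, identity regexp, and issuer URL are placeholders, and the identity flags shown are required by Cosign 2.x keyless verification):

```shell
# Attach the CycloneDX SBOM to the artifact as an in-toto attestation.
cosign attest --yes --type cyclonedx \
  --predicate sbom.checkov.json \
  ghcr.io/EXAMPLE_USER/tf-module:main

# From anywhere that can pull the image, verify the attestation and dump the
# embedded SBOM out of the DSSE envelope.
cosign verify-attestation --type cyclonedx \
  --certificate-identity-regexp 'spiffe://.*' \
  --certificate-oidc-issuer https://oidc.example.internal \
  ghcr.io/EXAMPLE_USER/tf-module:main \
  | jq -r '.payload' | base64 -d | jq '.predicate'
```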
Then you can look at that attestation right there, straight out of the OCI repo, and it's the same SBOM that we showed earlier, as expected. So now let's try to use that image. We have this Terraform module stored in an OCI repo, and Flux knows how to use that, so let's ask Flux to pull the image. Here you can see an instance of the Flux OCIRepository type, referencing the image that we pushed earlier. When we apply it, we expect to see it show that the artifact was correctly pulled and stored, above. But oh no, it did not do that. It actually says down here that it failed to verify the signature using the provider cosign keyless. As it turns out, that's expected. But at least we know that it blocks modules it doesn't trust from being used. Unfortunately, right now, using private Rekor and Fulcio servers with the TF controller doesn't work. This may change; there's already an old issue in the backlog, and hopefully we can get some more attention on it. There's no technical reason why it can't be achieved: we were doing it with the Crossplane package manager, and we can do it here as well. But until then, you can use the public Sigstore infrastructure, the public-good infrastructure that Sigstore hosts. You can actually use your own identity federation as well; you don't have to use the identity providers they have built in, so you can kind of make it your own. But still, it's going to expose some information, and Cosign will tell you about that: it'll expose the information that's in those certificates. The instructions on how to do that are actually pretty easy to find; you'll see them on the right there, directly inside the GitHub repository for Fulcio. It's as easy as creating a pull request, really, to get set up. But there may be another way. We can probably make use of admission controllers, like Kyverno, to fill the gap in the meantime. I haven't tried this proof of concept yet.
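For reference, a Flux OCIRepository with Cosign verification enabled looks roughly like this (a sketch; names and the registry URL are placeholders):

```shell
# Write out an OCIRepository definition that Flux will refuse to reconcile
# unless the artifact carries a valid keyless Cosign signature.
cat > /tmp/tf-module-repo.yaml <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: tf-module
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/EXAMPLE_USER/tf-module
  ref:
    tag: main
  verify:
    provider: cosign   # no secretRef here means keyless, public Sigstore infra
EOF

# Apply it on a cluster that has Flux installed:
# kubectl apply -f /tmp/tf-module-repo.yaml
```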
I know that this is actually really small, and unfortunately I did not get a high-contrast version of it. But what you see on the left is a Kyverno policy to verify Flux OCI sources; I focused in on the OCIRepository type, but other Flux sources can be verified with Kyverno as well. On the right, you see that Kyverno can also do Cosign signature validation, and you can plug in custom Rekor servers for that. So the two of them combined should allow us to achieve the intended goal of enforcing that only signed and attested packages can be used by the Flux TF controller. If anyone's a Kyverno expert, I'd love to pick your brain afterward; this is something I'm probably going to be attempting very soon. But there is one thing that bothered me when I was producing that demo, and that is the gap between the step where you do the Checkov scan, the static analysis, and the step where you attest to it and store that attestation. There's a window there where, depending on how your job is set up and how your supply chain is configured, someone could possibly spoof that SBOM. Then you'd have an image inside a supposedly trusted repository with what is essentially a fake vulnerability scan. But we can do better. To help with this, I'll introduce two other tools: Witness and in-toto. Very quickly, these tools help you use attestations to cryptographically prove all of your individual build steps, and prove that they were known good. Both in-toto and Witness use the in-toto attestation format; Witness just allows you to do unplanned, more ad hoc attestations. You're still able to validate those attestations later with Witness policies, so you can build a more diverse array of pipelines that are not necessarily all planned in advance.
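I haven't tested it, but a Kyverno policy along those lines might look something like this (names are placeholders, and the exact validation pattern is an assumption about the approach, not a known-working policy):

```shell
# Untested sketch of a Kyverno ClusterPolicy that requires every Flux
# OCIRepository to have Cosign verification enabled. (Kyverno can separately
# verify image signatures itself via verifyImages rules, including against a
# custom Rekor server.)
cat > /tmp/require-oci-verify.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-flux-oci-verification
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-cosign-verify
      match:
        any:
          - resources:
              kinds:
                - OCIRepository
      validate:
        message: "OCIRepository sources must enable Cosign verification."
        pattern:
          spec:
            verify:
              provider: cosign
EOF

# kubectl apply -f /tmp/require-oci-verify.yaml   # needs a cluster with Kyverno
```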
Witness can also be used in conjunction with another TestifySec project called Archivista, which stores in-toto attestations in a graph database and allows you to query them centrally. That's actually very handy when you're building up attestations in many different places and also validating them in many different places. So, looking back at that previous example, you can see that we still have our execution of the scanning and the signing of our SBOMs and attestations there at the top, kind of in the center, except now we have this observant little Witness owl focused on watching them. You can see that it's creating attestations that are stored inside Archivista, our little squirrel or chipmunk there. After that, you can see that we've split out our CD process down below, and now we're using Spinnaker to do our deploys. This could just be another Jenkins job, but the intention is that it's out of band, not triggered directly by the CI process. It could be, but again, you need this extra step to check the attestations coming from Witness. This allows us to introduce a verification step before deploying: a step to check that those attestations are good and true before we do the deploy. Witness hashes all the files when those commands are run, and the attestations it creates contain all the environment variables and a lot of the other things it can watch. Then we can compare the hashes of the files output by one command against the files input to the next command, to make sure that what goes in and what comes out matches what you expect at each step, in addition to tracking the exit code from the commands, which, with a static analysis scan, might indicate whether there are any vulnerabilities.
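That pre-deploy verification step might look something like this with the Witness CLI (a sketch; the policy, key, and attestation file names are placeholders, and flags vary by Witness version):

```shell
# Before the deploy, check the collected attestations against a signed Witness
# policy: every expected step must be present, signed by the expected
# identities, and the artifact hashes must chain correctly from step to step.
witness verify \
  -p policy-signed.json \
  -k policy-public-key.pem \
  -a checkov-attestation.json \
  -a cosign-step-attestation.json \
  -f sbom.checkov.json
```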
For instance, in this scenario, we can take the hash of the SBOM that's output by Checkov, and then look for that same hash as the file is input into the Cosign command that signs the attestation. Not only that, Witness stores all of this in Archivista, which again is that central store you can query from many different places; out of band, inside your CD process, it's available to you. So, just like before, we run that Checkov scan for static analysis, except this time we wrap it with Witness, like you can see here. Up at the top there, it's witness run, and then you pass in some flags, and it does its handshakes to get its certificates itself. Down here, you can see that same Checkov static analysis scan get kicked off, and below that, you can see that the attestations built up by running that command are stored in Archivista at that hash. That hash is how you look up the attestation in Archivista. If we inspect that attestation, which we just dump here afterwards, we can see that Witness takes note of a lot of things, like I was telling you before: environment variables, which you can see here starting at the bottom. When you run Witness in debug mode, it can actually be very noisy, so you need to account for that and make sure you understand how much data you're going to be pulling in. We can also see the commit hash of the current working copy this is being executed in. And at the bottom left, you see the start of the hashes for all the files in the working directory: Witness walks through all the files in the working directory, hashes all of them, and stores those in the attestations as well. On the top right, continued over on the right-hand side, you see the SHA of that main.tf, the main.tf that we were running the Checkov scan on.
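The wrapped invocation looks roughly like this (a sketch; the step name and Archivista URL are placeholders, and the Archivista flags vary by Witness version):

```shell
# Wrap the Checkov scan with Witness. Witness records environment variables,
# the git commit, hashes of every file in the working tree, the exact command
# line, and its exit code, then signs the attestation collection and ships it
# to Archivista. (A signing identity -- e.g. a SPIFFE SVID or a key file --
# must be configured separately.)
witness run --step static-analysis \
  -o checkov-attestation.json \
  --enable-archivista \
  --archivista-server https://archivista.example.internal \
  -- sh -c 'checkov -d . -o cyclonedx_json > sbom.checkov.json || true'

# Inspect what was recorded locally.
jq . checkov-attestation.json
```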
We can also see that the Checkov command was run in a forked bash process, and you can see how it was executed there as well. Finally, you can see the exit code from it is zero; you'll notice the || true statement there, a hack I needed so that Jenkins wouldn't fail at that step. Then we can see the hash of the SBOM coming out of that Checkov scan in the outputs of that attestation; you'll see it there at the top. This is still from the first command, the first run of Witness with Checkov, and you can see the hash of the sbom.checkov.json file. In the next step, which I'll show you shortly, the Witness attestation for the Cosign step lists that same hash as an input. So we can cryptographically match the output of one command to the input of the next, and this means we can prove that the SBOM attached to that image is exactly what it should be, and no one could have attached a falsified one. Finally, we can see that if we query our Archivista instance, which exposes a GraphQL API, we get the metadata for that attestation, which we can then use to traverse the graph and find out all the information about all the bits and pieces we saw in those attestations before. Any tool that can access this GraphQL API can query it, and then obviously you can build up policy engines based on that. So let's come back up to the surface a little bit; that was a bit of a deep dive. Let's close out with some general principles for securing our IaC. First, definitely centralize the execution of your IaC. You need a funnel or a choke point, just like in firewalling; that's probably the most important step to getting a handle on things. Second, know your IaC sources.
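A query like that might be sketched as follows (the GraphQL field names here are assumptions about Archivista's schema; consult the schema your version serves):

```shell
# Untested sketch: look up an attestation envelope in Archivista by the
# identifier Witness printed when it stored it. URL and hash are placeholders.
cat > /tmp/archivista-query.json <<'EOF'
{
  "query": "query($d: String!) { dsses(where: {gitoidSha256: $d}) { edges { node { gitoidSha256 } } } }",
  "variables": { "d": "REPLACE_WITH_ATTESTATION_GITOID" }
}
EOF

# curl -s -X POST -H 'Content-Type: application/json' \
#   -d @/tmp/archivista-query.json \
#   https://archivista.example.internal/query | jq .
```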
Limit them in whatever way you can, even if you have to put in egress firewall rules to stop people from pulling from the open internet. Third, sign and validate your sources using tools like Cosign; it's a way to make sure nothing's been tampered with throughout your build process. The same rules from the rest of software supply chain security apply here. Fourth, scan your IaC with static analysis tools. There are lots of open source ones out there; Checkov, which you saw in the demo, is one. And finally, run your IaC with least privilege. This one's more difficult, which is why I leave it for last. A lot of times you'll find that, even with a central funnel, folks will basically be running it with a user that has all power over their cloud environment. It's tough, but you really shouldn't do that, and we all know it. Your cloud provider should have facilities to help you with that: you can look at CloudTrail and work back from there. With that, I'd like to thank you all for attending. I know that was highly technical. If you liked this talk and would like to see more, I have a lot more material that goes into how we can expand on this with SPIFFE/SPIRE and machine identity, so that you can get keyless, ephemeral, ambient tokens. That's kind of what you saw behind the scenes there; that's how that Jenkins node was able to authenticate. Thank you. I think we have about eight minutes for questions if anyone's interested. If not, we can... oh, you want to come get the microphone? I can repeat the question if needed. So, particularly with Kyverno, and I'm actually more familiar with Gatekeeper, but Kyverno has out-of-box facilities to do Cosign signature validation on OCI images. That's basically a way to handle this for all the different tooling you might have running on your Kubernetes cluster.
Additionally, as another sweet spot, it has the ability to match on Flux resource types, the repo types, right? So you can possibly combine the two and have the ability to say: Flux, you're not allowed to use anything that hasn't been signed by Cosign. Yeah, I mean, I know that my security team has promised this for a long time, and I know that we have some implementations that are doing that. You know, it's an arms race when you're doing application allow-listing, right? Or, sorry, blocking access rather than application allow-listing. I'm not sure about any open source tools, although it's probably ripe for the picking, but I know that they have some closed-source solutions that we pay for that will monitor that CloudTrail in a kind of observe-only mode. Then, slowly over time, it'll ratchet down any privileges that it doesn't see being used, and you can set the window: if it doesn't see anyone executing a particular API call in 30 days, then don't allow those users to do that anymore. Right, yeah. And unfortunately, all of those tools are pretty fallible; there are going to be both hits and misses. Yeah, go ahead. Beautiful, yeah, absolutely. And is it built in to be able to revoke them as well? These are stale, right? Oh, that's cool, yeah. Right, right. You've got to go through and pick through each one of them across all of your policies and do it yourself, yeah. Yeah, that's tough. No problem. So, there is that recording from last year, from KubeCon North America '22, that Jason Hall and I did, and it goes into great detail about why Crossplane makes this a little bit easier than Terraform; one of the biggest pieces is that it's already required to be centralized.
But additionally, it has a package manager built into it that lets us hook into events like "I want to pull this" and basically say: before you pull it, check the signature, and so on and so forth. That's where we injected the Cosign validation: we put it into that package manager. So that, plus the ability to apply Kubernetes RBAC across the Crossplane types, means we can say you're not allowed to use these things, and in addition, you're not allowed to run anything that's not signed. It's like a closed ecosystem. Yeah. I mean, again, there's no reason why that can't be achieved with Terraform. Terraform unfortunately just doesn't have these facilities built in to do things like check signatures on modules. It does have that first-execution hash that it stores in the dependency lock file for providers, and that's helpful. But again, how often do we ignore that SSH warning message when logging in, even when it's not the host we expect? Anyone else? Thanks for attending, guys. This is a bigger audience than I expected this late in the afternoon. It's beautiful outside; we should all get out there. Thank you.