All right, hi, everyone. My name is Brandon, and I'm here with my co-speaker, Parth. Today we're going to talk to you about the Tekton build pipeline and how we're achieving SLSA level 3 with it, with the help of SPIFFE and SPIRE. To start off, I think most of you have seen this slide a hundred times, so I'm not going to spend too much time on it. SLSA is Supply-chain Levels for Software Artifacts. Basically, the idea is: how do you secure the artifacts in your supply chain? The relevant part of SLSA for this talk, since most of you are probably familiar with the concepts given that we're at Supply Chain Security Con, is the SLSA levels. SLSA levels currently go from level 1 to level 4. At level 1, your build process must be fully scripted, automated, and generate provenance. All the way up to level 4, where you want to enforce hermeticity, reproducibility, and a whole lot more. Today we're going to target SLSA level 3 with Tekton, and see how we can achieve it. Being a SLSA level 3 builder really focuses on guaranteeing the integrity of the provenance, making sure that it cannot be falsified. So, for the uninitiated, the other part is Tekton. Tekton is an open source build pipeline that runs on top of Kubernetes. It uses Kubernetes as an orchestration platform, with all the build steps running as containers in pods. But what it essentially is, similar to GitHub Actions or Travis CI, is a build tool that lets you define what your build steps are and generate a build. So let's talk a little bit about Tekton. For this entire presentation, we're going to use a minimal example. Tekton lets you define multiple types of build constructs.
The smallest of those constructs is something called a TaskRun, which you can imagine as just a series of steps. Think of a single GitHub Actions job: a series of steps like checkout, build, generate provenance, something like that. In the Tekton flow, a user first creates a TaskRun, and then two things happen. The Tekton pipelines controller detects that the TaskRun was created and goes down the list of build steps: it creates a pod to run the first build step, and at the same time updates the TaskRun object to say, okay, I started the first build step, it's running now. After the pod has run, the controller takes the results of the pod execution and updates the TaskRun object, because we need to know the outputs, the information that came out of the build step, so that the provenance fields get updated. This cycle continues until all the build steps are complete. At that point, it marks the TaskRun as completed, and that piece of metadata can then be consumed. Another component we want to introduce is called Tekton Chains, and the main role of Chains here is to be an observer of Tekton pipelines. We want to be able to generate a signature or attestation of what happened in the pipeline. Basically, any artifact generated within the Tekton pipeline will have its build process observed by Chains, so Chains can create an attestation saying that this particular artifact was built through this pipeline, these were all the steps that happened, and these were the outputs. That's all part of being able to identify and really audit the build process of a particular artifact so that we can trust it.
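As a rough sketch of the TaskRun construct described above — the resource name, images, and scripts here are illustrative, not taken from the talk — a minimal TaskRun with inline steps might look like:

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: example-build          # hypothetical name
spec:
  taskSpec:
    steps:
      - name: checkout
        image: alpine/git
        script: git clone https://example.com/repo.git /workspace
      - name: build
        image: golang
        script: go build -o /workspace/app ./...
```

Applying this with `kubectl apply -f` is what triggers the controller flow described above: the controller sees the new TaskRun and starts creating pods for the steps.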
So with just Tekton pipelines and Tekton Chains today, we can actually easily get to SLSA level 2. As a builder, it meets all the build and provenance requirements for SLSA level 2, but we're missing one important requirement that would get us to SLSA level 3 — and it's the only one missing — which is non-falsifiability of the provenance. As a quick recap, what exactly is non-falsifiability? Non-falsifiability is really about ensuring the build metadata produced by the process cannot be modified by a malicious actor such that the signed provenance no longer accurately reflects the build. In the SLSA level 3 definition, this is broken down into three points. The first two talk about how the key is stored and how the key is managed: the signing key must be stored in a secure key management system, and the provenance signing key must not be accessible to the build environment — make sure it's isolated. The great thing here is that Tekton Chains handles these two cases already, by having the signing key live in Chains and not in the pipeline. In addition, you can configure Tekton Chains to use a remote signing service such as Vault. Where it gets tricky is the last part. The last point in the definition says that every field in the provenance must be generated or verified by the build service in a control plane. Essentially, this means that if we look at the SLSA provenance document describing the build, we should be able to say that no field in it could have been maliciously modified by a bad actor. This is a difficult problem in Tekton for a couple of reasons: the provenance metadata fields are difficult to lock down, because of the way users interact with Tekton as well as how Tekton is usually deployed and provided as a service.
For example, we saw in the initial flow that Tekton users create the TaskRun object to initiate a build — that's part of the Kubernetes API mechanics. Users have direct access to these objects because they need to create and modify them, while at the same time the controller is the one adding all the metadata to them. In addition, if we want to push this a little further and talk about deployment models, Kubernetes clusters are usually managed by different entities. So if we want a more advanced threat model where Kubernetes admins possibly have access to the API server, how can we lock them down? Here is an example of the TaskRun object. If you're familiar with Kubernetes, there are multiple parts to the object: there's the spec, where you determine what you want this TaskRun to be, and at the same time there's a section called status, which describes what happened, with the controller updating those fields. You can see there are multiple steps here. One part says the build succeeded — you can imagine a malicious actor taking a build that failed, marking it as succeeded, and changing the hashes. Another thing we see is that each build step has outputs describing the artifact's hash, or other information about the artifact being produced. The important thing here is that if a malicious actor can go in and modify the hash, they can effectively create provenance for another artifact that wasn't actually built. So the TaskRun object becomes a main attack point for malicious actors. Basically, anywhere in the entire build process — wherever there's a reconciliation, wherever a new pod runs and the status gets updated — a malicious actor can come in and modify these fields.
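To make the attack surface concrete, here is a simplified, illustrative subset of the TaskRun status fields in question (field values are made up; real TaskRun statuses carry more fields):

```yaml
status:
  conditions:
    - type: Succeeded
      status: "True"          # attack: flip a failed build to "True"
  taskResults:
    - name: IMAGE_DIGEST      # attack: swap in the digest of a different artifact
      value: sha256:abc123
```

Anyone with write access to the TaskRun object — the user who created it, or a cluster admin with API-server access — could edit these fields between reconciles.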
So the question now is: how do we solve non-falsifiability? And like any great solution, it comes in three steps. First, we determine which entities are able to modify these particular fields and restrict access to them — we create an attestable trusted computing base. In this case, the only entities that should be able to modify the provenance fields are the Tekton controllers themselves and, in a very limited way, each individual build step for the outputs of its own step. Step two is to enforce the integrity of the fields by signing them with the trusted computing base: the components in the trusted computing base sign the fields they produce and verify them every time they consume them. This means that if a malicious actor came in and modified something, it would be detected, because verification would fail. Last but not least, all of this is consumed by Tekton Chains to generate the provenance: Chains uses the certificate authority of the trusted computing base to verify that none of the fields were tampered with. What we end up with is that, because the TaskRun is locked down, the modifications we saw malicious actors making to the TaskRun are now prevented. Cool. So we've talked about what we should do; let's talk a little bit about how we'll do it. What we'll be using is the SPIFFE and SPIRE projects. At a very high level, SPIFFE and SPIRE are a zero-trust workload identity specification and framework. But buzzwords aside, what does that really mean for us? The first part is that SPIFFE and SPIRE provide strong attestation to create the trusted computing base.
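The sign-then-verify idea can be sketched in a few lines. This is a minimal, illustrative sketch only: the real implementation signs with the workload's x509-SVID private key issued by SPIRE, whereas here a stdlib HMAC with a placeholder key stands in so the example is self-contained.

```python
import hashlib
import hmac
import json

# Stand-in for the controller's SVID private key (illustrative only).
CONTROLLER_KEY = b"stand-in-for-svid-private-key"

def sign_fields(fields: dict, key: bytes) -> str:
    """Hash a canonical encoding of the provenance fields and sign it."""
    canonical = json.dumps(fields, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()

def verify_fields(fields: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature; any tampered field makes verification fail."""
    return hmac.compare_digest(sign_fields(fields, key), signature)

status = {"succeeded": True, "artifact_digest": "sha256:abc123"}
sig = sign_fields(status, CONTROLLER_KEY)

# An attacker swapping in a different artifact's digest is now detectable:
tampered = dict(status, artifact_digest="sha256:evil")
assert verify_fields(status, sig, CONTROLLER_KEY)
assert not verify_fields(tampered, sig, CONTROLLER_KEY)
```

The key point is step two of the model: every consumer re-verifies before trusting, so a field modified outside the trusted computing base fails verification downstream.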
With SPIFFE and SPIRE, you can say: I want to provide a workload identity for a particular container or pod in Kubernetes, asserting that this container is running this particular image or has this particular ID. And this goes all the way from the workload itself down to the infrastructure it's running on: you can say that this workload only gets this identity if it's running on a particular node in my cloud or in my infrastructure, and it's running a particular image or has a particular ID. After a workload starts running, SPIFFE and SPIRE provide it with ephemeral signing keys and certificates tied to those attestations. We can then use those signing keys to sign the provenance metadata and fields generated throughout this entire process. In this case, build steps can sign the outputs of their own step with their own key, and the Tekton controller can modify and sign the TaskRun status fields as well. So now I'm going to pass it on to Parth, who will go through the architecture and give a demo. All right, so let's talk about the architecture. Starting off, we want a SPIRE server instantiated. Because the SPIRE server could itself be attacked, we want it in a separate, protected Kubernetes cluster. You can see in the architecture that it's separate from where we deploy the pipelines controller as well as the Chains controller — it's separated out. At the same time, when the SPIRE server is instantiated, we also want to bootstrap a few things into it. For example, we need node attestation to happen.
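For illustration, pre-registering the pipelines controller with the SPIRE server is done with registration entries. The trust domain, SPIFFE IDs, and parent ID below are made up for this sketch — only the `spire-server entry create` command and its `-spiffeID`/`-parentID`/`-selector` flags are real SPIRE CLI:

```
# Pre-register the Tekton pipelines controller, keyed on Kubernetes selectors
# (namespace and service account) that the SPIRE agent can attest:
spire-server entry create \
    -spiffeID spiffe://example.org/tekton/pipelines-controller \
    -parentID spiffe://example.org/spire/agent/k8s_psat/demo-cluster/node-1 \
    -selector k8s:ns:tekton-pipelines \
    -selector k8s:sa:tekton-pipelines-controller
```

A workload only receives the SVID for this entry if the agent's attestation of the pod matches all the listed selectors.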
So a registration entry for node attestation has to be added to the SPIRE server, and the Tekton pipelines controller and the Chains controller have to be pre-registered with the SPIRE server, so that when they all instantiate and everything starts communicating, they have the proper connections to authenticate, get attested properly, and communicate back and forth with the SPIRE server. We'll see that coming up. The second piece is the actual SPIRE agent. The SPIRE agent itself runs in the Kubernetes cluster along with the Tekton pipelines controller and the Tekton Chains controller. We're also adding the CSI (Container Storage Interface) driver. What this allows is that, in order for the Tekton pipelines controller and the Tekton Chains controller to communicate with the SPIRE agent, they use this driver instead of a host path, which makes it a little more secure. As you can see, there's a "request plus cache SVIDs" step right there. What's happening there is that the SPIRE agent authenticates — it validates itself against the SPIRE server, so it's a known entity — and it requests and caches all the SVIDs it needs in order to perform its actions. The next piece is the connections: like I said, we utilize the CSI driver to connect with the Tekton pipelines controller. There are multiple connections happening here: first a SPIRE server connection as well as a workload connection. I'll go into more detail on exactly why there are two separate connections here, but bear that in mind. And the third piece is Tekton Chains. That one only has one connection, the SPIRE workload connection. It doesn't need to communicate with the SPIRE server, and I'll get to that in later steps.
But basically, all the Chains controller really needs is to be able to get a trust bundle later on down the line in order to do verification — that's why it needs the workload connection. So let's talk about the actual flow: how is all this going to work? Now that we've talked about the architecture — how everything communicates, how everything is connected — how is SLSA level 3 actually obtained? Starting off, we just have the Tekton Chains controller, the Tekton pipelines controller, the SPIRE server running separately, and the SPIRE agent. We instantiate a TaskRun or a PipelineRun — we instantiate something — and it kicks off: the Tekton pipelines controller creates a TaskRun pod, and at the same time it registers that specific pod with the SPIRE server. It uses specific selectors: in this case the UID of the TaskRun pod as well as the name of the TaskRun pod, and it puts that registration entry into the SPIRE server. When the TaskRun pod does attestation with the SPIRE agent, the agent now knows about it and can attest — yes, you are what you say you are, based on the UID and based on the name — and then the pod can get its SVID, which contains a signing key and certificate. This happens simultaneously: every time a new TaskRun pod is instantiated, a registration entry is added to the SPIRE server, so both those actions happen at the same time. And once the TaskRun pod has completed, or has finished doing whatever it needs to do, the registration entry in the SPIRE server expires and disappears. I'll show you this happening in the demo coming up.
So, like Brandon was saying before, we want to secure the different aspects of the TaskRun object: we want to secure the results, and we also want to secure that the TaskRun status — all the fields associated with the TaskRun — is not being modified. The first thing we need to do is actually hash and sign the results. (Oh, lost the screen there — hopefully it comes back. There we go.) So let's start at the results phase right there: hash and sign results. What's happening in that step is that as the TaskRun pod progresses and creates results, whatever results are getting passed into the TaskRun object, it hashes all those result entries and signs them. It actually uses the signing key of the TaskRun pod — you can see right there the "attest pod, get SVID" step — so it uses that specific SVID's signing key to sign the actual results, so that any modification further down the line can be detected, and it passes that to the TaskRun object. At the top you see "reconcile and verify / modify, hash, sign TaskRun". What's happening there is that each time the TaskRun object is modified — the Tekton pipelines controller is continuously validating and continuously updating the status as the TaskRun object is running — it hashes the whole status object of the TaskRun and also attaches a signature and a cert. There, the signing key is the Tekton pipelines controller's. So there are two different signing keys involved: the TaskRun pod's signing key and the Tekton pipelines controller's signing key. Two separate signing keys are being used, because only the TaskRun pod should be the one creating results, and the TaskRun object itself should only be modified by the Tekton pipelines
controller. So that's why there are two different signing keys. Once all that finishes, the Tekton pipelines controller verifies that the results are all valid. One thing I forgot to mention: as the results are created, at the end a results manifest is produced. The results manifest basically says, "I expect these results to come back from this specific task," so if anything is missing or anything extra is added, you'll know about it — it catches that. So the controller validates that the results manifest is consistent with the results it got back; it checks that the TaskRun pod's certificate is valid, based on a trust bundle obtained from the SPIRE server; it validates that the signatures are valid; and of course it validates the hashes. If everything on the results side is valid, it creates a condition on the TaskRun object saying, yes, all the results are verified and valid — I'll show you that coming up in the demo. After that piece, it passes on to the Tekton Chains controller. The Chains controller again verifies the results: it looks for that TaskRun condition — does the condition say all the results are verified? If they are verified, good to go. The other piece is verifying the TaskRun itself. That's where the "get the trust bundle" step comes in: basically getting the trust bundle from the SPIRE server again to make sure that the Tekton pipelines controller's cert is valid. So, similar to the steps we followed for the results, we're kind of following them for the controller itself, to make sure the TaskRun status is still valid after being passed through the Tekton controller — so nothing interfered between when the TaskRun finished and when the
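The results-manifest check described here can be sketched very simply. This is an illustration of the idea, not Tekton's actual code — the function name and data shapes are made up:

```python
def verify_result_manifest(manifest: list, results: dict) -> bool:
    """The manifest lists the result names the task was expected to emit.
    Verification fails if any expected result is missing, or if results
    that were never declared show up (a possible injection)."""
    return set(manifest) == set(results)

manifest = ["commit", "url"]

# Exactly the expected results came back:
assert verify_result_manifest(manifest, {"commit": "deadbeef", "url": "https://example.com/repo"})
# A missing result is caught:
assert not verify_result_manifest(manifest, {"commit": "deadbeef"})
# An injected, undeclared result is caught too:
assert not verify_result_manifest(manifest, {"commit": "d", "url": "u", "extra": "x"})
```

In the real flow, this completeness check runs alongside the certificate, signature, and hash checks; all of them must pass before the "results verified" condition is set on the TaskRun.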
Tekton Chains controller picks it up and verifies it. If everything passes and everything looks good, the Chains controller does its normal process: it creates the signature, generates attestations, and attaches them to the TaskRun object — or to OCI, or wherever you want to store them. So you can see from this diagram — the slide Brandon showed before, where we had those three vulnerable spots around the TaskRun — that we're now using SPIFFE and SPIRE to hash and sign the results, so we can validate the results; we can validate the TaskRun object every time it's updated, to make sure that only the controller is the one making any updates; and finally, at the end, Tekton Chains validates that all the results are valid, as well as the TaskRun object itself, when it reaches the finish line. So let's move on to the demo — hopefully we can see this. First off, let me show you what's running in my cluster. Here is my cluster. For demo purposes, my SPIRE server is instantiated in the same cluster, but in reality you would want it in a separate, more protected Kubernetes cluster. So there's the SPIRE server and the SPIRE agent; here is the Tekton Chains controller — that's the other piece. This is the dashboard: it's basically just a visualization piece, not necessary, just for the demo; you can use it to visualize the TaskRuns and see all the objects changing and so on — it's just a UI piece for Tekton. Here is the Tekton pipelines controller, and here's the other piece of Tekton pipelines, which is the webhook. Those are the pieces we're going to be utilizing today. I want to show the dashboard here real quick: you can see the different pipelines. Right now there's nothing — I haven't instantiated anything, so everything is empty.
You'll see the pipelines — you can instantiate pipelines from the Tekton dashboard, and you can have tasks and TaskRuns and instantiate those too. As the demo progresses, I'll show you exactly how the TaskRun objects are all changing, so it makes it a lot easier to visualize. The other piece here is, again, not necessary for the purposes of SLSA level 3, but it's basically a modified version of the SPIRE server — I'm just using a different image — and all it does is give you a UI into the SPIRE server, so you can actually see all the registration entries and that kind of stuff. For example, the first entry here is a node entry: that's the node attestation that occurs for the SPIRE agent so it can communicate properly. Then you can see the Tekton entries — this is the Tekton pipelines controller entry here. I mentioned before that you bootstrap some of these entries in from the beginning; those are the entries coming in here — the Tekton pipelines controller and Tekton Chains controller are all in here, so they're validated. I'm going to do two different examples. The first thing I'm going to do is run a buildpacks example: it basically runs through a quick two-task pipeline, and we can see exactly how it behaves in the valid case — if nothing interferes with it and nothing goes wrong, you can see exactly what the end state looks like. The other demo is basically: I have a long-running task, I'm going to be a malicious admin on my cluster, and I'm going to try injecting something into my TaskRun as it's running, to see if SLSA level 3, SPIRE, all of that catches it and invalidates the TaskRun. So that's what we're going to be showing. First off, we do the buildpacks example. Right
away, you can see in here that a PipelineRun has been initiated, and there are going to be two different tasks. Here's the fetch-from-git task, and it's going to run the trusted build — the build-from-untrusted task is not going to be run during this step. Right away the first one finished, but before we look at that, I want to show you this: the build-trusted task is getting started. Let me maximize this a little bit. Build-trusted has this pod here, with its generated name suffix — this is the pod itself. So I want to go into the registration entries and see that entry come up in here. If I scroll down, you can see right there the entry for that pod. Like I was saying, when the pod gets created by the Tekton pipelines controller, the controller creates an entry in the SPIRE server, and you can see the selectors right here: it's using the pod's UID. When the SPIRE agent attests that pod to validate it, it checks for all these things — it checks the UID, it checks the name — and if those are valid, it allows the pod to get its SVID and use it for signing and such. Coming back here, let's look at this real quick. The first task finished right away, and it created two different results: a commit result and a commit URL. More interestingly, we want to look at what the SPIRE backend is doing — where all the verification, the hashing and signatures and all that, comes into play. So if you scroll down here into the results, you don't see anything unusual: the task expected a commit and a URL to come back, and that's what came back. But behind the scenes, in the actual TaskRun object, you see a lot more going on. Right here you see there's a termination message — the termination
message for that specific container — and it contains a results manifest. That results manifest is what I was telling you about: okay, it's expecting a commit and a URL to come back; if those two things don't come back, then something went wrong. So it's validating that, and the results manifest also gets its own signature, so nobody else can modify it. Here are the actual results — the commit value is right here — and there's a signature associated with it, the commit sig, which you can see down here. It also appends the actual certificate — you can see the end of the certificate here. That certificate is the TaskRun pod's certificate, and it's utilized later on, actually up here: when the TaskRun completes, the controller goes in and verifies — okay, are all the results that came back valid? Is the cert valid based on the trust bundle? Does the results manifest match — am I expecting the commit and the URL, and did everything come back properly? Then it checks the signatures and checks the hashes. If all of that passes, it creates a condition up there saying, yes, successfully verified everything, all the results are good. So that's the results piece. If you scroll up now, we're going to talk about the status piece: this is where the Tekton pipelines controller is monitoring and signing the actual status. Here is the status hash itself. As the TaskRun is running, if you watch it closely, you will see the hash changing as the TaskRun object is updated. The only entity that should be modifying a TaskRun is the Tekton pipelines controller, and if at any point it sees some
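The status-hash reconciliation being demoed here can be sketched as follows. This is an illustration of the idea only, not Tekton's implementation — the real version also signs the hash with the controller's SVID:

```python
import hashlib
import json

def status_hash(status: dict) -> str:
    """Hash a canonical encoding of the TaskRun status (illustrative)."""
    return hashlib.sha256(json.dumps(status, sort_keys=True).encode()).hexdigest()

# What the controller recorded on its last reconcile:
recorded = {"conditions": [{"type": "Succeeded", "status": "True"}]}
last_hash = status_hash(recorded)

# On the next reconcile, the controller recomputes the hash. A status
# modified by anything other than the controller no longer matches:
tampered = {"conditions": [{"type": "Succeeded", "status": "False"}]}
assert status_hash(recorded) == last_hash
assert status_hash(tampered) != last_hash
```

A mismatch means something outside the trusted computing base touched the status, so the whole TaskRun is marked as not verified.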
unauthorized entry — if it sees that the hash of the status below does not match what it had recorded — it invalidates the whole TaskRun, and now you can't trust it, because it's been corrupted by some outside force that it had no control over. As it says here, that's the controller's certificate, and down there is the signature for that piece. The same thing on this side: this task created an actual image, so it output an image URL and a digest. If you look closer into it, you can see again that this is all valid. I'll show you this in the other example, but basically, if it were invalid, you would have another entry in here saying, okay, this is no longer valid — SPIRE detected some kind of tampering going on — so it adds another annotation here. At the same time, you know that this one is all successful. If it were not successful, or if it crashed, this condition message would change and say, okay, this is not valid, the signed results are not verified, don't proceed forward. So here is the actual TaskRun object — the object for the build-trusted task here — and everything completed properly. It's basically the same thing I showed you: here is the output, here's that image digest value, here are the signatures associated with it, and you can see the actual SVID — the actual certificate is all in there. The controller verifies all of that, and if you scroll up here, the condition says successful — it's up here, actually — and here is the other piece, where the status itself is successful. So now Chains comes into play. Chains comes in and validates: it looks for the specific condition and says, okay, that condition is there, so
I can trust that — the results are verified, so I'm continuing forward. Then it checks whether the status is valid: it runs the checks again — is the controller's SVID valid based on the trust bundle, is the status valid, is the signature valid based on the certificate, is the hash valid? If everything agrees and everything looks proper, then you come up here and you can see that it did the signing: Tekton Chains made the signature and marked it as signed. In this case, because of how I have it configured, it stores the signature on the TaskRun object itself — you can store it in OCI and other storage backends, but in this case I'm storing it on the object. You can see the signature being added here, and you can see the payload, which is the attestation being generated — those two entries are there. Okay, so next we're going to try breaking the system. I'm going to be malicious and break into my own TaskRun as it's running. I scripted everything out, but basically, if you look at this, it's a very short task object: all it's doing is sleeping for 20 seconds, which gives me the opportunity to go break into it. It's a very simple task, but you can think about more complicated tasks, and it'll be the same kind of concept. What's going to happen is that as it runs — you can see right here I'm using an Ubuntu image — I'm going to be malicious: the script is automatically going to go in there and patch that image to make it not Ubuntu, to make it something else, so that it's no longer valid. We should then see that the TaskRun
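The "malicious admin" step in this demo could look something like the following — the pod name and namespace are made up for this sketch, and this is just one way an admin with API access might tamper with a running step:

```
# Patch the running step container's image out from under the TaskRun:
kubectl patch pod sleep-task-pod -n default --type=json \
  -p '[{"op": "replace", "path": "/spec/containers/0/image", "value": "not-ubuntu"}]'
```

Because the patch happens outside the Tekton pipelines controller, the status hash recorded by the controller no longer matches on the next reconcile, which is exactly what the demo shows next.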
should come back as no longer valid — I can't trust it anymore. So we run the long-running task. If I go back here and maximize this so we can see it running — this is the non-falsifiable example, same kind of thing — right away, actually, you can see in here that the image changed. The image changed to not-ubuntu, which is not what I specified in the first place. So something went wrong, or someone malicious came in and modified my image while my TaskRun was going through. Right away it failed. Because the image wasn't even valid, even the results I was expecting never came back: right here you can see another condition message saying that none of the results actually came back — it was just waiting for the results, basically — so that's not even a valid condition. Tekton Chains will look at this and say, well, the results never came back, they're not valid — that's already one red flag. And the second red flag is up here, saying not verified: as the pipelines controller was reconciling the state of the actual TaskRun object, it saw that the hash no longer matched, so it said, well, this is no longer valid, and marked it as not valid. So what happens now? Tekton Chains comes and takes a look at this. If I get the TaskRuns, I can describe that specific TaskRun — it's in this namespace, grab this — and right here, let me maximize this a little bit, you can see that the TaskRun itself is already complaining that the status is not verified, and it's saying the results are also not verified — there's a verification failure on both pieces. So now we can go up here and see that, because everything failed, Tekton Chains did not do the signing: there's no signature associated
with it and no attestation associated with it, because the TaskRun itself is no longer valid. So that's the demo; going back to the slides.

For the future: this work exists today as PRs in both the pipeline and chains repositories. There are two pipeline PRs, 4759 and 4828: the first one is basically the signed results, and the second one is for the signed TaskRun. The last one is for Chains, which does the verification once everything finishes; Chains comes in and verifies. There's still a little work left to get this into an alpha release, but the code is mostly done, so you can take a look at it if you want to.

The second piece is that in the future we're looking to see if we can extend some of the SPIRE signing pieces to other custom resources and other fields, and also, within Tekton Chains and Tekton Pipelines, to artifacts that get passed between tasks. Currently there's no way to validate that an artifact being passed between different tasks remains the same. If there's a way we can sign the actual artifacts as they're passed between tasks, that's what we want to see, and that's our next step. Thank you. Any questions? Yeah, go ahead.

Audience question: is it in scope for this to detect a modification, say if the worker node were compromised and the inside of the container got modified directly, without going through the Kubernetes APIs?

There are two parts to that. One is the worker node being compromised and then someone going into it. The idea behind SPIRE is that we do node attestation; right now it's essentially bootstrap attestation, authenticating the node, but something the SPIRE community is slowly moving towards is continuous attestation. The idea is that with the right monitoring tools you can detect that kind of compromise: the attestation of the node will fail, and then the secrets and keys won't be released to the workloads. That's the first part. The second part, going into the workload and modifying things there, is something that still requires being locked down, but given the current facilities Kubernetes provides to manage RBAC and related controls, I think it's pretty simple to have an admission controller reject those API calls. So this can be baked into the trusted computing base: with SPIRE, the trusted computing base we showed was the controller, but you can imagine the trusted computing base covered by the attestations also including part of the Kubernetes infrastructure. That's the piece we mentioned as future work: can we expand this to include the Kubernetes layer itself, maybe with some signing and verification, so that you can detect unauthorized changes happening there?

Any other questions? Anything else? Thank you very much. Awesome, thank you.
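The non-falsifiable check the demo exercised can be sketched conceptually: the controller signs a hash of the TaskRun status, and Chains recomputes the hash and verifies the signature before it will attest. Below is a minimal Python sketch, assuming an HMAC with a shared key as a stand-in for the X.509 SVID key material SPIRE actually delivers; the function names and the status shape are illustrative, not the real Tekton or SPIRE APIs.

```python
import hashlib
import hmac
import json

# Stand-in for the per-workload key SPIRE would deliver via an SVID.
CONTROLLER_KEY = b"spire-issued-key"

def sign_status(status: dict) -> str:
    """Controller side: canonicalize the TaskRun status and sign its hash."""
    payload = json.dumps(status, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    return hmac.new(CONTROLLER_KEY, digest.encode(), hashlib.sha256).hexdigest()

def verify_status(status: dict, sig: str) -> bool:
    """Chains side: recompute the hash and check the signature before attesting."""
    return hmac.compare_digest(sign_status(status), sig)

# Controller records the status and its signature as it reconciles.
status = {"steps": ["clone", "build"], "image": "ubuntu"}
sig = sign_status(status)
assert verify_status(status, sig)       # untampered: Chains would sign the attestation

# A malicious actor patches the image mid-run, as in the demo.
status["image"] = "not-ubuntu"
assert not verify_status(status, sig)   # hash no longer matches: Chains refuses to sign
```

The point of the hash-then-sign step is exactly what the demo showed: any out-of-band edit to the status, even a single field, invalidates the signature, so Chains never produces an attestation for a falsified run.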
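The future-work idea of validating artifacts passed between tasks amounts to recording a digest in the producing task's (signed) results and re-checking it in the consuming task. A minimal sketch, assuming local files stand in for a shared workspace; none of this is real Tekton API, just the shape of the check.

```python
import hashlib
import tempfile
from pathlib import Path

def artifact_digest(path: Path) -> str:
    """Hash an artifact exactly as a producing task would record it in its results."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Simulate an artifact written by one task and consumed by the next.
workspace = Path(tempfile.mkdtemp())
artifact = workspace / "app.tar"
artifact.write_bytes(b"built artifact contents")

# Producing task records the digest as a (signed) result.
recorded = artifact_digest(artifact)

# Consuming task re-hashes before use; a match means the artifact survived transit intact.
assert artifact_digest(artifact) == recorded

# If anything modifies the artifact between tasks, verification fails and the run should abort.
artifact.write_bytes(b"tampered contents")
assert artifact_digest(artifact) != recorded
```

Combining this with the SPIRE-signed results above would let the consuming task trust the recorded digest itself, closing the gap described in the talk.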