All right, let's get started. Hello everyone and welcome to "Chain of Trust: Towards SLSA L3 with Tekton Trusted Artifacts". Thanks for joining today. Unfortunately my colleague Jerome couldn't travel to the conference, so it will be just me delivering the presentation. I know it's one of the last sessions of the day; hopefully it will be interesting and exciting. My name is Andrea Frittoli. I'm an open source developer advocate working for IBM. I'm the chair of the TOC of the Continuous Delivery Foundation, a sister foundation to the CNCF focused on CI/CD, and I'm also a maintainer of the Tekton project and a member of its governing board. Jerome, who couldn't be here today, is a senior software engineer at Google; she's also a Tekton maintainer and a member of the governing board. Today we'll start by talking about chain of trust in the context of the software development life cycle. I'll introduce Tekton, then we'll talk about artifacts for Tekton, with a short demo, and finally we'll discuss what's next and what the future brings in this context. Well, let's start with chain of trust. I probably don't need to spend too many words on the software development life cycle, as this is the SDLC track. When thinking about software being produced, we usually start from a producer, typically a developer or a team of developers writing the software; as everyone is talking about AI, it could also be some AI system contributing to it. The software then goes through a series of steps: typically it's stored in a source code management system, then goes through a build process which might bring in a certain number of dependencies, the result is packaged, and the package is what gets to the final consumer.
And so, chain of trust in this context means that we can put a certain level of trust on every single step in the process I just described, so that at the end the consumer can trust that what it's consuming, either as a dependency or as an application, is actually the result intended by the producer. But what happens if something goes wrong? It's enough that one link of the chain is broken, and the whole chain of trust falls. This picture is taken from the SLSA specification; I will talk more about SLSA in a moment, but I like it because it highlights the threat surface quite well. It starts by separating the initial diagram into three parts: the source part, the build phase, and the dependencies at the bottom. Then it identifies threats, things which could go wrong in every single part of the process. For instance, on the source side, it could be that someone is impersonating a developer: maybe a developer does not have a secure account on GitHub or GitLab, and someone manages to get access to the credentials and submit malicious code on their behalf. Or the Git repository itself could be compromised: we actually produce the right software, but someone manages to get access to the repository and introduces malicious code there. And so forth. Similarly, in the build phase, the build system itself could be compromised. This actually happened in the SolarWinds case, if you have heard about it: they were producing software, but the build system was compromised, so without realizing it they were building malicious software that was then used by several of their customers. And even if the build system is fine, it will produce artifacts and store them somewhere.
And the package registry where the software is stored may be compromised, and so forth. So you can see that the threat surface is quite wide. Today we will focus on the build part of it. All right, I promised I would say something more about SLSA. SLSA stands for Supply-chain Levels for Software Artifacts, and it's a security framework: it gives you a frame of reference to describe the level of trust that you have in your build process, or in your entire software development life cycle. SLSA defines three tracks: one for build, one for source, and one for dependencies. In fact, the only one that has been fully specified until now is the build track, which is the one we are looking at today. So how does the framework look? For each track there are several levels. The first level is level 0, which means no requirements at all, so every build system out there is SLSA level 0 compliant; that's easy enough. At level 1 we introduce the concept of provenance that shows how the package was built. It means we must have some document, produced by the build system, which gives us information like the inputs to the build and the steps that were executed to actually produce the software. Every build system that produces such provenance information is SLSA level 1. At level 2 we introduce the concept of a hosted build platform, meaning that if you're building the software on your laptop, that cannot be SLSA level 2 compliant. And the provenance document, the attestation we described at level 1, must be signed, using a signing system that we trust, of course. So at level 2 we have signed provenance and we build our software on a hosted platform. Finally, the third and final level is level 3, and that requires a hardened build platform.
A secure build platform that we can trust, basically. Okay, a few words about Tekton now. Tekton is a cloud native, open source CI/CD system, or rather a tool to build CI/CD systems. It's hosted by the CD Foundation, the Continuous Delivery Foundation, where it's a graduated project, and it benefits from a nice, large community of contributors, including IBM, Red Hat, Google, CloudBees and many more companies. There are several adopters, ranging from vendors like IBM, CloudBees, Google, Ozone and Red Hat, to a number of end users with some large scale deployments of Tekton, like Nubank and many others. We have a longer list of adopters; I put the QR code there if you want to get the list. And if you're a user of Tekton, please submit a pull request to us: we are very keen on knowing our end users. So how does Tekton work? Tekton is based on Kubernetes, and it's basically an extension of Kubernetes. You might be familiar with the concept of custom resource definitions: in a nutshell, Kubernetes defines a number of resources like Pods, Deployments and Services, but it also allows applications to define their own resources that work in a similar way, and you can write controllers for them. So Tekton introduces a few resources to model CI/CD pipelines. Maybe we can start from the innermost one: the smallest piece of reusable definition in Tekton is the step, and its reusable form is called a StepAction. If you've used Tekton before, you may not have heard of StepActions; that's because it's something we introduced in one of our latest releases to make steps actually reusable. A sequence of step definitions makes a Task, and a graph of Tasks makes a Pipeline. All these are Tekton resources, which means they can be reused: you can share them within an organization or in the open source community, and you can sign them, secure them, and distribute them.
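As a rough sketch of how these building blocks compose (all names here are made up for illustration), a Task is a sequence of steps and a Pipeline is a graph of Tasks:

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: clone-and-build        # hypothetical task name
spec:
  steps:                       # a sequence of steps makes a Task
    - name: clone
      image: alpine/git
      script: git clone https://github.com/example/app /workspace/app
    - name: build
      image: golang:1.22
      script: cd /workspace/app && go build ./...
---
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: build-pipeline         # hypothetical pipeline name
spec:
  tasks:                       # a graph of Tasks makes a Pipeline
    - name: build
      taskRef:
        name: clone-and-build
```

Because Tasks and Pipelines are just Kubernetes resources, they can be versioned, signed and distributed like any other manifest.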
We work with Artifact Hub, for instance, so you can get Tekton resources from there. On the runtime side, I talked about Pipelines and Tasks; we also have other resources dedicated to the runtime side. If you want to run a Pipeline in Tekton, you need to create a PipelineRun, and then the PipelineRun controller will kick in and make sure the Pipeline is actually executed. In terms of Kubernetes resources, steps map to containers and Tasks map to Pods, so each TaskRun corresponds to a Pod. We have some magic tricks in Tekton to make sure that the different steps, the different containers in the Pod, are executed sequentially rather than all together as they normally would be in a Pod. PipelineRuns and TaskRuns include the list of parameters that have been passed: typically when you start a TaskRun or a PipelineRun, you pass some parameters to it, and you obtain some results. The parameters are part of the specification, and the results go into the status, like in other Kubernetes resources. All this information together — the definition of the Task and the Pipeline, and on the runtime side the parameters and the results — is what you need to actually write the provenance document required for SLSA L1 and L2. We have a number of security features already implemented in Tekton, so if you're security minded, I think it's a good option for you to look at. As I was mentioning, Pipelines, Tasks and similar resources can be signed by the author, and the Tekton controller has the ability to verify the signature when you submit them and refuse to execute anything that does not match the signature. Another feature, which is almost finished, is signing the run side. For that we are integrating with SPIFFE, because on the run side we actually have a workload running in the Kubernetes cluster.
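To make the spec/status split concrete, a minimal PipelineRun might look like this (names and the result value are invented; the commented part shows where results would land after execution):

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: build-run-       # hypothetical run name
spec:
  pipelineRef:
    name: build-pipeline
  params:                        # parameters are part of the spec...
    - name: git-revision
      value: main
# ...while results, filled in by the controller, land in the status:
# status:
#   results:
#     - name: IMAGE_DIGEST
#       value: sha256:6b86b273...   # hypothetical digest
```

Spec plus status is exactly the raw material a provenance document needs: what was requested, and what came out.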
So we want to have a workload identity that we can use to sign the status and the specification of the runtime resources. I mentioned on the previous slide that TaskRuns correspond to Pods. If you have multiple Tasks in your PipelineRun, your Tasks might need to share data across them: maybe one Task produces some amount of data, it clones a repository, and the next Task wants to use that. Because Pods don't natively share any storage, we introduced the concept of a workspace. A workspace is typically mapped at runtime to something like a PVC in Kubernetes, and it allows Tasks to share data. I put a red circle around the workspace because that's the weak link of the chain that we're going to look at today: we don't really know what happens in the workspace. We know that a Task writes something into the workspace and the next Task takes it out, but we don't know what these things are, and we don't have a way to verify that what we receive is actually what was produced. Before we get there, I also wanted to mention Tekton Chains. Tekton is actually a collection of projects, and one of them, Tekton Chains, focuses on security. It's again a Kubernetes controller: it watches Tekton runtime resources, and it is able to detect special results which correspond to artifacts being produced. It then integrates with Sigstore to sign those artifacts and produce attestations. We support various formats, including in-toto, and those attestations can even be uploaded to the Sigstore transparency log. Right, I'm going to switch to a quick demo, where I wanted to show you why we are worried about what happens with the workspace. It's not showing on the other screen somehow. Okay, let me see. I need to move it, sorry, let me see if I can mirror instead. Yeah. Okay. That's better. All right.
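For reference, an in-toto attestation of the kind Chains produces is, at its core, a statement like the following. This is a simplified sketch: the subject name, digests and parameters are invented, the real payload carries much more build metadata, and Chains emits it as JSON wrapped in a signed DSSE envelope (shown here as YAML for readability):

```yaml
_type: https://in-toto.io/Statement/v1
subject:
  - name: registry.example.com/app           # hypothetical built image
    digest:
      sha256: "6b86b273ff34..."              # hypothetical digest
predicateType: https://slsa.dev/provenance/v1
predicate:
  buildDefinition:
    buildType: https://tekton.dev/chains/v2/slsa   # identifies the Tekton build type
    externalParameters: {}                   # the inputs passed to the run
  runDetails:
    builder:
      id: https://tekton.dev/chains/v2       # the build platform identity
```

Signing this statement and logging it in a transparency log is what turns plain provenance (SLSA L1) into signed provenance (SLSA L2).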
I was saying, I have a very simple setup with a producer task and a consumer task. The producer task writes a simple message to the workspace; the consumer task reads the message from the workspace. You can imagine that instead of a message this could be an artifact that is going to be embedded in the final artifact delivered to the end user. All right, so this is the pipeline run definition; let's see if I have something. So you see some YAML at least, it's always good to see. You can see there is a producer task here, a consumer task, and a workspace that is shared between the two. This works like every other Kubernetes type of system: you just kubectl create, and there is your demo. We also have a nice CLI, where you can see the producer task executed: it wrote "I love open source" — I won't try to read the French version. And the consumer received "I love open source", so everything looks good. Now, I also have a malicious task here which, look, somehow knows where the other task is going to write, and it's looking for changes there. So if I run my demo pipeline again and look at the logs — wrong command, sorry — something different happened this time: we produced "I love open source" and got "bonjour" instead. But the really worrying thing is that the pipeline run executed correctly. No one knows something wrong happened; we have no way to detect this. That means that in this case the artifact will be produced, Chains will pick it up and sign it with a valid certificate, saying: yes, this is stamped, this is good. And it may end up as a dependency of a much larger system and do harm across a very wide surface. Okay, let me switch back to the presentation — really just to see my notes, and I don't need them, so I will just do like this. Right, so this was the setup of the first part of the demo, with the two tasks, and the build system compromised by an internal attacker.
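A minimal reconstruction of the demo pipeline shown above might look like this (task and workspace names are made up; the actual demo manifests may differ):

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: demo-pipeline              # hypothetical name
spec:
  workspaces:
    - name: shared                 # the shared volume both tasks mount (e.g. a PVC)
  tasks:
    - name: producer
      taskRef:
        name: producer             # writes the message into the workspace
      workspaces:
        - name: data
          workspace: shared
    - name: consumer
      runAfter: ["producer"]       # forces sequential execution
      taskRef:
        name: consumer             # reads the message back out
      workspaces:
        - name: data
          workspace: shared
```

Nothing in this definition lets the consumer check that what it reads from `shared` is what the producer wrote, which is exactly the gap the demo exploits.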
So someone needs to have access to the build system to do something like this. And that's why we started working on Tekton trusted artifacts. There are a couple of phases in which we are introducing this. The first phase is really step attestation. I mentioned before that the smallest unit of execution we have is a step, and the step is a container. So we're defining a kind of standard format that steps can use to write an attestation: to declare the inputs that they received and the outputs that they wrote. This can then be used by subsequent steps or tasks to verify that what they received is actually what was produced and what was expected. So at this level we are introducing the attestation format for steps, and also task level attestations, which must be a subset of the step attestations, because steps might have a combination of actual artifacts and byproducts — intermediate things that have been produced along the way before we actually get to the final result. We are also adding a mechanism to pick up these attestation files written by the steps, transform them, and put them in the status of the Kubernetes resources, meaning that in your TaskRun status you will be able to see this provenance information. This is how we make it available to the next task that runs. And, to show some YAML again, this is what it might look like: as part of the TaskRun, we have the status, the results, and then a list of the steps that have been executed; for each step we have inputs and outputs, and we specify an optional name, a URI where the artifact is, and a digest, or potentially a list of digests. All right, the second phase is to actually extend the Tekton API to introduce the concept of artifact in the API itself.
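The status just described might be rendered roughly as follows (field names follow the design proposal as I understand it; the URIs and digests are invented):

```yaml
apiVersion: tekton.dev/v1
kind: TaskRun
# ...spec elided...
status:
  steps:
    - name: build
      inputs:                      # artifacts the step declared it consumed
        - name: source
          values:
            - uri: git+https://github.com/example/app   # hypothetical repo
              digest:
                sha1: "0a1b2c..."                       # hypothetical digest
      outputs:                     # artifacts the step declared it produced
        - name: image
          values:
            - uri: oci://registry.example.com/app       # hypothetical image
              digest:
                sha256: "6b86b2..."                     # hypothetical digest
```

Because the status travels through the API server rather than through the shared volume, it gives downstream tasks an independent channel to learn what the upstream step claims to have produced.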
This means that Tasks and Pipelines will be able to define the artifacts that they will consume and that they will produce upfront, as part of their specification. So when I'm writing a task I can say: this task will accept a certain repository as an input and it will produce an OCI image, and I can define that upfront. Defining that upfront has certain advantages, because it means the Tekton controller can do more automatic things for you. It can automatically inject steps that will help you generate and verify the provenance, which we cannot do if we don't know upfront what a certain task is going to do. We also envision user-provided steps, because you might want to upload your artifact, or store your artifact, in your storage of choice: today we use a PVC within the pipeline, but you might actually want to store artifacts in an object store, in an OCI registry, or some other kind of registry or storage system that you have internally and prefer. The other reason we are introducing this feature in Tekton is to improve the generation of attestations that we do today. I briefly mentioned earlier that today Tekton Chains relies on a special kind of results that Tekton produces to identify artifacts. We want artifacts to actually be a first class resource in Tekton, so that Chains can do a much better job of identifying different types of artifacts, signing them for you, and producing attestations that include all the different steps that were executed and all the intermediate artifacts that were produced before the final one. This also allows us to align better with the in-toto specification for attestations. Something else we want to do: Tekton today can produce events that tell you what is happening in your pipeline run — the pipeline is starting, stopping, and so forth.
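Purely as an illustration of the idea — this API is still being designed, so the field names below are hypothetical and the final shape may differ — declaring artifacts upfront could look something like this:

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: build-image          # hypothetical task
spec:
  artifacts:                 # hypothetical field: artifacts declared ahead of time
    inputs:
      - name: source
        type: git            # the task expects a repository as input
    outputs:
      - name: image
        type: oci-image      # and promises to produce an OCI image
  steps:
    - name: build
      image: gcr.io/kaniko-project/executor
```

With declarations like these, the controller knows which verification and upload steps to inject, and Chains knows exactly which outputs to sign.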
And you can collect these events to have an audit trail of what is going on, what happened in your pipeline, or you can use these events to trigger logic, for instance to start a new pipeline when the previous one has finished. But because we didn't have a concept of artifact yet, we couldn't send events specific to artifacts. There is a specification within the CD Foundation called CDEvents, which defines a standard format for events in the CI/CD space, and CDEvents includes events specific to artifacts. The CDEvents community is working with different tools in the CI/CD space, including artifact registries, to support this kind of event, and we want Tekton to be able to produce artifact events as well. So whenever an artifact is produced by Tekton, we can send an event, and you can store this event in your event store — again for audit trail purposes — or you can trigger logic based on it. Okay, let's switch to the second part of the demo. The setup is a bit more complicated this time. We still have our producer and consumer tasks, but there are additional steps being executed. On the producer side, we produce our content in the Pod's local storage, and then we have steps that hash that content and upload it to the workspace. On the receiving side, before we actually consume the content, we have additional steps that download locally the content that was stored in the workspace and verify that the hash produced on the producing side matches: they recalculate the hash and verify that it matches. We use the status of the resources to store the provenance information, so the consumer side can take the expected hash from the status and compare it with the hash that has been calculated. Okay, so our evil attacker is still running here, but this time I will run a different pipeline.
Sorry, let's look at the right one — this one, where we use the artifact producer and artifact consumer. These are enhanced versions of the same tasks: as I was mentioning, these tasks additionally have the steps to calculate the SHAs and verify the SHAs. You can see the producer is doing the same thing as before, but additionally it's displaying this information with the SHA of the artifact that was produced, our message in this case. And the consumer fails this time. Right, because it expected this SHA, the one that was produced by the producer, but it computed the SHA locally again, and because the message was tampered with, the SHA doesn't match anymore. So the consumer can be aware that this has been tampered with. So, what's next? We are working on implementing this feature on the Tekton side. As I mentioned earlier, we expect to have steps that can do this work of uploading and downloading artifacts, and computing and verifying the SHAs — hopefully contributed by Tekton users as well, as there may be many different types of artifacts, and each type of artifact may have its own hash algorithm used to calculate it. So hopefully we'll be able to build a catalog of reusable steps that the community can benefit from. We're working on extending the concept of artifacts from tasks to pipelines. And then there are features like hermetic execution that we started working on before, but we had the problem of knowing: when can we cut network access, and when can we restore it? With the definition of artifacts, we have a clear definition of what inputs are coming into the pipeline and what outputs need to be pushed to some external system. So when enabling hermetic execution, we can make sure that all the input artifacts are available locally.
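The hash-and-verify flow from this second demo could be sketched as task steps like these (paths, images and names are invented, and the real trusted-artifacts steps may differ; `$(params.expected-sha)` stands in for the value the consumer reads from the producer's TaskRun status):

```yaml
# Producer side: hash the locally produced content, then copy it to the workspace.
steps:
  - name: compute-sha
    image: bash:5
    script: |
      sha256sum /var/local/message | cut -d' ' -f1 > /tekton/results/message-sha
  - name: upload
    image: bash:5
    script: |
      cp /var/local/message $(workspaces.shared.path)/message
---
# Consumer side: copy the content locally, recompute the hash, compare.
steps:
  - name: verify-sha
    image: bash:5
    script: |
      actual=$(sha256sum $(workspaces.shared.path)/message | cut -d' ' -f1)
      if [ "$actual" != "$(params.expected-sha)" ]; then
        echo "tampering detected: hash mismatch"
        exit 1              # fail the TaskRun so the tampering is visible
      fi
```

The crucial point is that the expected hash travels through the resource status, a channel the workspace attacker cannot reach, so a mismatch surfaces as a failed run instead of a silently signed artifact.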
Run our build without network access, making sure that no unwanted dependencies are brought in, and then restore the network for the publication of artifacts. I mentioned already support for related standards, like SPIFFE, in-toto and CDEvents. Other things the community has been working on include machine learning model transparency. The team on the Google side especially has been looking at using this kind of technology to produce attestations for machine learning models. You can imagine using Tekton pipelines for running machine learning type pipelines, and you can do this natively, directly in Tekton, or through tools like Kubeflow — this is supported today. With Kubeflow you can define your pipelines in Python, and then they can be compiled to different backends; supported today are Argo Workflows and Tekton. So you can define them in a language that is familiar to data scientists and then execute them in Tekton. This type of functionality, with artifacts and provenance, will allow more transparency into how the models were produced by your pipeline, because you can see step by step what was produced. Another functionality that we are looking forward to implementing out of this is resumable pipelines. Because today we kind of miss the exact state in between tasks, it's hard to implement resumable pipelines. But especially when you start looking at data pipelines, which may be very expensive from a computational point of view, it makes sense, if a pipeline was interrupted at some point in time, to be able to resume the execution so that you don't have to recompute everything that was done already. And keeping track of all the intermediate artifacts within a pipeline will enable us to do that.
Alright, just to conclude: we are implementing this feature, Tekton artifacts, and will soon have the first release with the first phase implemented. We are looking for input and feedback from the community, of course. So if you are looking at the chain of trust in your build system today and you have specific use cases or problems that you want to address, feel free to reach out to us on the Tekton Slack and let us know. I put here a QR code with a link to the design that we are using. We use a mechanism called TEP, Tekton Enhancement Proposal, similar to the Kubernetes KEPs. So if you have feedback on that, it would be very welcome. And just in a nutshell, to summarize: SLSA defines guidelines for supply chain security, with multiple levels. Today we focused on level 3 for build, which focuses on avoiding tampering during the build. For that we looked at Tekton, which is a powerful CI/CD framework that already implements many SLSA related features, and we introduced Tekton artifacts to extend the existing SLSA features, to bring Tekton towards the level 3 requirements for build and also to introduce better provenance generation. Okay, I hope this was useful to you. Thanks again for attending. We might have one minute left for questions, if you have any. Oh, and the QR code is a link to the slides; they are on Sched anyway.

Thank you very much for this presentation. I did not know much about Tekton, but now I'm interested. Just quickly, could you tell me if I can improve with Tekton my current GoReleaser pipeline, which is currently running on both GitHub and GitLab, because I tested both of them? I'm already computing SBOMs with the GoReleaser framework, and I'm also relying on Cosign to sign them, but I'm yet to introduce attestations. So can Tekton help me? Does it work on GitLab and GitHub?
Do I need custom runners?

Right, that's a great question. Tekton is very popular with organizations and companies that build their own internal platform, if you will, because of these reusability and scalability features. We started working on a GitHub Action to make it easier to ramp up with the GitHub integration, but right now it still needs some extra work and scaffolding to bring it up, so you don't have a click-and-run Tekton for GitHub. If you have everything in place, you don't have to switch to another system if everything works for you. But on the Tekton side, if you're building a larger system, we have a very strong focus on security features and being security minded. So if that's something that interests you and you want a system that allows you to scale to large teams and many, many pipeline executions, that might be something to look into.

So, question over here. When you're generating attestations for something like an ML pipeline, that process is going to be non-deterministic: if you were to go and train a model or do something else like this again, you'd get a different result. So what can you actually use the attestation for?

Right, that's a great question. I'll be honest, I didn't work on that part; that was mostly the Google team that started looking at it. So I don't have a great answer for that, but I will relay your question.

There's one more question. Hi. Yeah, so about the workspaces: if I followed correctly, you had two task runs side by side, and one of the task runs was interfering with information in the workspace, and then messing with the other task run, is that right? Yeah, that's right. I guess this only works in a situation where your workspace is declared as a PVC type.
Like if it's an emptyDir type, and therefore the workspace is isolated to that single task run alone, it's not accessible by the other task run — is that right?

That's right. The only thing is that when you use emptyDir — and we use emptyDir a lot as local storage for the task — it's hard to make it available to other tasks. So if you have a following task consuming that, you need to have some kind of shared storage; that can be a PVC, or it can be, you know, a local object storage that you set up where you share things. But in any case, you might still have an attacker that has gained access to that kind of shared storage. So the key thing is to be able to use a different channel, like the status, to share the expected SHA, so that you can verify on the receiving side.

Okay, but if the PVC or the workspace was specific just to that task run, and I was vending a new PVC for every task run, that would be fine, right? That would resolve the problem. But I guess there are issues with requesting a PVC for every task run, as it might introduce latency.

Yeah, but even if you had a PVC dedicated to every task, you still wouldn't be able to share data across the different tasks. I meant a workspace in the PVC per task run, for that single pipeline. Yeah, right. So you can secure it as much as possible; you can do tricks to try and secure the specifics of the storage system that you use, for sure. That's a great point, and a question I've also had on this. But I think having the ability to verify is really useful. It's also helpful for the provenance information, because then you get these intermediate SHAs. All right, I think we are out of time. Thanks again.