Thank you for coming. I'm Jose, an open source engineer at Aqua Security. This talk was created with Itay, who runs the open source team at Aqua; today I'll be presenting by myself.

Just some context. A couple of years ago, some very significant attacks opened everyone's eyes to supply chain security, right? Take the Codecov attack, for example: an attacker was able to modify their bash script, the installer. So whenever you installed Codecov, it would collect sensitive information and send it to a remote IP. If you were using GitHub Actions, for example, installing the Codecov action would collect your GitHub token and send it to that remote IP.

Like everyone else, we wanted to explore a solution for this, and we started a POC for GitHub Actions specifically. But our perspective was runtime security: we wanted to bring the world of runtime security to bear on this problem. And that's what I want to talk about in this talk: the POC, the attempts we made, what we implemented, and a few things we learned along the way.

Why runtime security? Because that's what we do, right? I work on the Tracee project, an open source project for runtime security and forensics. Tracee uses eBPF to collect events from your kernel, over 500 of them, so it's a lot of events. Those events can be as simple as a syscall: you can say, I want to know whenever the write syscall happens on the host, or whenever a specific process does a write syscall. But there are also complex events, which we call signature events. Those complex events detect malicious activity, so you can ask Tracee to trigger an event if a fileless attack is happening on the host, or a reverse shell. When you have a hammer, everything looks like a nail, and Tracee is our hammer. So that's what we tried to use to fix this supply chain problem.
The first solution we attempted was pretty basic. We took Tracee the way it was, with its signature events, created a GitHub action, and added Tracee to it. So whenever someone used it, Tracee would boot in the background and check whether any malicious activity was happening. It was actually good to see this working, because at first we didn't know if we would even be able to run eBPF in GitHub Actions. And it did work; it was a good first step.

But the first lesson came right after the release of this POC: production time is different from build time. There are different ideas there. Specifically, at the time, Tracee had a set of signatures very specific to production — for example, guaranteeing you have an immutable infrastructure. In production, you don't create artifacts. You already have your container image, your binary; you just execute them. So if you're running a container and some binary is downloaded from outside, that's suspicious activity, and it's good to have an event about it. But at build time it's actually the opposite: you are creating your artifacts. You're probably installing dependencies and auxiliary tools to help you create them — your compiler, maybe curl to do some downloading, maybe jq. So it's not the same thing.

What we tried to do was go over the signatures we had and see what makes sense for production and what makes sense for build time. In production, we don't want to make assumptions. It doesn't matter if you're running a binary built by Go or Rust; we don't care. The events are collected from the kernel, and that's how we would see whether it's a fileless attack or not. But at build time, maybe we could make the assumption, when you're running Go, that your go.mod should not change. So we can create an event for that.
Your go.mod can change in development time, but in build time it shouldn't, so we could create an event for that. And following this same idea about build time, we can make assumptions: we saw that build time is actually kind of predictable. You have your steps there. For example, you'll say: I will clone my repository, download my dependencies, run my tests, build my binary. There's a flow there.

So considering this specific assumption — that build time is not production — we thought: what if, instead of using Tracee to see the malicious activity, we used Tracee to see the good activity? The normal behavior, this flow. And then we try to enforce it every time. If you have a baseline, whenever the baseline diverges, it means something changed, and we'll let you know about it. In other words, we started to build a profile.

And that's our second attempt. We broke the action in two, so we have a start and a stop, and whatever is in between is the profiled execution: we want to track what is executed between those two actions. And another lesson learned is that creating a profile of good data is very hard. Why? Because there's a lot of volatile information. Arguments, for example: sometimes they don't change, but if you're using a temporary directory, they will, and the next run will almost certainly be different. Process IDs, same thing: if a process ID ends up in the profile, the next run will differ. So the profile is very unstable with arguments. We tried to balance what information to keep about the executions — enough to inform you about your baseline without making it unstable. So for this second attempt we decided to go with the binary path, the binary hash, and how many times it was executed. Of course, now I know it doesn't make sense.
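As a sketch — in Python, with event shapes and names that are illustrative, not Tracee's actual output format — that second-attempt profile and its baseline comparison could look like this:

```python
from collections import Counter

def build_profile(exec_events):
    """Aggregate exec events into a profile: (binary path, binary hash) -> count."""
    profile = Counter()
    for event in exec_events:
        profile[(event["path"], event["sha256"])] += 1
    return profile

def diff_profiles(baseline, current):
    """Return entries whose execution count differs from the baseline."""
    diverged = {}
    for key in set(baseline) | set(current):
        if baseline.get(key, 0) != current.get(key, 0):
            diverged[key] = (baseline.get(key, 0), current.get(key, 0))
    return diverged

baseline = build_profile([
    {"path": "/usr/bin/go", "sha256": "aaa"},
    {"path": "/usr/bin/curl", "sha256": "bbb"},
])
# A later run that also executes an unexpected binary:
current = build_profile([
    {"path": "/usr/bin/go", "sha256": "aaa"},
    {"path": "/usr/bin/curl", "sha256": "bbb"},
    {"path": "/tmp/miner", "sha256": "ccc"},
])
print(diff_profiles(baseline, current))  # {('/tmp/miner', 'ccc'): (0, 1)}
```

Note how (path, hash, count) alone cannot tell a curl to GitHub apart from a curl to a bad IP — which is exactly the limitation the attempt ran into.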
We did it, but it doesn't make sense. Why? Because knowing that curl was executed is good, but curl to a bad IP and curl to GitHub are very different things. So the arguments are indeed very important here; we cannot ignore them if we want a profile of the good actions.

Something else we saw: profiling only executions is very limited. It leaves a lot of questions unanswered. With executions only, we don't know if the build is hermetic — is network activity happening while you're building your artifact? With executions only, we can't tell. We also don't know if any file was modified. So there are things we would like to answer about the good behavior, and executions alone are too little.

Another realization, going from the first attempt to the second: one was doing just signatures, the other just the profile. And then we thought, okay, the profile is a good idea, we like the concept, but signatures also have a role. We still want to look for any bad activity that might be happening.

This brings us to our third attempt, where we tried to fix the things we had learned. First, we brought back the signatures, to identify any malicious activity that might be going on in the host. Second, we extended our profiles: now we look at executions, files modified, and network activity. This gives the feeling of a deny list and an allow list: the bad stuff is on the deny list, which the signature events can identify, and the good stuff is in the profile. Tracee ships with a bunch of those signature events by default, and if you want to create your own, that's possible too. The Aqua research team adds their own there, so it's always evolving, but you can also evolve that library yourself.

And that's the demo I want to show you real quick. Just some context: I have a Golang project, and I've created a branch here.
And in this branch, I'm introducing a GitHub workflow. It's a Golang project with some files. We have the start action for Tracee, the stop action, and the things in between are what we're going to trace: running our tests, building our binary. And I want you to keep this fake upload action in mind — something will happen with it later. For now, just note it.

So, we go back to the terminal and push this branch to GitHub so we can create our PR. Pushing, going to GitHub, creating the PR. And now comes the test of waiting for the workflow to run. I really tried to use my editing skills in this video so it's faster — it's not that fast in real life, let me warn you. So we wait for it to run, and it fails. Why does it fail? Because it's the first time we're running, so we don't have a baseline yet.

It did create a PR, and in this PR we have the three profiles I mentioned before. We see there's no DNS activity. We see the executions it was able to trace — what your pipeline did, what your workflow did. And we also see that two files were changed. So this is our baseline, and we're going to merge it.

Merging. Now we need to update our PR with this baseline. So we go to main, check out, and update the branch. Going back to the PR, we rebase from main and push to update it. Again, this starts the workflow. Waiting for it to start... okay, it should be now. Yeah, it's starting to run. It generated a new profile and compared it to the baseline. Before, we failed because we didn't have one; now we have the baseline, so it passed. The build this time did exactly the same thing it did before. All right, we merge our PR, adding our workflow.
And now, pretend some time has passed. We're still working on the project and we're going to create a new feature. So we update our main with the latest code and create a branch for the new feature. Here, just to demonstrate, I'm going to do an empty commit — it doesn't need to be any actual code, just an empty commit to trigger the workflow again.

But before pushing — remember that fake upload action? We're going to hijack it. Pretend someone actually changed that code to do something malicious. So we go there, switch to a branch that has the malicious code, and recreate the tag with that code. We recreate the tag 010 and push it to update it on GitHub.

The point here is that, in a supply chain attack like the Codecov one, the hijack could have happened without anyone knowing. You'd think things are okay: you code, you test, you scan, you validate, and things look good. But maybe one of your dependencies — the classic supply chain attack — was hijacked, and you don't know about it.

So we push it and create a PR. Again, we wait a couple of seconds for the workflow to start. It starts, and this time it should fail. Note a couple of things. First, you see it wrote a comment there. It's still running, and it should fail now. It also created a PR; you can see it linked there. And it failed. So, what's in the comment? The comment is how we chose to inform you about the malicious activity. There was a signature event that identified someone trying to contact a crypto mining domain, so it was added there, with all the information about the event: what's the process ID, what's the context, everything.
It also created a profile, because now our baseline diverged. And if we look at the profile, what do we see? First, the DNS that was accessed — a crypto mining DNS. Then the command that was executed; here we're just using this to simulate the attack. And we see that our main.go changed: our file, our workspace, our code is now different. So this should be alarming. That's the idea of the profiling. And that's the demo.

Now I'd like to dig a little deeper into each of those sections — because we started with just executions and then said we care about other things too: executions, network activity, file system activity. So let's start with the executions.

There's the concept of a deny list and an allow list. In the deny list we have the signature events. For example, the code injection event, which triggers every time a process tries to inject code into another process. Or the LD_PRELOAD event, which triggers every time someone uses the LD_PRELOAD variable to load a library before your process. And for the good part, we extended the profile: before we just had binary path and binary hash; now we have user ID, arguments, and environment variables.

But remember that arguments made the profile unstable. Because of that — so the profile can actually be consistent from run to run — we had to add an ignore system. If a command is using a temp directory, we add that to the ignore system so we don't record it in the profile. This happens with the git checkout: when you're checking out your code, it creates several temporary directories, and that's something we handle with the ignore rules. Environment variables are similar; we can ignore some of them too — for example, the GitHub token.
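A minimal sketch of such an ignore system — the rule patterns and variable names are illustrative assumptions, not Tracee's actual configuration:

```python
import re

# Volatile argument patterns: temp paths change on every run
# (e.g. the temporary directories created during a git checkout).
ARG_IGNORE_PATTERNS = [re.compile(r"^/tmp/"), re.compile(r"^/var/tmp/")]
# Environment variables that must never end up recorded in a profile.
ENV_IGNORE_NAMES = {"GITHUB_TOKEN"}

def normalize_args(args):
    """Replace volatile arguments with a stable placeholder so the
    profile stays consistent run to run."""
    return ["<volatile>" if any(p.match(a) for p in ARG_IGNORE_PATTERNS) else a
            for a in args]

def filter_env(env):
    """Drop environment variables that should not be recorded."""
    return {k: v for k, v in env.items() if k not in ENV_IGNORE_NAMES}

print(normalize_args(["tar", "-xf", "/tmp/checkout-8f2a/src.tar"]))
# ['tar', '-xf', '<volatile>']
print(filter_env({"PATH": "/usr/bin", "GITHUB_TOKEN": "ghs_secret"}))
# {'PATH': '/usr/bin'}
```

The placeholder keeps the shape of the command visible in the profile while hiding the part that would make every run diverge.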
By default, recording environment variables is disabled — and it's disabled so people are cautious about it. The Codecov attack was collecting exactly this kind of information. Since we're tracing from the kernel, we have all the information about what's going on, and we're recording it. If environment variables were enabled by default, when we create the pull request we'd have our secret variables right there in it. So, to be cautious, it's disabled, and you need to think a little before turning it on.

The next thing is: how do we know what is getting executed? How do we ask Tracee to collect this information for us? The first idea that comes to mind is a syscall. Whenever you execute something, it goes through the execve syscall: you give it a pathname and it executes that binary. But there are a couple of problems with that. First, the pathname in the execve syscall is relative to the current directory. So if you're inside a directory and you execute something, the pathname traced on that syscall is relative — and we need the canonical path. A relative path is too little information, and without the canonical path, how could we get the hash of the binary that was executed?

Another problem: when you invoke a syscall, you're asking the kernel to do something. You're saying, okay, execute this binary for me — it's not the actual execution yet. So it's susceptible to a type of attack where one thread says, hey, execute this binary for me, and before the kernel actually starts executing it, in a race condition, another thread changes the parameter. Tracee, when it sees the syscall, sees the previous parameter — which was a pointer — and doesn't see the change. Because of this type of attack, time-of-check to time-of-use, we prefer to use a tracepoint.
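From user space, the reconstruction the tracer needs looks roughly like this (a sketch, not Tracee's implementation): resolve the execve pathname against the process's working directory to get the canonical path, and only then hash the binary.

```python
import hashlib
import os

def canonical_path(pathname, cwd):
    """Resolve an execve-style pathname: relative paths are interpreted
    against the process's current working directory, then symlinks are
    resolved to get the canonical path."""
    if not os.path.isabs(pathname):
        pathname = os.path.join(cwd, pathname)
    return os.path.realpath(pathname)

def binary_hash(path):
    """SHA-256 of the executed binary -- only possible once we know the
    canonical path; the relative pathname alone isn't enough."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A user-space sketch like this is itself racy (the file can change between the exec and the hashing), which is part of why doing the collection in the kernel, at a tracepoint, is preferable.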
Tracee can collect not only syscalls for us, but also LSM hooks, tracepoints, kprobes, and those signature events that we built — and that you could build too, if you want.

Next one: file modification. Same idea — we have the deny list and the allow list. For the deny list, for example, we check whether the host's sudoers file was changed. And for the allow list, we check which file paths changed, but specifically in our workspace, because GitHub gives us a VM and we don't want to know everything that changed on the whole host. We care about our pipeline — about the workspace where we're building our artifact.

And how do we collect events for this? This is a cool one, because you'd think of the write syscall. But the write syscall — since in Linux everything is a file — receives a file descriptor, and that file descriptor might not be a file; it could be a socket, it could be something else. Also, the write syscall is executed a lot in Linux. So we do a trick here: instead of tracing the actual action, we trace the intention to write. What needs to happen for someone to write to a file? You need to open that file with write permission. And that's what we trace. Tracee didn't have this event, so we had to create it, and I'll show you in a bit how we did that.

For the network activity, again, deny and allow. Here we decided to treat direct access to a bare IP as bad activity, because bare IPs are volatile: with them in the profile, the profile would never be stable. So we let you know about it with an event. And in the profile, we add the DNS resolutions — the DNS event — which can also be ignored if you want. And how do we track it? Tracee has a bunch of events for networking. They are higher-level events, protocol-aware.
Instead of making you worry about the syscalls and everything, Tracee creates higher-level, protocol-aware events. You can say: let me know whenever an HTTP request happens, or whenever an HTTP response happens. The same for DNS. And that's what we track here.

The last category I want to mention — and one of the things I like about tracee-action, this POC we implemented — is that some things we wanted to do were not possible in the first few versions, but as Tracee evolves, we're able to improve the POC every couple of months. As I mentioned, GitHub gives us a VM, so there's a lot going on, and we don't want to trace everything, or the profile will never be stable; it would be chaotic. The only thing we care about is what is happening in your workflow. And inspecting GitHub, we saw that every step of our workflow is executed under the same process tree. For us this is great, because Tracee can filter easily by process tree: we just say, hey, Tracee, collect these events for this process tree.

That doesn't catch everything, though. For example, if you're using Docker to build your image, whatever command is in your Dockerfile is important for the profile — you want to know that your Dockerfile did a curl to some place. But what we see on the runner's tree is only the execution of a docker build; we don't see the exact commands being executed. Those happen in another process tree. So we also say: hey, Tracee, please trace this process tree and that process tree too.

But there was something we couldn't do before, and a recent feature now makes it possible. Once we narrowed our view, we couldn't keep the other eye open: we were looking with one eye at these process trees, and we needed the other eye on the rest. For example, signatures. The signatures should not be only about the process tree of the Docker build or the workflow; they should be about the whole host.
We care about anything — if someone is making an SSH connection to this host while your build is running, we care about it. And with the recent policies feature, we were finally able to implement that. Let me show you the policy.

This is the policy for the runner. It's a basic policy: it has a name, a description, a default action, the scope — which is the runner, whatever is happening under that process tree — and the events we care about: sched_process_exec and net_packet_dns. For the container build, it's the same idea: default action, name, and the events in this case are the same, sched_process_exec and net_packet_dns. But our scope is different. Here what we're saying is: you know that binary that builds the Docker image? What we want is to follow the children created under it, because those are the command executions for the Docker image. So we use follow there. For the signatures — the new thing I was talking about — our scope is now global. For signatures we care about all those events, but for the whole host, not just for one process tree or the other.

I mentioned before that we trace the intention to write, and that we had to create an event for that. You can create signature events in Tracee with either Rego or Golang; here I'm using Rego. You can see the metadata — the event name is file_write. In the selected events, we say what will originate this event: whenever security_file_open, which is an LSM hook, happens — meaning someone is opening a file — we run our logic, which is the Rego part, the tracee_match. And what is our logic? We check: is this open actually intended to write? If so, we grab the pathname and return it. Why do we do that? Because now we can use it in a policy: a policy for the file_write event we just created, filtered by the workspace.
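A rough sketch of what such a Rego signature can look like. The `tracee_selected_events`/`tracee_match` structure follows Tracee's Rego signature convention, but the metadata fields and the helper rules here are written from memory and for illustration — check them against Tracee's signature docs:

```rego
package tracee.FILE_WRITE

__rego_metadoc__ := {
    "id": "FILE-WRITE",
    "version": "0.1.0",
    "name": "file_write",
    "description": "Report the intention to write to a file",
}

# Originate this signature from the security_file_open LSM hook,
# which fires whenever someone opens a file.
tracee_selected_events[eventSelector] {
    eventSelector := {
        "source": "tracee",
        "name": "security_file_open",
    }
}

tracee_match = res {
    input.eventName == "security_file_open"

    # Is this open actually intended to write?
    flags := get_argument("flags")
    is_write_flag(flags)

    # If so, grab the pathname and return it, so a policy can
    # filter on it (e.g. only paths under the workspace).
    pathname := get_argument("pathname")
    res := {"pathname": pathname}
}

# Hypothetical helpers; Tracee ships its own helper library for
# reading event arguments and checking open flags.
get_argument(name) = value {
    arg := input.args[_]
    arg.name == name
    value := arg.value
}

is_write_flag(flags) { contains(flags, "O_WRONLY") }
is_write_flag(flags) { contains(flags, "O_RDWR") }
```

Returning the pathname is what lets a policy then narrow the file_write event down to the workspace.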
We only care about changes happening in our code, and that's why we return the pathname there. This is very new to Tracee — I worked on this feature and I'm really happy about it.

The last thing I want to mention: while working on the profiles, we realized that Tracee can now collect a lot of information. That's good for detecting malicious activity, and it was good for the concept of the profile, but it could also be good for other things — think observability. Very recently, a predicate for runtime attestation was merged into in-toto, and it's kind of the same idea as the profile: different, but similar. I'm not sure if anyone here uses Tekton. Tekton has Tekton Chains, and whenever you're building an artifact in Tekton, Tekton Chains tracks what you're doing and then attests to it: it says, this artifact was built with these commands. This runtime predicate, this attestation, has the same idea but is more generic; it could be applied to anything. You could use it for a script, for another CI — in our case, GitHub Actions. That's what we're looking at implementing now, because we can use this attestation to supplement our SLSA attestation, our provenance. It's still ongoing, but I have a PR for it, and the idea is to release it.

Those are some of the lessons learned — there's a list if anyone gets these slides. And thank you; here are the links if you want to know more about the projects, or if you want to chat about it and give me your perspective. I would love to hear your feedback. Thank you, guys.