Okay, so thank you, John, fascinating talk. And this will be yet another talk on security. Actually, half of the talk is praising Tetragon, so maybe this talk will be much shorter now. Today's topic is securing CI/CD through eBPF. In this talk I will show you how I take this great eBPF technology and try to solve with it some of the tough issues we have in securing CI/CD pipelines. This is an eBPF conference, so we'll introduce eBPF and some of the tooling the community has built for it. Then we'll describe what CI/CD is, what risks CI/CD pipelines face, and what security issues we have in CI/CD. Then we will merge both technologies, eBPF and CI/CD, and show what a possible solution to these issues could look like by utilizing eBPF technology. We'll approach that solution by looking at some of the most prevalent attacks on CI/CD pipelines in recent years, some of which were attacks on the software supply chain. Later we'll dive a bit deeper into how such a concept can be implemented, and stay tuned, because we also have several demos. Finally, we'll talk a bit about the future plans for this concept. Okay. I'm Alex Ilgayev, a senior security researcher at Cycode. Previously I was the team leader of the malware research team at Check Point, where I reverse engineered some complex pieces of malware originating both from cybercrime and from APTs. Nowadays I'm a vulnerability researcher at Cycode, researching vulnerabilities and also mitigations for software supply chain attacks. Cycode is a cyber security company that provides complete software supply chain solutions for organizations. Okay. I bet some of you already know this architecture by now. So what is eBPF? eBPF is a revolutionary technology, originating in the Linux kernel, that allows you to load sandboxed code into the kernel.
It allows you to efficiently and safely extend kernel capabilities without modifying any kernel source code or writing complex kernel modules. So why do we need eBPF? I think we've had enough reasons up to this point, but in short, we get great tracing and observability abilities, and of course we can build security mechanisms in the kernel effectively and safely, without a lot of hard work. Specifically in this talk we won't be talking much about how to write eBPF code; it's more about the concept of taking eBPF technology and solving security issues with it. There are plenty of tutorials explaining that. As for the open-source eBPF tooling that exists today: I bet some people here are getting started with writing eBPF code, and there are BCC and bpftrace, great tools to get started and get your first eBPF code compiled and running. There are several issues with these tools, though: they demand the kernel headers and recompiling for every operating system. And libbpf is a library that introduced a concept called CO-RE, compile once, run everywhere. It solves several of these issues: as the name says, you don't need to compile for every operating system. Most of the latest Linux distributions ship BTF, the BPF Type Format, which allows these CO-RE applications to run easily on every modern Linux system. Some of the popular open-source projects that utilize CO-RE are Cilium, Tetragon, and more, and they allow you, with a one-liner, to install eBPF code on your machine and achieve great things. So, a bit on CI/CD. I bet almost every developer project nowadays embeds some kind of CI/CD into its pipeline. It helps you automate processes and speed up development.
A simple CI/CD pipeline could take your source code, check it out, compile it, whether through a Docker build, a Makefile, or whatever your compilation method is, take the artifact created by this compilation, put it in some registry, maybe some package manager, and sometimes even take this artifact and deploy it into your production, staging, or testing environment. Such processes have a few issues. For example, they demand highly privileged access to some very sensitive assets. The process I just described sometimes demands write access to our code repositories. It demands write access to our artifact registries or package managers, and sometimes even to the cloud infrastructure. Lately we are observing that attacks on the software supply chain are rising because of these risks. So securing CI/CD is hard. Why is it hard? First, we don't have a lot of security tooling. Standard security tooling like antivirus, EDR, and firewalls just doesn't fit these purposes, and the other application security tooling focuses more on finding vulnerabilities and vulnerable packages than on securing the pipeline itself. CI/CD systems usually have a lot of configuration; whoever has tried to play with Jenkins a bit knows what I'm talking about. Each misconfiguration could lead to a potential compromise of the system. Next, CI/CD systems, because of the lack of security tooling, also have really low visibility, so this is another issue that security and SOC teams need to handle. And lastly, some of the systems, for example GitHub Actions, GitLab, and more, are built on ephemeral environments. This means every build creates a new machine or a new container environment that runs the build and is destroyed afterwards, which makes security even harder. So I propose: let's take this idea of eBPF and try to stop some of the security issues we have in CI/CD. Why specifically eBPF?
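The pipeline steps described above, checkout, compile, publish, deploy, might look roughly like this as a minimal GitHub Actions workflow. This is a sketch for illustration only; the workflow name, image name, and registry are placeholders, not something from the talk:

```yaml
# Hypothetical sketch of the pipeline described above.
name: build-and-publish
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3           # check out the source code
      - run: docker build -t example/app .  # compile via a Docker build
      - run: docker push example/app        # push the artifact to a registry
      # a deploy step to staging/production would follow here,
      # typically using cloud credentials stored as secrets
```

Note how each step needs a credential: a repository token for checkout, registry credentials for the push, and cloud credentials for deployment, which is exactly the privileged access the talk warns about.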
It's safe, and it's relatively easy to build an eBPF program; I don't want to say it's really easy. It can solve some of the observability and security issues we have. As I said about CO-RE, it works out of the box on the latest Linux kernels, so it will also work out of the box on most of the CI systems themselves. And of course it has a very powerful community and tooling that pushes the technology forward. So the concept is an eBPF agent that can be easily deployed on any CI system and will monitor or protect it using some configuration. Let's test this concept against some of the biggest attacks on the software supply chain and CI/CD systems in recent years. The first, and maybe most famous, is SolarWinds. In SolarWinds, the SUNSPOT malware was planted on a build server of the SolarWinds corporation, and it replaced a code file with a malicious one just before it was compiled, which resulted in a malicious artifact that was signed by SolarWinds and deployed to its many customers. So how could this eBPF agent solve this issue? First, the most straightforward method would be to deny any write access to the source code files during the CI build. It's plain and simple, but it can also be very effective; I find it really hard to come up with use cases where a CI workflow or a build process needs write access to the source code files. Another possible solution would be to monitor process execution and file system activity and compare it to previous executions of the same build. This can work because builds are usually very repetitive in their tasks. They do the same tasks over and over again, so it can be quite effective to compare a run with previous executions of the same task. Let's look at another famous incident, the Codecov Bash uploader compromise. The Bash uploader of Codecov was modified by a malicious actor.
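The second idea above, comparing a build against previous executions, can be sketched as a simple set difference over the binaries that ran. This is a hypothetical simplification in Go (the agent's language); the function name and event shape are mine, not the POC's:

```go
package main

import "fmt"

// newBinaries returns the binaries executed in the current build that
// never appeared in a previous (baseline) run of the same workflow.
// Builds are repetitive, so anything new is worth flagging.
func newBinaries(baseline, current []string) []string {
	seen := make(map[string]bool, len(baseline))
	for _, b := range baseline {
		seen[b] = true
	}
	var anomalies []string
	for _, b := range current {
		if !seen[b] {
			anomalies = append(anomalies, b)
		}
	}
	return anomalies
}

func main() {
	baseline := []string{"/usr/bin/git", "/usr/local/go/bin/go"}
	current := []string{"/usr/bin/git", "/usr/local/go/bin/go", "/tmp/sunspot"}
	// A SUNSPOT-style implant would show up as a binary absent
	// from every earlier run of the same workflow.
	fmt.Println(newBinaries(baseline, current)) // [/tmp/sunspot]
}
```

A real version would compare richer features (arguments, parent process, files touched), but the core check is the same diff against a stored baseline.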
Codecov is a code coverage tool that usually runs during the CI workflows of many projects. For the several months it was compromised, every CI that installed Codecov was also exfiltrating all of its environment variables to an attacker-controlled server. The attackers wanted the environment variables because, as I said, the CI process has a lot of privileged access to many systems, and that access is usually done through secret tokens saved as environment variables. Specifically in the Codecov case, they were after GitHub tokens that would let them access private repositories. So how could we help in this case? The most naive and simple solution would be to monitor all the network connections our builds are making. As I said, builds are quite repetitive; they don't have to access hundreds of domains or IPs, so the list should be quite short. A more advanced solution is to create an allowlist: every process that connects to an IP or domain that is not in the allowlist should be terminated immediately. The third use case is not a specific incident but a group of incidents we have seen rising in recent years: installation of malicious dependencies through the CI. Usually when we build software we install dependencies, and usually we don't know exactly which dependencies we are installing, or whether they are malicious or not. There are many incidents where a project, during the build, installs malicious packages, whether they were hijacked by some attacker or are typosquats of a package name. For example, in a Python requirements file you put request instead of the popular requests package; this request package could be malicious. So, similar to Codecov, we could also stop this through some tight network monitoring.
Now that we've explained the concept, let's talk about how to implement it. Let me show you a POC that I built based on Tetragon to demonstrate the concept, executed on GitHub Actions. First, you all heard John's talk about Tetragon, really fascinating. Tetragon is an extremely powerful tool that gives you observability and runtime enforcement for security purposes. I will go over it quite quickly because you've already seen it. But why did I choose Tetragon specifically? First, it's highly generic: it allows you to build policies and load them into the Tetragon engine, which is a highly capable engine that translates them into BPF code. Second, it has limited but powerful features to enforce based on predefined filters, so we can enforce at the kernel level and we don't have to write user-mode code to stop things. And third, unlike other powerful tools in the eBPF ecosystem, Tetragon also works outside of Kubernetes environments. This is important because CI systems can run outside of Kubernetes; for example, GitHub Actions does not run in Kubernetes, it has a dedicated new machine for every new build. So Tetragon answers these requirements. The architecture of the POC is quite simple. It has two parts: the agent, which is the most important part, and a server in our lab environment that receives all the results and all the detections from the agent. The server is not mandatory; it's only there to see the results and run the experiment. On the agent side we have two pieces of code: the Tetragon executable, and a very simple agent written in Go that installs Tetragon on the machine, feeds it the policy that should be loaded into the Linux kernel, extracts all the events, and sends them to our lab environment so we can understand what's going on in the build itself.
Some of the functionality we implemented in this POC: the first is observability for the build system, understanding all the executed processes, building a simple process tree out of that, and seeing contacted domains and IPs; nothing fancy here. Then we also implemented a simple source code integrity feature, to protect against attacks such as SolarWinds and to experiment with this feature. The third, and in my opinion the most important, is network protection. This creates a simple allowlist of domains and blocks any connection that is not on that list. These abilities are demonstrated on GitHub Actions, but they can easily be deployed on any other CI system, one that runs on Linux, of course. And this functionality of course proves the ability to stop the use cases we presented. Okay, I will run through this quite fast because I don't have much time for the demo. The installation is quite easy; it's done through what's called a custom GitHub Action. We just add an eBPF agent action to the CI. If you're not familiar with GitHub Actions syntax, it's not really required here. In short, for the installation I'm just starting a Docker container, privileged and in detached mode, that installs Tetragon and the agent we wrote to secure the machine. For example, if we have some hello world CI in GitHub Actions, it has three parts: in the first we install the eBPF agent, and then we run two commands; let's say one of them is a curl to google.com. On the server side we get a simple process tree of everything the builder was doing; the Runner.Worker process is the main process executing steps in GitHub Actions, and we also see the contacted domains. So let's quickly go over how we implemented the functionality. The tracing part was quite easy, because Tetragon, as I said, is a really powerful tool.
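The custom action described above could be wired up roughly like this composite action definition. This is a sketch under assumptions: the action name, image name, and input are placeholders, not the POC's real identifiers:

```yaml
# Hypothetical sketch of the custom "eBPF agent" action: start a
# privileged, detached container that runs Tetragon plus the Go agent.
name: ebpf-agent
inputs:
  network-policy:
    description: comma-separated list of allowlisted domains
runs:
  using: composite
  steps:
    - shell: bash
      run: |
        docker run -d --privileged --pid=host \
          -e NETWORK_POLICY="${{ inputs.network-policy }}" \
          example/ebpf-ci-agent:latest
```

Running privileged with host PID visibility is what lets the container load eBPF programs and observe the whole build machine rather than just itself.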
It gives us the entire list of executed processes, enriched with additional information such as the parent, the arguments, the binary, and more. When we combine that with TCP connect tracing, which traces all the TCP connections made by the operating system, we can also add context to each connection and understand which process made it, so we can terminate it. As for DNS tracing, it's a feature missing in Tetragon, so we had to use an external tool that shows us all the DNS requests made on the machine. Ultimately we would want this implemented in eBPF as well, but for the POC an external tool was used. One of the strengths of this agent shows up in what we call deep inspection. CIs usually use external modules, external tools, binary packages, and so on, and we don't have complete visibility into what each package executes. With this tool we have complete visibility, for example, of the entire process tree executed by the CI, even when it uses external dependencies. For example, my CI was using the setup-go action, which is a very common action in GitHub Actions, so I can see the entire process tree of that setup-go step and understand whether it was doing any malicious activity. For integrity, as I said, we can implement integrity on several levels. For this POC we implemented only code integrity, but ultimately we also want to verify the hash of the builder, that it wasn't altered. For example, for go, make, or docker, we want to verify that no memory modifications were made to the process: it could have a valid hash on disk but be modified in memory. And we want to verify that no write operation was made to the artifact created by the builder. So this is how we implement code integrity.
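Building the process tree that powers this deep inspection is straightforward once every exec event carries its parent PID: group the events by parent, then walk the subtree of any step you care about. A simplified Go sketch (type and field names are illustrative, not the POC's):

```go
package main

import "fmt"

// Proc is a simplified exec event: its own PID, its parent's PID,
// and the executed binary.
type Proc struct {
	Pid, PPid int
	Binary    string
}

// buildTree indexes processes by parent PID so the whole tree under
// any process (e.g. a setup-go step) can be walked recursively.
func buildTree(procs []Proc) map[int][]Proc {
	tree := make(map[int][]Proc)
	for _, p := range procs {
		tree[p.PPid] = append(tree[p.PPid], p)
	}
	return tree
}

func main() {
	procs := []Proc{
		{Pid: 100, PPid: 1, Binary: "Runner.Worker"},
		{Pid: 200, PPid: 100, Binary: "/usr/bin/bash"},
		{Pid: 300, PPid: 200, Binary: "/usr/bin/curl"},
	}
	tree := buildTree(procs)
	// Children of the Runner.Worker process:
	fmt.Println(tree[100][0].Binary) // /usr/bin/bash
}
```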
This is the Tetragon syntax, a simple policy checking that no write operations are made to any file ending with .go, go.sum, or go.mod. And lastly, network protection: we receive an allowlist of domains and check that any process making a connection outside of this list gets terminated. How do we implement this? We receive, as a network policy input variable, the list of domains that are allowed to be accessed. For example, we are using Codecov, so we add the Codecov domains plus the additional domains that GitHub Actions uses internally. We resolve these domains into a list of IPs and build the right Tetragon policy to load into the kernel, into the BPF code, so that any access to an IP not in this list will be terminated. Okay, let's see some of the demos. First I will start my server. I will demonstrate a real-world scenario of SolarWinds, first without the mitigation, and later with the mitigation on. This is the server sitting on my machine; I'm tunneling it through ngrok, and I'm starting the CI. Let's see what the CI looks like. This CI basically installs the eBPF agent. We also install some SolarWinds attack setup. This is of course artificial; it wouldn't be in a real CI, it's only to simulate what an attacker could do if he got hold of the build server. This SolarWinds executable just waits for the build process and swaps the source file with another, malicious source file. Then we have a simple Go hello world program, and finally we execute it. So let's see the output of this CI. It's already over: the output says "hacked". This means the attacker managed to replace the Go file and create a malicious artifact.
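A Tetragon TracingPolicy along the lines described, kill any process that writes to Go source files, could look roughly like this. Treat it as a sketch: the hook point, operators, and the MAY_WRITE constant are my reading of the Tetragon policy format and should be double-checked against the Tetragon documentation for your version:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: go-source-integrity
spec:
  kprobes:
    - call: "security_file_permission"
      syscall: false
      args:
        - index: 0
          type: "file"   # the file being accessed
        - index: 1
          type: "int"    # requested access mask
      selectors:
        - matchArgs:
            - index: 0
              operator: "Postfix"
              values:
                - ".go"
                - "go.mod"
                - "go.sum"
            - index: 1
              operator: "Equal"
              values:
                - "2"    # MAY_WRITE (assumption; verify the constant)
          matchActions:
            - action: Sigkill
```

Hooking an LSM-adjacent function like security_file_permission rather than a single write syscall is what makes the check cover every path a process could use to modify the file.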
If we look at our server, we can see the process tree observed by our agent and some of the domains that were accessed. So let's run it again, but this time with enforcement. We put in a code integrity policy for the Go language. In this case, we expect the agent to understand that this SolarWinds binary, which we pre-planted in the CI system, is touching some Go source file, and to terminate the process immediately. We also want to see in our lab environment that we managed to stop this threat. So let's wait a few seconds. When we run the executable now, it prints the benign hello world. This means the attacker didn't manage to replace the file and we still have the original source file. And when we look at the server, we can see that these two processes were killed by the agent. We search all the received processes; this is our method of detecting that a mitigation was applied. We can see that SolarWinds, the executable we planted in the CI to do the malicious work, was stopped. So that's demo number one. Let's look at demo number two, the Codecov attack. First we run it without the mitigation. In the Codecov CI, first we install the agent, and then we have a very simple hello world Go program; we build the code, test it, and upload the coverage to Codecov, but through the compromised Bash uploader, not the original one, just to simulate the Codecov attack from last year. I think it's over... not yet, let's give it a few more seconds. Yeah, it's over. When we look at this, we can see the process tree that was executed. We have an interesting line here: under the compromised Bash uploader, which is the Codecov Bash uploader, there was a curl call to some unknown IP or domain. This call contains the complete environment variable list of the CI. We can also see that in the domain list.
Let's run it again, but this time we put in a closed allowlist of domains. I have a prepared list that contains GitHub, Codecov, and a few other domains that should all be allowed and benign. Similar to the SolarWinds demo, in this run we expect the agent to identify that the curl inside the compromised Codecov Bash uploader is trying to access some unknown domain, and to terminate it, so we should see that it was terminated. Now it's running the coverage, so it will try to upload it. Okay, it's quite long and ugly, but this very long list is the killed processes. This is the entire argument list of the curl command, because it was passing the output of the env command inside the -d parameter of curl. That's how the attacker originally wanted to send all the environment variables. We can see that this curl command was terminated, which caused the termination of the entire malicious process. So that's it for the demos. What's next? This POC has many, many to-dos, and many issues; it was only meant to demonstrate the concept. There's a lot of improvement we could make to the engine. One issue is that we need smarter domain filtering: the user wants to put a domain on the allowlist, but eBPF sees IPs, so we need to understand this connection and implement it better. We also have many additional features we would ideally want to implement. For example, we want the agent to be more service-oriented, to understand whether it's running on GitHub Actions, GitLab, Jenkins, or Travis. And, for example, in GitHub Actions, maybe we want to allow certain steps inside a job to access certain domains. For example, we have a go mod download step; we want to allow only this step to access the Go package registry, and the rest of the CI shouldn't be able to. There's an additional anomaly agent we could implement. And there's the whole work on SLSA, for whoever knows it...
SLSA is a standard being built by some industry leaders to increase the integrity of software supply chain artifacts. The standard says that a good practice for an artifact is to carry its provenance, the recipe for how it was created. This recipe could also be generated by a tool like this one, running during the build process. We could also support additional CI systems, and more and more. This project is currently not open source because I didn't have time to open source it, but it will be in the upcoming weeks, so you can try it and experiment with it. Looking at the future, we are considering taking this concept and maybe creating a better solution for many other CI systems, one that would also work at a production-grade level and maybe could be adopted by many other open source projects. We would love to hear feedback from the community on this idea, so don't hesitate to ping me or DM me on Twitter, LinkedIn, or whatever media you want. So let's wrap it up. What are the takeaways from this talk? First, this is an eBPF conference, and I wanted to showcase that eBPF is an awesome technology: with great use cases it enables many innovations, in my case specifically for security purposes, but for others as well. The eBPF community has created really great tooling; I demonstrated Tetragon and how I was able to take it almost out of the box and use it for additional purposes that the maintainers didn't think about in the first place. And finally, of course, we welcome any individuals who want to contribute to the idea and maybe cooperate on this, and create better security solutions for a specific CI system or maybe for other purposes as well. And that's it. Thank you. Wonderful. Thank you, Alex. Thomas is coming up to set himself up. While he's plugging in, we might have time for a couple of questions. Who has a question? Good. Thank you for the talk, it was great.
In the context of GitHub Actions, when you're doing a SIGKILL from that eBPF detection, how does Actions handle that? Does it just think nothing happened and keep going about its life, or are you actually able to catch that and display to the developer, in the GitHub Actions panel, something meaningful about what actually occurred? Can you maybe repeat the second part of the question? Maybe it's the mask, I can barely hear you. Oh, sure. When it fails from the detection, say the eBPF detection sends a SIGKILL, can GitHub Actions somehow capture that, and also capture maybe standard out or standard error from that failure, and then display that in the GitHub Actions context to the developer? Or do they have to go to the build machine and find out what happened? Yeah, basically, a GitHub Actions runner is just a fully controlled VM; you can do whatever you want. What I noticed is that when I kill such a process with a SIGKILL, sometimes the job stops and sometimes it continues. It depends on the parameters you set for the job: you can give it a parameter to continue even when a step fails, so it will show you that everything is okay even though one of the core processes, like the build, was stopped. So I don't entirely trust the GitHub Actions log system to show you the issues there; that's not its purpose. Its purpose is only to show you logs from the machine. That's it. Thanks for doing this, this is great. I just thought about GitHub Actions: we can actually throw arbitrary signals too, so maybe there's some way GitHub Actions could catch a specific signal we send it. We'd have to work with the GitHub folks, but it's possible. Do you plan to take it from a POC into production? Can it be without the mask? Remove the mask. Will you take this from a POC and keep working on it? Because that would be awesome. And if you keep working on it and you're interested, I saw some of your to-dos; they looked like things that were on my to-do list as well.
So we should talk. But if you wanted to submit a link or something to Tetragon, to the GitHub page, maybe there's somewhere we can link the two up so people find them. That was just a comment. Thanks. Yeah.