So first let me welcome you. My name is Jeremy Cowan and I am a developer advocate manager for Amazon EKS. And joining me today: I'm Mohammed Wasik and I'm a principal security engineer with AWS. In particular, I work with the Amazon GuardDuty team. And today we're going to be talking to you about finding the needles in the haystack, or identifying suspicious behaviors with eBPF. And I'd like to start by first looking at the challenges that you're likely to encounter when developing a solution for detecting threats in runtime events. For starters, the solution should be lightweight and stable. It should also be able to handle a high volume of events. And assuming that you can overcome those challenges, you then need to separate the wheat from the chaff, so that all you're left with are actionable insights. In this talk, we'll explain how AWS approached these different problems and why we settled on eBPF. So there are various approaches to this problem. We could have tried to extend the Linux kernel. The advantage there is that you have enormous flexibility. However, the likelihood that your change is going to be accepted upstream is quite small, because when modifying the kernel, your changes typically have to be broadly applicable. And so it's unlikely that you would choose to extend the Linux kernel to solve this type of problem. You could also write a kernel module. And again, that also gives you a lot of flexibility. But a lot of folks are a little apprehensive about installing kernel modules, as they can affect the stability and security of the operating system. You could also try deploying a sidecar container. And this is good because you have a separation of concerns; you don't have to muddy your application logic with security code. But this increases overhead, and it can be circumvented. For example, if you're using an admission controller to inject a sidecar into containers deployed into a particular namespace, you could circumvent that by deploying a pod into a different namespace. And yet another option that has come to the fore recently is eBPF. And eBPF, like kernel modules, is extremely versatile in that it allows you to capture rich information about events that are occurring from within the kernel. And eBPF happens to address a lot of the challenges that I mentioned at the beginning of the presentation. For instance, eBPF programs are typically considered safe to run because they're sandboxed. They also only have read-only access to the system call parameters; they can't modify the parameters of the syscalls. They're also very performant because they run within kernel space, and they can be loaded dynamically. That is, they don't require you to reboot the system, which is really nice. And eBPF has really evolved since it was initially introduced in the Linux kernel, in 3.18 I believe it was. At that time, it was largely designed for filtering network traffic, but it has since evolved, and new capabilities have been added to where you can now deny certain syscalls. And the implementation that a lot of us are probably familiar with is seccomp. Okay, so let me spend a moment here to explain how it works. First, the operating system loads your eBPF program, or the bytecode for your eBPF program, and verifies that it's safe to run. It's typically run through a verifier, looking for things like infinite loops, and that your program exits gracefully, and so on.
The program is then just-in-time compiled and run. And typically, there is an accompanying user space application that is used to load the eBPF program and then reads the output, or enriches the output with additional metadata; in a Kubernetes environment, it's valuable to know the container ID or the pod name. But reading that output from eBPF is completely optional; you don't have to do that. And I feel like this is a replica of Liz's slide from this morning. This basically is depicting how applications in user space communicate with the kernel. This happens when you have an application that has to access an area of memory or a file on disk; it interfaces with the kernel through these syscalls. And an eBPF program can attach itself to these syscalls, and that's where you can get additional information about the program that invoked that particular syscall. So here's a very simplistic example of how GuardDuty is using eBPF. In this example, we have a process that's running in user space that's attempting to open a file. It calls the system call handler, and the arguments that were used to call that handler are passed to the eBPF probe. That eBPF probe then has the ability to send that information to a user space agent. And Wasik here is gonna run through this in greater detail a little later when we get to his part of the presentation. As for getting started with eBPF, I'll say this: if you're not a C programmer, like me, writing your own eBPF program might not be very feasible, practical, or easy. Having said that, I'm pretty sure there are plans for eBPF to support Rust and other languages in the near future, which will make eBPF a lot more accessible to developers. And in preparing for this talk, I found three really useful resources. The first is a project from Solo.io called BumbleBee, which automatically generates boilerplate code based on your answers to a series of questions. It also comes with a CLI, which makes loading your eBPF program relatively easy. The other resource is a book that appeared on Liz's slide this morning called Learning eBPF. I managed to get through the first half dozen chapters or so; a very, very good resource on eBPF, especially if you're getting started. And the last resource I'll mention here is the eBPF Summit. All the talks from that event are now available to watch on demand if you're interested. And similar to the earlier slide where I showed you the advantages and disadvantages of different approaches to threat detection, I've created a table here showing the advantages and disadvantages of eBPF. We know from earlier that it's extremely versatile, and faster to write and deploy than, say, a Linux kernel module or a change to the kernel itself. There are other advantages like memory safety, great performance, and portability, with things like compile once, run everywhere along with BTF, the BPF Type Format, which can translate kernel data structures between different kernel versions. Now, the downside of eBPF is that the tooling for it is relatively immature and debugging it is still pretty hard. But I'm sure that as it becomes increasingly popular, these challenges will be overcome. As for the common use cases for eBPF: security, of course, but it's also found in a lot of networking applications and observability tools like Hubble, which you saw a few pictures of during the keynote, along with Pixie and lots of other networking and observability tools.
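[Editor's note] To make the file-open example above concrete, here is a minimal sketch of what such a probe can look like in libbpf-style C. This is an illustration, not GuardDuty's actual code: the map and struct names are invented, and it assumes a vmlinux.h header generated with bpftool and a kernel recent enough to support BPF ring buffers. Note that it reads the path from user space memory, the exact pattern whose pitfalls Wasik discusses later in the talk.

```c
// openat_trace.bpf.c -- illustrative sketch only, not GuardDuty's probe.
// Assumes vmlinux.h generated via `bpftool btf dump file ... format c`.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

// Ring buffer for shipping events to a user space agent (kernel 5.8+).
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

struct event {
    __u32 pid;
    char comm[16];
    char filename[256];
};

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(struct trace_event_raw_sys_enter *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    // Actor process details, read inside the kernel.
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    // args[1] of openat is the userspace pointer to the path. Reading
    // user memory here is the racy pattern discussed later in the talk.
    bpf_probe_read_user_str(e->filename, sizeof(e->filename),
                            (const char *)ctx->args[1]);

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

A user space loader would consume these events from the ring buffer and enrich them with metadata such as the container ID or pod name, as described above.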
And at AWS, we love eBPF. We have several groups that are using it; it's becoming increasingly popular at AWS. Here's a smattering of examples of how AWS is making use of eBPF. Lambda is using it today to create pools of Geneve network tunnels, and this allowed us to reduce the VPC function cold start from 150 milliseconds to 150 microseconds; a pretty significant improvement. And then VPC is actually using it in several different places. They're currently using it to observe TCP flow-level performance. We have applications like S3 that can be accessed in a variety of different ways, and so they're looking at the TCP flows for performance. And later, they're looking at dynamically tuning the performance of the TCP stack for different services and their access patterns. We also have a distributed packet processing pipeline written in eBPF, and it's also used to implement security groups and network ACLs. And if you've used Amazon EKS, Amazon's managed Kubernetes offering, it has its own VPC CNI. Today, the CNI does not include a policy engine; we make use of the Calico policy engine or the Cilium policy engine. In the future, we're going to add support for network policy using eBPF. And then, why eBPF for GuardDuty? GuardDuty elected to use eBPF for threat detection for a variety of reasons. First, it can be implemented quickly. As I mentioned, there's less apprehension about installing eBPF programs than there is about kernel modules, because eBPF programs are sandboxed. And eBPF programs are relatively easy to install and update. Finally, eBPF provides rich information about kernel events, and these events can be enriched with additional information, like container ID and pod name, which gives the user additional context to help identify threats to their environment. And eBPF can be used to provide protection at runtime, which also makes it very appealing for threat prevention, which you might talk about later, right? Yeah, so today, GuardDuty is primarily using it for threat detection, but the appealing thing about eBPF is that it could be used to prevent attacks in addition to detecting them. So with that, I'm gonna hand the baton to Wasik. Yeah. So I'm gonna dive into some more details about how we are using eBPF, what choices we have made, and what type of events we are collecting, and we'll also go into a scenario and show you what type of events will result from that scenario and what type of detections we'll be able to get. Currently, we are primarily using eBPF for system call tracing, because it's very effective for threat monitoring and threat detection. When you're using system call tracing for threat detection, there are three main objectives. Number one, you need to capture the input arguments of the system call so that you can figure out what it is trying to do. Number two, you need to capture the details of the actor process, or the process that invoked the system call. Number three, you may also need to capture the return value of the system call so you can figure out if the system call was successful or returned an error. eBPF allows you, as Jeremy mentioned, to attach eBPF code, or an eBPF probe, to various tracepoints or hook points inside the kernel. For system call tracing, one such hook point is the system call enter tracepoint. This tracepoint triggers as soon as the kernel starts processing the system call.
When an eBPF probe is attached to this tracepoint, the kernel passes all the input arguments of the system call to the eBPF probe. The eBPF probe can also get the details of the actor process from the task struct, which is an internal kernel structure. It can then send the input arguments, as well as the details of the actor process, to user space for further processing. Another option for system call tracing with eBPF is to attach an eBPF probe to an internal kernel function which is invoked as part of the kernel's processing of the system call. These types of probes are called kprobes. When you attach an eBPF probe to an internal kernel function, the kernel passes all the input arguments of that function to the eBPF probe. The eBPF probe can then take these input arguments, as well as the actor process details, and send those to user space for further processing. In order to get the return value, the primary option is to hook into the system call exit tracepoint. When you attach an eBPF probe to that, the kernel passes the return value of the system call to your eBPF probe. You can collect the return value and send it to user space. I'll also talk about some of the security considerations when you are choosing a proper tracepoint. I talked about the system call enter tracepoint for capturing system call arguments. Although it's an easy and efficient way of capturing system call arguments, it is vulnerable to race conditions, or time-of-check/time-of-use (TOCTOU) issues. Let's consider the example of the open() syscall in order to understand this point. When you hook an eBPF probe into the open system call, as I said, the kernel passes all the input arguments to your eBPF probe. One of those arguments, in the case of the open system call, is the pathname, which is a pointer to a user space address that contains the path name of the file being opened. Now your eBPF probe in this case has to read the pathname from this user space address. And some point later, the kernel also reads the same pathname from the same user space address. You can notice that there's a time window between when the eBPF probe reads the pathname and when the kernel reads it, right? And an attacker can potentially exploit this time window. Since both the probe and the kernel are reading the pathname at different times from user space, a user space attacker can modify the pathname in that time window. If that happens, your probe is going to read a different pathname than what the kernel reads and processes. A safe alternative is to attach your eBPF probe to an internal kernel function. For example, in the case of the open system call, one such internal kernel function is security_file_open. When you hook your eBPF probe into this internal kernel function, you can read the pathname from internal kernel data structures, in this case the file struct. This is not vulnerable to race conditions or time-of-check/time-of-use exploitation, because a user space attacker cannot manipulate internal kernel data structures. If you are interested in more details of these types of attacks, you can view the DEF CON presentation "Phantom Attack: Evading System Call Monitoring." Very interesting presentation. We have learned from the research on these attacks and we have implemented our eBPF probes in a secure fashion. In particular, none of our eBPF probes read user space memory.
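[Editor's note] To illustrate the safer pattern just described, here is a hedged sketch, again libbpf-style C with invented names rather than AWS's actual probe, that hooks security_file_open with a kprobe and reads the file name from kernel data structures instead of user space memory:

```c
// file_open_probe.bpf.c -- illustrative sketch of the TOCTOU-safe pattern.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

struct event {
    __u32 pid;
    char comm[16];
    char name[64];   // last path component only, for brevity
};

SEC("kprobe/security_file_open")
int BPF_KPROBE(on_file_open, struct file *file)
{
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    const unsigned char *dname;

    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    // Walk file->f_path.dentry->d_name.name entirely in kernel memory;
    // a user space attacker cannot change these structures under us.
    dname = BPF_CORE_READ(file, f_path.dentry, d_name.name);
    bpf_probe_read_kernel_str(e->name, sizeof(e->name), dname);

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```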
I'll talk a little bit about the level of context which eBPF provides us. A significant percentage of our customers now use container workloads on platforms like Amazon EKS. When it comes to threat detection, the primary demand is that the detections should contain container-level details. In other words, if a detection originated from inside a container, it should have the details of that container; otherwise, that detection is not very useful or actionable. eBPF allows us to provide not just the container and pod-level details, but also all the process-level details, which is a significant improvement on GuardDuty's existing flow logs and DNS logs based detections, which only provide host-level or EC2-level details. A little bit about how we are collecting the context details, and which details we are collecting from the kernel versus from user space. Our strategy is to collect as much information from inside the kernel as possible, because that's more efficient and it's also safer. We are able to get all the process-level details from the kernel, such as the PID of the process and the executable path. We are even able to get the container ID from the kernel, if a process happens to be running inside a container. Of course, there is some information which is not available inside the kernel; we have to get it from user space. For example, the container image name and image digest we have to get from user space. Similarly, the Kubernetes pod ID, namespace, and name are also obtained from user space. And then the SHA-256 hash of the executable is also obtained from user space. Now, what type of events are we collecting? Linux has 300-plus system calls. eBPF can capture all of those, but that wouldn't be very efficient. So what we do is try to collect all the relevant system call events which are valuable in terms of threat detection. Some of the main ones that we collect include process creation and execution events. These events allow us to provide process-level details. They also allow us to identify suspicious process executions, and they allow us to profile or track the behavior of various executables, pods, and containers. (A minimal sketch of capturing this category of event follows below.) Next up, we also collect file system operations, such as file opens and file system mounts. These events allow us to identify suspicious file system operations. They also allow us to track the file system activity of various processes, executables, pods, and containers. Another useful category is network connection events. These events allow us to identify connections with known bad or known malicious IP addresses. They also allow us to track the network activity of various processes, executables, pods, and containers. DNS requests and responses are another useful set of data; they allow us to detect when a process tries to look up a suspicious domain name. Then another category is inter-process interactions, or when one process tries to inject into the memory of another process. We collect events relevant to this category because these system calls are commonly used in attacks like process injection. And there are some miscellaneous system call events that we collect which are primarily used in exploitation techniques, like halting kernel processing or interrupting the kernel from user space. And last but not least, we also collect container creation events, because these events allow us to provide container-level and pod-level details, and they also allow us to track the behavior of various containers and pods.
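[Editor's note] As the minimal sketch promised above for the first category, process execution events, here is roughly what a tracepoint probe on sched_process_exec can look like, following the common libbpf pattern for __data_loc fields. Again, the names are invented and this is not GuardDuty's agent:

```c
// exec_trace.bpf.c -- illustrative sketch of capturing process execution
// events; vmlinux.h assumed generated via bpftool.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} exec_events SEC(".maps");

struct exec_event {
    __u32 pid;
    char comm[16];
    char filename[256];
};

SEC("tracepoint/sched/sched_process_exec")
int handle_exec(struct trace_event_raw_sched_process_exec *ctx)
{
    struct exec_event *e = bpf_ringbuf_reserve(&exec_events, sizeof(*e), 0);
    unsigned int fname_off;

    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    // The executable path is stored as a __data_loc field: the low 16 bits
    // hold its offset from the start of the tracepoint record.
    fname_off = ctx->__data_loc_filename & 0xFFFF;
    bpf_probe_read_kernel_str(&e->filename, sizeof(e->filename),
                              (void *)ctx + fname_off);

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```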
The eBPF agent primarily collects events and sends those events out to the backend. It does not implement any rules within the agent. This architecture allows us to update and add rules quickly; it also allows us to perform more complex processing on the backend. So after the agent sends the events, we collect them at the backend, and we apply threat intelligence to all of those events to identify connections with known bad IP addresses and domains. We then also pass events to our stateless rules; these are rules which use a single event in isolation. And then we also have more complex stateful rules, which depend on more than one event. And in the future, we also plan to pass these events to machine learning in order to profile the behavior of various entities in the customer's environment and use those learnings for anomaly detection. (A toy sketch of this stateless/stateful split appears below.)
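[Editor's note] As a toy illustration of the stateless/stateful split, and of the runtime-drift rule used in the scenario that follows, here is a hedged user space sketch in plain C. The event struct, detection names, and the hardcoded threat intelligence entry are all invented for the example; GuardDuty's backend is far more elaborate:

```c
// rules_sketch.c -- toy illustration only, not GuardDuty's backend.
#include <stdio.h>
#include <string.h>

enum event_type { EV_FILE_WRITE, EV_EXEC, EV_CONNECT };

struct event {
    enum event_type type;
    char path[256];      // file written or binary executed
    char remote_ip[46];  // for EV_CONNECT
};

#define MAX_DRIFT 1024
static char drift_files[MAX_DRIFT][256];
static int n_drift;

// Hypothetical stand-in for a threat intelligence lookup.
static int is_known_bad_ip(const char *ip)
{
    return strcmp(ip, "203.0.113.99") == 0;  // example "mining pool" IP
}

static void handle_event(const struct event *ev)
{
    switch (ev->type) {
    case EV_FILE_WRITE:
        // Stateful: record the runtime drift, generate no detection yet.
        if (n_drift < MAX_DRIFT)
            strcpy(drift_files[n_drift++], ev->path);
        break;
    case EV_EXEC:
        // Detection fires only if the binary appeared at runtime.
        for (int i = 0; i < n_drift; i++)
            if (strcmp(drift_files[i], ev->path) == 0)
                printf("DETECTION: NewBinaryExecuted %s\n", ev->path);
        break;
    case EV_CONNECT:
        // Stateless: a single event plus threat intelligence.
        if (is_known_bad_ip(ev->remote_ip))
            printf("DETECTION: KnownBadConnection %s\n", ev->remote_ip);
        break;
    }
}

int main(void)
{
    // Mirrors the scenario below: download, execute, connect out.
    struct event seq[] = {
        { EV_FILE_WRITE, "/tmp/miner", "" },
        { EV_EXEC,       "/tmp/miner", "" },
        { EV_CONNECT,    "",           "203.0.113.99" },
    };
    for (int i = 0; i < 3; i++)
        handle_event(&seq[i]);
    return 0;
}
```

Running it prints a detection only for the executed drift file and for the known bad connection, mirroring the scenario described next.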
Now I'm gonna go through a simple scenario in order to illustrate what type of detections eBPF allows us to generate. This is a simple command injection scenario: exploitation of a command injection vulnerability in a web application, which is running inside a container, inside a pod. The threat actor exploits this command injection vulnerability to first download a crypto miner; then they execute the crypto miner, and then the crypto miner connects to a mining pool. In the first step, when the attacker downloads the crypto miner, using file system operations we detect that a new file has been downloaded at runtime, which is a kind of container runtime drift, right? At this point, we don't generate any detection; we just maintain or store the state that this is a new file which has been downloaded to the container at runtime. In the next step, when the attacker executes this new file, we detect that, oh, this was a new file and it was executed inside the container, so we generate a new binary executed detection. This detection is good in the context of containers, because containers are supposed to be immutable at runtime, so this detection shows that something was downloaded to the container during runtime and executed. Next, using the process execution event, we are also able to identify that a known binary, which is a known crypto miner, was executed, using its SHA-256 hash as well as the name of the binary. And then, when the crypto miner connects to the mining pool, we have the IP address of that mining pool as a known mining pool IP address. When it connects to that IP address, based on the threat intelligence, we are able to generate a detection. So these are the types of detections which eBPF allows us to generate. The main point here is that we are able to provide detailed context information in these detections. This is just an example of one of these detections; you can see that we are able to provide pod-level details and then container-level details, such as container name. And we are also able to provide process-level details, all the details of the process, and even the lineage of the process, which shows you the ancestors of the process, like the direct parent and grandparent. And we are also able to provide the runtime context which is related to, or which is specific to, the finding. For example, this is for the new binary executed finding; in this case, the binary path, which is the path of the new binary which was executed, and the process that created or modified the binary; we are able to provide the details of that process. So I'm gonna pass to Jeremy to wrap up the presentation. Great, thanks, Wasik. Okay, so to quickly summarize what we talked about here: eBPF is an attractive option for threat detection because it can capture events from the kernel, and the data can also be enriched to provide additional context. And it's really good for threat detection applications such as GuardDuty because it's lightweight, it's portable, and it doesn't require changes to the kernel. And when it's combined with the power of the cloud, it can be used to find the proverbial needles in the haystack that allow you to focus on the root cause of a security incident. And with that, I wanna thank you for coming to the session today. Again, my name is Jeremy Cowan, this is Mohammed Wasik, and we'll be here if you have questions. Thanks. Yes, sir. Not yet. I cannot say if there are any plans, or, just to be accurate, we are not actively working on those. So, the question is: what is the size, or what is the performance impact, of running an eBPF program? Yeah, that's one of the major considerations. The volume of events that you collect on Linux workloads is pretty significant. The size of individual events is not that much in general; typically, what we have noticed is that it's around 1K, 1K by experiment, but the volume of events is pretty high. And this is the reason that, at the backend, we have to use several optimization techniques to bring down the cost. This is a feature we are thinking about, yeah. This is going to be part of GuardDuty EKS Protection, which was announced at re:Invent last year; it's looking at the Kubernetes audit log. Yeah, the question is: Falco is deployed as a DaemonSet, so are we also using a DaemonSet, and how are we managing resources, how are we making sure that enough resources are available on the nodes? And secondly, what's the advantage over open source? So, we are working to offer multiple deployment options. One of the primary ones, again, would be using a DaemonSet on EKS, right? And we are trying to make it as hands-free as possible for the customers that choose to do it, right? And then there will be other options as well, where customers will be able to integrate the agent with their CI/CD pipeline and then deploy it themselves. So, we are working to provide multiple options. In terms of advantages, I believe the primary benefit is that, just like GuardDuty, the customers are going to get a kind of ready product, where they don't have to really pick and choose what rules they need to enable, what settings they need for the DaemonSet, or how they are going to set up the backend for processing of the events. If the volume of events is too high, then you need lots of engineering; even an individual customer may end up deploying lots of processing capability in order to manage the volume of events. So, the customers are going to be relieved of that work. How did it reduce the VPC function start time? I don't have the details on that; I didn't have an opportunity to talk to a member of the team before this presentation. I did read a paper on it.
I have to admit I didn't understand everything that was in that paper. But if you come see me afterwards, I can give you my contact information and I can find out. Yeah. Yeah, as much as we could. We try to hook into LSM hook points, or the internal kernel functions which are invoked after the kernel has read the input arguments. Exactly, it was just an example; there are several LSM hooks that we are looking into. All right, well, if there are no other questions, thanks again for coming to the session. I really appreciate it. Thank you. Oh, and don't forget to rate the session.