Hello everybody, and welcome to this cheeky new talk. It may sound strange, but the whole point of this talk is to share with you the findings of an experiment: finding new ways to bypass a CNCF tool called Falco. Falco is a runtime security project and, I should add, I'm one of its authors. My name is Leonardo Di Donato, but people usually shorten my name, so feel free to just call me Leo. Here's a photo of me moving my hands like a real Italian. I promise Falco swag to the first one who guesses the exact meaning and writes it to me in a Twitter direct message.
What do I do daily? Well, I'm an open source software engineer at Sysdig, where my daily job is to, guess what, write code for Falco and maintain it. Maybe you're confused at this point: why is a maintainer of Falco talking about how to bypass it? I know, crazy times to be alive. I firmly believe that challenging your software really is the only way to actually improve it. Just give me a few minutes and I'll explain the reasoning behind this approach. In the meantime, you can find me on Twitter, where I tweeted these slides, with the nickname leodido. Feel free to drop me a line, follow me, or ask questions about Falco, eBPF, the kernel, whatever. No problem at all.
First things first: where does Falco come from? Falco is a runtime security project originally created in 2016 by Sysdig, the company that pays me a salary, I don't know why. Then, in 2018, it was donated to the CNCF, a branch of the Linux Foundation, thus becoming the first CNCF sandbox runtime security project. It started to gain some serious traction, to the point that the other maintainers and I even had to start weekly community calls, to which, by the way, I invite everyone every Wednesday at 5pm Italian time. This past January, Falco was the first ever runtime security project to be promoted to the CNCF incubation level. The next natural step is to bring Falco to a level of maturity that makes it possible to graduate it.
But in order to do this, it's absolutely essential that we identify and assess its faults. It's also with this in mind that I present this experiment in this talk. Stressing our software and challenging its limits, so as to be able to improve them, is the first step. I invite anyone here to try Falco, challenge it, and report the findings back to us and to the Falco community, so we can gain more and more awareness. In the end, breaking things is the best way to understand how to fix them, isn't it?
Anyway, the plan for the next 20 minutes is to look at photos of my beautiful homeland in southern Italy that I miss so much, remembering the good old times when we were all free to travel around and pandemics were only a movie thing. In the meantime, I'll go through the reasons behind the choice to stress Falco, and why I think this is the best possible approach to make our runtime security software more and more solid day after day. In order to do so, we also need to briefly get the skinny on how Falco solves the runtime security problem, even for cloud native environments based on Kubernetes. The way to do this is by looking at how it actually works under the hood. I mean, that's the way to go when you really want to understand things, not to mention when you want to bypass them. We'll also reflect a bit on the security landscape in general. Shortly after this pitch, I'll demo some of the possible ways to bypass Falco. Developing them accounted for most of the work that went into preparing this talk, so please be kind to me. I'll also present some mitigations and even fixes that we put in place to prevent you from using my ideas to bypass my beloved Falco in the future. Consider it an encouragement to find and develop your own.
As soon as this talk got accepted, I told my dad: look, pal, I'm going to teach people how to bypass the software I built. I'll spare you the details, but what ensued was a fairly predictable and stereotypical Italian family drama.
He started yelling at me and shouted that doing such a thing was a grave and unforgivable mistake, and that Sysdig should have fired me for even conceiving something as twisted as this. Indeed, according to him, presenting in public the shortcomings of the software I contributed to creating amounted to admitting some sort of failure, either mine, or my team's, or my company's as a whole. In his eyes, it was an admission of defeat, the product of a subversive attitude. I tried in vain, as he was still shouting, to explain that a constructive attitude begins with a quest for our weaknesses. It's the first and necessary step in gaining self-awareness. It's only by getting to really know our limits that we can hope to overcome them. It's only after having looked into the ravines long enough that we can imagine ways of filling them. It's a matter of epistemic hunger, of knowing that the path to real improvement is almost inevitably convoluted, difficult, even painful.
Software development can be conceived as a process whose main feature is the continuous striving for a perfection that is by definition unattainable. The life of software is really over only when it's no longer fixed or improved. Any software will always present some glitches, but broken does not mean unfixable. Software is improved over time and made more robust and reliable by this continuous process of tinkering, playing, fixing, improving. This is the case in every subfield of computer science, but even more so in a realm as dynamic as security, in which every day carries the possibility that our solutions suddenly become obsolete and in need of a fix, a patch or, generally, an improvement.
When I think about this, Kintsugi often comes to my mind. Kintsugi is an ancient Japanese art that consists of repairing broken pottery by mending the areas of breakage with powdered gold, silver or platinum. It treats breakage and repair as part of the history of an object rather than something to disguise.
In a very real sense, this Japanese practice that fascinates me captures the core of what I have said so far. Indeed, developing software is a therapeutic process that inscribes the way we deal with imperfections, glitches and fissures into the essence of our software, making it more precious and more solid as a result.
But let's come back to the real world now, okay? So what does security mean to me? How do I characterize the security problem? I don't know about you, but I personally wouldn't want anything to happen to my system without me even noticing it. At least in this context, I want to be able to determine which things can happen and which cannot. Since preemptive control is not always possible, I also want to have deep visibility, so as to be able to know what just happened as soon as possible. Basically, I think of security in terms of two words: prevention and detection.
What do these two words have in common? Policies. Both concepts use some kind of policy to describe the allowed or disallowed behavior of a process in terms of system calls, their arguments and the resources accessed. The main difference is this. The first word, prevention, is connected to the concept of enforcement: in other words, not allowing some actions to happen at all because of some policies. Tools in this category, such as seccomp, seccomp-bpf, SELinux and AppArmor, or even authorization mechanisms like Kubernetes RBAC or the Kubernetes policy-based admission plugins, change the behavior of a process by preventing actions, sometimes syscalls, from succeeding, or in some cases by killing the process that tries to perform those actions. On the other hand, the second approach to security is to use the policy to monitor the behavior of a process and notify when it steps outside the policy. Falco belongs to this second group, a topic that, especially in cloud-native environments, has not been solved yet. Maybe you're wondering now: can Falco solve all our security concerns?
Honestly, not at all. Software is made of layers, and so is security. This is even more true in the current cloud-native environments. As you can see in the scheme on the slide: pods, services, ingresses, containers, and so on, up and down. These are environments that embrace change as their fundamental component, constantly opening the doors to the unknown. So Falco exists to enable us to detect intrusions, malicious behaviors and security threats in general at runtime.
Let me also be clear that, since security is made of layers, my suggestion is to combine Falco with prevention tools, thus applying a defense-in-depth strategy. The idea is to defend a system against any particular attack using several independent methods. The first specific example that comes to my mind is using Falco to identify malicious attempts to access sensitive resources by observing the whole behavior of your environments, and then writing appropriate enforcement policies, for example AppArmor profiles for your containers, that will prevent such episodes from happening again in the future. Basically, you implement a sort of feedback loop, continuously improving the security posture of your environments.
So what's runtime security? With this term, I refer to the practice of using detection tooling to detect unwanted behavior, so that it can then be prevented using prevention techniques and you can continuously improve your threat model. It's the last line of defense. This means that if it is not able to promptly detect and alert you of malicious attempts, you will end up like sitting ducks, defenseless.
Let's try with a metaphor. Here she is; look how beautiful she was. I have locks on my door and also an alarm: think of them as compliance rules. But she alerts me when things aren't going right: policy violations. When little bro is misbehaving (think of him as a compromised insider), or if there's something suspicious happening outside or nearby (an anomaly or a zero day), she detects runtime anomalies in my life at home.
And she was, may she rest in peace, very, very serious about her job, as you can see in the picture. So while prevention is about locking the doors, detection is about continuously monitoring the inside and the perimeter. Still, bad people were able to enter my house and put her to sleep. I think I've made myself a bit clearer now, sadly.
In case you're still not convinced that there's no such thing as perfectly safe and perfectly secure software, and that the layered approach is the best possible strategy for software security, just allow me to emphasize a few points. How can you trust cloud providers and their ability to detect malicious or compromised insiders? How can you prevent an undisclosed vulnerability or a zero-day bug that allows someone to break into your system? I mean, CVEs still happen. Linux, Kubernetes: name an open source project that has not experienced a CVE and suffered an exploit. Falco had one too. I think it's clear now that prevention tools alone are not enough. Neither are detection tools alone.
Good computer security is hard. It requires a lot of technical knowledge and takes a lot of time and effort. And at the end of the day, there will always still be weaknesses in your system. This means that you should never be lulled into a false sense of security, thinking that you're working on a secure system. Your system can be more or less secure, but a perfect one doesn't exist. This does not mean that you should do nothing. It's just about awareness.
And surely there are ways to harden your runtime: for example, dropping capabilities, so preventing things from happening when a container starts, as a way to harden it. But that's just one side of the coin. When your goal is to reinforce the isolation between the container and the host, using a container more like a sandbox, the main option you have is syscall monitoring, right? Surely somebody could still argue: but why trace the syscalls?
Because in the end, whatever program you run, it will end up making a lot of syscalls. Syscalls are the way programs ask the kernel, where everything really happens, to perform some task. Whether the task regards networking, I/O, processes and so on does not matter.
Here Falco enters the game. Falco is designed to leverage libsinsp events to add a layer of security on top of your containers and your nodes. Its approach to detecting security threats is to go to the lowest possible level and trace all the footprints and the context in which they happened. A libsinsp event is a special augmented structure containing info obtained by tracing the syscalls going on in the kernel, plus container runtime and Kubernetes metadata related to those syscalls. Falco uses those events to match a rule list that you can define with a simple yet expressive language, as we'll see very soon.
At the moment, Falco can receive those events in three alternative ways that we call drivers: through a kernel module, with an eBPF probe, or with a ptrace-based driver called pdig, which is really helpful in environments where you can't install the other two drivers. Anyway, Falco can also use other input sources at the same time. I'm referring to the Kubernetes audit logs in particular, which have proven to be very useful when leveraged by Falco to improve the visibility into your Kubernetes clusters. Soon we will also ship an input API that will allow the Falco community to create even more input sources.
I mentioned the Falco rules language in the previous slide, right? So maybe you're asking yourself: okay, Leo, but how do we define policies? How do we describe what a security threat is for us? Falco rules are very simple to write since they are basically YAML, and this makes them very easy to write and to learn, and very, very hard to indent correctly. But that's YAML's fault, not ours. Don't blame me.
Falco rules have proven to be very straightforward to adopt, unlike the policy languages of the other tools in the security space we mentioned earlier. I mean, have you ever tried writing SELinux policies and things like that? In a Falco rule set, you can write conditions, you can group them into macros to reuse later, you can define lists, you can give names to rules, customize their output message and the details it has to contain, and so on. You can also override and mix things, totally or partially. Finally, Falco ships with a huge default rule set, for reasons we're going to explore and understand better in a bit. In case you want to jump into reading thousands of lines of YAML, I put a link in the slide footnote. You're welcome.
For example, here we have a rule to detect an untrusted shell being spawned below a non-shell application. By looking at its condition, we can see that, aside from specific applications that are allowed to spawn shells, this rule emits an alert when three things hold. First, the macro spawned_process evaluates to true, which means an execve syscall was detected, which in turn means a program has been executed. Second, the name of the current process is one of those in the shell_procs list, so bash, dash, and so on. Third, the current process has a parent, and the name of an ancestor of the current process is one of those listed in the protected_shell_spawning_binaries list.
The first thing that catches my eye here is the spawned_process macro, which only checks for execve events. This syscall is part of the exec family: it executes the program referred to by the pathname given as its first argument. But execve has siblings in this family. Let's see if we can circumvent this rule by using a brother or a sister of the execve syscall. Are you ready? Demo time!
So this is the macro that got our interest. It only checks for execve. It's the first operand in the condition of the rule Run shell untrusted, so the plan is simple.
We suspect that a process spawning a shell through execveat remains unnoticed by Falco. Thinking deeply about this rule, I also have other ideas, like changing the name of the process spawning a shell to one of the names treated as exceptions by this rule, or symlinking the shell to an unlisted name, but I leave these as exercises at home.
In this little demo, we are going to use the current development release of Falco, basically its master branch from GitHub. In particular, we have Falco version 0.26.1-43, which is 43 commits ahead of the latest release, and driver version 2888. Let's check if we already have the kernel module inserted. We have. Perfect, Falco can start. Let's start Falco with one of its default rule sets, and Falco is here.
Let's now verify that the target rule works as intended. To do so, we'll use the event-generator project. It's a nice Go tool in the falcosecurity organization that helps you easily generate a variety of suspect actions detected by the default Falco rule sets. I quickly changed this command to make the shell it spawns create a ciao file inside the /tmp/test directory. Let's check the ciao file is here. Let's remove it. To make things clearer, we can easily trigger this Falco rule again. How? This way. Let's create a symlink to the event generator named httpd. Let's create a pizza file, which is a simple script that basically calls the event generator run shell command. Let's make it executable. When we execute the script we just created, Falco should detect the shell anyway. In fact, it does. As you can see, there are two alerts: the one that we just triggered by launching the pizza script and the previous one.
Now, what if we execute this through the execveat syscall? My kernel is this one, so I have the execveat syscall on this host, since it has been there since version 3.19 of the Linux kernel. It operates in the same way as execve, except for some differences in the handling of the pathname.
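The suspicion about the execve-only check can be stated as a toy model. The sketch below is not Falco code: the `Event` class, the two sets and their contents are illustrative assumptions, exceptions are left out, and the ancestor check is simplified to the direct parent. It only shows why a condition whose first operand accepts nothing but execve cannot fire for a shell started through a sibling syscall.

```python
from dataclasses import dataclass

# Illustrative values only (assumption); the lists in the default rule set are longer.
SHELL_PROCS = {"bash", "dash", "sh", "zsh"}
PROTECTED_SHELL_SPAWNING_BINARIES = {"httpd", "nginx"}


@dataclass
class Event:
    syscall: str      # syscall that produced the event, e.g. "execve"
    proc_name: str    # name of the process being executed
    parent_name: str  # name of its parent process


def spawned_process(evt: Event) -> bool:
    """Model of the macro as described in the talk: only execve counts."""
    return evt.syscall == "execve"


def run_shell_untrusted(evt: Event) -> bool:
    """Simplified condition: a shell spawned below a protected binary."""
    return (
        spawned_process(evt)
        and evt.proc_name in SHELL_PROCS
        and evt.parent_name in PROTECTED_SHELL_SPAWNING_BINARIES
    )


via_execve = Event("execve", "bash", "httpd")
via_execveat = Event("execveat", "bash", "httpd")

print(run_shell_untrusted(via_execve))    # True
print(run_shell_untrusted(via_execveat))  # False
```

In this model the execveat event at least exists. If the driver does not trace the syscall at all, no event reaches the rule engine in the first place, which is what the demo goes on to check.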
Let's see if the Falco driver traces it already or not. Oh, as you can see, it doesn't. Hello! We have a very strong candidate for our bypass. I know you were all waiting for this: it's time to write some beautiful C code. I'm joking, I've already written it. Let's just take a look at it. As you can see, there's a bunch of includes, as usual. Here I'm basically calling the execveat syscall, telling it to execute /tmp/test/httpd with these two arguments. Let's compile it. Okay. Let's execute it. It seems to have executed. As you can see, there is no alert emitted by Falco. But do we have the ciao file? Yes, we have. So it worked, but it went unnoticed by Falco.
So we discovered together how syscalls are joy and torment at the same time for Falco. They are a very powerful mechanism that acts as the kernel API and abstracts all the hardware for us. The Falco magic is based upon them and would not be possible otherwise. But aside from the fact that tracing all the syscalls from a user space perspective is often not that comfortable, the real issue is that there are a lot of syscalls. And with every kernel release they can change: they can gain a new parameter, new syscalls can be introduced, old ones get deprecated, and things like that. So it can be really, really painful for us as Falco maintainers to keep Falco on track.
For example, we still miss the copy_file_range syscall. This can be used to bypass Falco rules looking for sensitive file renames. A malicious actor could use it to copy all the bytes of a sensitive file, and Falco will not be able to detect it, because it only looks for the rename, renameat and renameat2 syscalls in the rename macro. Also, renameat2 was missing until two months ago: we added support for it in Falco 0.25. So we're putting in an effort to support more and more syscalls before releasing Falco 1.0, especially the ones that could impact the detection abilities of Falco. You can read more about it in the issue I linked in this slide.
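The C program from the demo is not reproduced in this transcript. As a rough stand-in, the sketch below starts a program from an open file descriptor instead of a path. It relies on an assumption worth stating loudly: on Linux, `os.execve` given a file descriptor goes through `fexecve(3)`, which, according to its manual page, glibc 2.27 and later implement with `execveat(2)` when the kernel supports it (older setups fall back to `/proc/self/fd`). The function name `run_via_fd` and the `/bin/sh` target are hypothetical choices for illustration, not what the demo used.

```python
import os


def run_via_fd(path, argv):
    """Run `path` in a child process, executing it through a file descriptor.

    Returns the child's exit status, or -1 if it did not exit normally.
    Assumption: Linux, where os.execve() accepts an open file descriptor.
    """
    fd = os.open(path, os.O_RDONLY)
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the target, addressed by descriptor.
        try:
            os.execve(fd, argv, dict(os.environ))
        finally:
            os._exit(127)  # only reached if the exec call failed
    os.close(fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Hypothetical target; any binary would do.
    print(run_via_fd("/bin/sh", ["sh", "-c", "exit 7"]))
```

Whether a given Falco build notices such an execution depends on its driver version, so this is best read as a way to reproduce the experiment, not as a guaranteed bypass.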
We definitely need help from the community, from you, to cover as many syscalls as possible. This is a wonderful opportunity to contribute to Falco. Don't miss it. I think it's useful now to quickly show you how to determine which syscalls are not yet supported by the Falco drivers. Otherwise, you won't know where to start, right? I provide you with this little bash script. Don't forget to clone the sysdig repository, where the Falco drivers, libsinsp and libscap still reside, but expect to find them soon in the falcosecurity organization. As you can see on the right, Falco 0.26, the one used in the previous demo, doesn't support execveat yet, as we said. Neither does it support copy_file_range, but it does support renameat2.
Then we surely should talk a bit about the idea of extending Falco to something more than syscalls. Yes, there's more. In the Linux kernel, new and very cool APIs like io_uring are landing. io_uring is a new asynchronous API with very, very little overhead that aims to overcome the limitations of the current select, poll, epoll or aio family of system calls. But I suppose this is a topic for my next talk, right?
I imagine it's way more useful now to show you how to add support for a new syscall. When you hit the manual page for the rename syscalls, you get no surprises. In fact, as the name suggests, they allow us to rename a file, moving it between directories. In this family, we also have the renameat and renameat2 syscalls. The renameat syscall operates in exactly the same way as rename does, except for some differences in how the file paths, the old ones and the new ones, are handled. The renameat2 syscall is equivalent to renameat; it only has an additional flags argument, used to specify whether the rename should be atomic or not, whether you want to allow overwriting a file, and so on. Let's briefly take a look at how support for renameat2 was added in pull request 1654, so that maybe you can easily contribute your additions too.
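The bash script from the slide is not part of this transcript. Before the walkthrough, the check it performs can be approximated as a set difference, sketched here under a stated assumption: both the kernel header and the driver's syscall table refer to syscalls as `__NR_<name>`. The two excerpts are tiny made-up samples standing in for the real files, and the function names are hypothetical.

```python
import re

# Matches identifiers such as __NR_execveat and captures the syscall name.
NR_PATTERN = re.compile(r"\b__NR_(\w+)")


def syscall_names(text):
    """Collect every syscall mentioned as __NR_<name> in a source text."""
    return set(NR_PATTERN.findall(text))


def unsupported(kernel_header, driver_table):
    """Syscalls the kernel defines but the driver table never references."""
    return sorted(syscall_names(kernel_header) - syscall_names(driver_table))


# Made-up excerpts (assumption): the real files are far longer.
KERNEL_HEADER = """
#define __NR_rename 82
#define __NR_renameat2 316
#define __NR_execveat 322
#define __NR_copy_file_range 326
"""
DRIVER_TABLE = """
[__NR_rename - SYSCALL_TABLE_ID0] = {UF_USED, PPME_SYSCALL_RENAME_E, PPME_SYSCALL_RENAME_X},
[__NR_renameat2 - SYSCALL_TABLE_ID0] = {UF_USED, PPME_SYSCALL_RENAMEAT2_E, PPME_SYSCALL_RENAMEAT2_X},
"""

print(unsupported(KERNEL_HEADER, DRIVER_TABLE))  # ['copy_file_range', 'execveat']
```

With real inputs, the same two functions could be fed the contents of the kernel's unistd header and of the driver's syscall table file, read from disk.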
Let's look at how we added support for the renameat2 syscall. The first thing to do is to modify the driver's ppm_events_public.h header, by adding constants to the ppm_event_type enum for the entry and exit of the renameat2 syscall, and by adding an item for the renameat2 syscall to the ppm_syscall_code enum. Here we also need to define the renameat2 flags and the struct for holding them. Now we can populate this struct in the flags_table.c file, while in the syscall_info_table.c file inside libscap we need to instruct libscap about those flags. Don't forget to increment the number of supported syscalls at this point.
In the driver's fillers_table.c file, we need to match the filler constants with their actual implementations, so that the Falco drivers know which code to execute at the entry and at the exit when the target syscall is detected. We also need to represent the renameat2 syscall in the driver's event_table.c file, by editing the g_event_info array and adding two items, one for the entry of the syscall and one for the exit. Each line should represent the signature of the syscall we are adding, so its arguments and its return value. For example, here we are telling the Falco driver to look for the renameat2 flags struct instance we declared before when it looks for the renameat2 flags bitmask argument.
Now go into the syscall_table.c file and edit the reference tables for the 32-bit and 64-bit architectures. First, add the renameat2 item to the syscall table, populating its value with references to the syscall entry and exit events. Then we also need to edit the syscall code routing table, adding an item for the renameat2 syscall and pointing it to the syscall code we assigned to it. Last but not least, we need to define our fillers, both for the Falco eBPF driver and for the kernel module one.
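The wiring described so far can be pictured with a toy model. Nothing below is driver code: the dictionaries, the event names and the `trace_exit` helper are hypothetical stand-ins for the event table, the syscall table and the fillers table, written only to show how the pieces refer to each other and why a syscall missing from the tables never produces an event.

```python
# Toy model of the driver tables; every name here is an illustrative stand-in.

# Event table: one entry per event, listing the parameters it carries.
EVENT_INFO = {
    "RENAMEAT2_E": [],
    "RENAMEAT2_X": ["res", "olddirfd", "oldpath", "newdirfd", "newpath", "flags"],
}

# Syscall table: syscall name -> (enter event, exit event).
SYSCALL_TABLE = {"renameat2": ("RENAMEAT2_E", "RENAMEAT2_X")}


def fill_renameat2_exit(args, ret):
    """Exit filler: gather the return value and the syscall arguments."""
    return {"res": ret, **args}


# Fillers table: event -> function that collects its parameters.
FILLERS = {
    "RENAMEAT2_E": lambda args, ret: {},
    "RENAMEAT2_X": fill_renameat2_exit,
}


def trace_exit(syscall, args, ret, ring):
    """Push an exit event to the ring buffer if the syscall is registered."""
    entry = SYSCALL_TABLE.get(syscall)
    if entry is None:
        return  # unknown to the tables: user space never sees this syscall
    exit_event = entry[1]
    params = FILLERS[exit_event](args, ret)
    # Keep only the parameters declared in the event table, in that order.
    ring.append((exit_event, [params[name] for name in EVENT_INFO[exit_event]]))


ring = []
rename_args = {"olddirfd": -100, "oldpath": "a", "newdirfd": -100,
               "newpath": "b", "flags": 1}
trace_exit("renameat2", rename_args, 0, ring)
trace_exit("copy_file_range", {}, 0, ring)  # not registered, so nothing is pushed
print(len(ring))
```

The real tables are C arrays indexed by syscall number and event type, so the model only mirrors their relationships, not their layout.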
For brevity, we're going to look only at the filler for the kernel module. The one for the eBPF driver is very, very similar; it just uses some different APIs for collecting the data and pushing them to user space. In this case we only define the exit filler, because we grab the renameat2 parameters only when it completes. As you can see here, we get the values of the old file descriptor, the old path, the new file descriptor, the new path and the flags, and we put them into the ring buffer with the val_to_ring function, populating a new event that Falco will use to detect renames made with the renameat2 syscall. At this point we are done, but good luck with the compilation.
Now, in the Falco default rule set we also ship the rule in this slide, which lets you detect when someone is using a package manager inside your containerized environments. It's very simple: it fires when a process is spawned inside your container and its name is one of those in the list you can see on the left, so apt-get, apt, and so on. Let's bypass it now. This time we will not use unsupported syscalls, but rather something way simpler.
So this is the Falco rule that triggers when someone launches a package manager inside your containers. As you can see, there's the spawned_process macro, but this time we will target the package_mgmt_procs macro, which checks whether the process name is one of the elements in the package_mgmt_binaries list. So let's start Falco again and give some space to the standard output, where by default it emits alerts. When we run a new Ubuntu container spawning a shell into it, Falco detects it. Then, when we run apt update, you can see Falco promptly emitting the alert of the rule we're targeting. Let's exit from this container. Since we know Alpine is the most used base image in the world, let's try with an Alpine container. We know that the package_mgmt_procs macro checks whether the process name is equal to apk in this case. Let's simply try to create a symlink to it named pippo. If we run pippo update, we see that Falco does not detect that we're launching a package management process in this container, just because we called it pippo. Anyway, if we actually install something, Falco alerts us that someone somehow modified a file below a binary directory, but it does not tell us that someone used the package management tool of Alpine.
In my opinion, the rule we targeted has flaws and it should be more precise. As we've just seen, maybe we need to make Falco rules also monitor symlinks. We're thinking of building something inside Falco that does it automatically. This way we don't have to edit all the rules, and we don't have to remember to also monitor the symlink every time a rule looks for a binary, a file, and so on. But we need to have a bigger conversation, and we'd love your feedback too, so please join the Falco community calls.
We proved that Falco alone is not enough, but this is something we all already knew. Furthermore, its rule set, or the one you're using, can be incomplete or ineffective. There's a huge interconnection between different rules. Maybe you bypass the one detecting the usage of package managers in containers, as I just did, but then the one monitoring binary directories alerts you. So think about it when you disable a rule: maybe you don't need it directly for your use case, but that rule at least notifies you when someone broke the rule monitoring your use case. For example, a rule that detects the execution of a specific binary is okay alone only if it also monitors the copy, the symlink and the rename of that file, and even the open syscall, because someone could think to cat that binary into another file. Even more, a very malicious actor could pipe a base64-encoded binary from the outside and recreate it line by line inside the container. And how do you monitor bash scripts with their shebangs? Better to prevent too in that case: for instance, removing bash from the container and reducing the attack surface by creating the container image from scratch, not from Alpine, even if we love it. Then put inside it the read-only binary that has to act as the entry point of the container, make only one specific path writable, and also restrict it to contain non-executable files, with the executable flag not set. Finally, you can create a Falco rule that checks that the only binary executing in that container is the entry point you intended. This way you should have reduced the odds of an attacker a lot.
Now another little bypass demo, very different from the others we've just seen. It's the last one, I promise. For this demo we don't mind selecting a particular Falco rule, since we're ambitious and we intend to be able to bypass all of them. Anyway, for the sake of this demo, we're going to select this rule. It detects when something is set to run with the privileges of the owning user or group, which means it has the setuid or setgid bits set. I'm not using Falco built from source this time; rather, I already installed Falco from its deb repositories on Bintray. Let's now trigger the rule. I create a leo file in the /tmp directory, I start Falco, and then I set the setgid bit on the /tmp/leo file. As you can see, Falco alerts us.
Maybe you don't know, but when Falco gets installed it puts the default YAML rule sets in the /etc directory, but it also puts some very important Lua files inside /usr/share/falco/lua. The output.lua file should catch our attention. Let's edit it. We comment out this line, which is the gate that the output messages of Falco pass through, save the file and exit. If Falco restarts for some reason, it will no longer send any output to us. Look, no output.
I know: to carry out this attack, the malicious user should already have a lot of privileges on the target system. They should already be able to manually kill the Falco process, for example. But that's not the point. The point is that it's way more likely that a malicious attacker will do more selective things in the output.lua file that we edited, like excluding only some rule alerts or even modifying the output messages. This way your attention will not be drawn, because you'll probably continue to think Falco is still operating perfectly as intended. And anyway, Falco should at least alert us that it has been restarted, by whom and how, don't you think?
So how do we solve the problem of the Lua outputs? We already did it, no worries. We have rewritten the complete outputs module of Falco in C++. Starting with the next release of Falco, the gate of the outputs will be built inside the Falco binary. We also plan to rewrite the parser and the engine completely in C++. This will enable us to reduce the dependency surface of Falco, removing the Lua-related dependencies too. So stay in touch, and in case you want to take a look at how we did it, I put a link to the pull request in the slide, as usual.
So thanks everyone for being here. This talk was really tough, both to prepare and to listen to, I admit, but I hope you enjoyed my approach, my findings and my considerations. See you soon, let's hope in person. In the meantime, follow me on Twitter, join the Falco calls, and let's keep in touch.