Welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm your host today. My name is Whitney Lee. I'm a developer advocate at Tanzu, and every week we bring new presenters to showcase how to work with cloud native technologies. We're going to build things, we're going to break things. That's for you, Nigel. You've got to break something. And we'll answer your questions. Today we have Nigel Douglas with us, here to talk about Falco Sidekick, the Swiss Army knife for cloud native security and observability. So I say this every week: this is an official live stream of the CNCF, and as such it's subject to the CNCF Code of Conduct. So please don't add anything to the chat that would be in violation of that Code of Conduct. It's really easy to know what works. Just be nice. Be nice to your fellow chat folks, be nice to the presenters, and be nice to me as the host, please. So friends who are joining us live, please say hello in chat and tell us where you're from. And as always, if you have any questions during the presentation, ask those. We're hoping this one feels more like a conversation than a presentation. So with all of that, finally, I'll kick it off to Nigel Douglas. And Nigel, will you introduce yourself, please? Yeah, so hi everyone, thanks for joining. I'm a developer advocate here at Sysdig, and I work around the open source technologies, but mostly with Project Falco. And I'm super excited about today's session. It's an opportunity to talk about Falco rules, specifically Falco Sidekick, and to try to demystify what the intention of the Sidekick project is. Is it an observability tool? Is it a relay tool? Or is it more than that? So yeah, we'll go through all of it today, and we're trying to keep it conversational. So if you have questions that are relevant to the topic and you're curious about Falco Sidekick, just ask at any time, and I'll try to answer those throughout the session. Awesome. So yeah.
Ready for me to share your screen? Yeah, absolutely, let's get going. Let's do the thing. So if everyone can see my screen right now, we should be good to go. Right now, what we're looking at is the Sidekick UI. So Falco Sidekick, just to give some context around it: Falco Sidekick is designed to handle the metadata, those events that are coming from Falco, and, generally speaking, send it to third-party webhook endpoints. So it could be... Maybe I'll interrupt already. Two things. Will you zoom in a little bit, please? And then will you also just give a basic overview of what Falco is? Of course, I kind of jumped the gun there. So Falco is a CNCF incubation project. It's a detection tool. If you think about tools like Snort on traditional Linux endpoints, think of Falco as the Snort for Kubernetes and cloud native. So let's say I run an arbitrary command. It could make changes to a file system, it could make changes to a user, anything from deletion to creation, a bunch of CRUD actions, and we should be able to detect those in Falco. And then there's... Yeah, sorry. I've always thought of Falco as like a runtime security tool. Is that accurate? They're like watching system calls, kind of. For sure. It's not just limited to that, but the terminology of runtime is definitely the best way of thinking about it. So my good colleague Thomas built this visualization view to demystify, I guess, the rules and how they work. We have default rules that we provide, and some of them are based on system calls. That would be the out-of-the-box approach with Falco. So, for instance, what I mean by system calls, for those who don't know, is this: in a Linux system, every time you make changes, again at runtime, in real time, whether it be a change to a file or a change to the system, it generates a system call.
And by hooking into and interpreting those system calls, we can monitor what is happening inside an environment in real time. But we also have this concept of gRPC plugins. Now, we don't need to go too heavy into the plugins, but the idea is, since Falco is an open source technology, you can build your own plugins to interpret real-time events, not just system calls, but from other real-time event sources, such as AWS CloudTrail, or GitHub, or Okta, and even the Kubernetes audit log that's generated within Kubernetes itself. So system calls are powerful for showing us everything that goes on in a system, generated through the kernel, but if you want to collect events from the cloud, from other services, you can build your own plugin or use existing plugins we've designed and, again, detect insecure behaviors from that. Excellent, thank you. Thanks for bringing me up to speed. So you specifically want to talk about Falco Sidekick today, which is a sub-project of Falco. Is that right? Yeah. So we looked at the Falco project, and many people will say, and they have every reason to say it: Falco is great, it detects some insecure behavior in my environment, but that's kind of useless if I have to manually go and make changes myself. So for instance, if I detect that someone deleted a file, okay, but now I need to take action on that. And there are different ways that different organizations will take action on a security incident. Some will just want to send a webhook to an endpoint. So if we look at the endpoint view here, you'll see that Falco Sidekick was designed to have 60-plus integrations, and this number will only go up. Think about webhooks as a way of sending a notification, sending the metadata associated with the Falco alert to those arbitrary third-party endpoints. Again, when I say arbitrary, I mean whatever one you want to send it to; you have the options here.
So if it's going to be Slack, and you're an SRE team and you want to be notified that something happened, you can then take action once you get that notification. It could be sent to a Discord channel, it could be sent to a data visualization backend, something like Datadog or Dynatrace, or Prometheus if you're doing everything open source. You have all these options for sending events, or for monitoring all activity associated with Falco itself, via the Falco Sidekick add-on. So Sidekick is part of the Falco project. It is maintained by the same team that manages Falco, but think of it first of all as a forwarding tool. I have a nice little diagram here, and if this is big enough for everyone to see: you can push the events that are coming out of Falco through this pretty little Falco Sidekick UI or endpoint, and that gets pushed to the destination. So it's pushing out the events to a webhook endpoint that can actually receive those events. Now, that's one of the use cases. Another use case is actually using Falco Sidekick as the endpoint itself. Not everyone wants to configure Prometheus or Datadog; maybe they want to do it all natively through Falco, and that's a perfectly justified scenario. So how did it work before Falco Sidekick? That's, I guess, what I want to show in the terminal here. I put together a nice little GitHub repo, something that people can use themselves if they want to try it after the session. So right now, if I was to look at, say, passwords... I think I have a password view... let's check. Ah, wrong address. So I'm just going to go back, and we'll go to the Falco Sidekick repository. And from here, if I type in passwords, we've got our example. So before using Falco Sidekick, what I would have probably done is... you see this terminal window, and I can blow this up, but you don't need to read this because it already looks like a bit of a jumbled mess.
What I'm doing is running kubectl logs, and I'm using the dash f flag to follow the output. Falco is running in a container, and it's got a specific label associated with the name Falco. So I'm looking for anything that is triggered with a warning priority, and I'm grepping for anything that's got private keys or whatever; that's the rule that's being triggered. So I could run something like kubectl logs, and then I run the action, which is to find anything in that root directory that's got this id_rsa. Basically I'm sniffing for SSH keys. So I could open up another window. I'll open up this window here, and I have my second window just below it, so I can make that a bit smaller. Again, the context isn't that important here. The most important thing is we prove that as a system call is interpreted, it generates the output in the other window. So when I hit enter... oh, was I running that correctly? Oh, I know why. I need to shell into a workload first. Once I've created the workload, I should be able to exec into that Linux workload, because right now I'm just on my host endpoint, on my MacBook. So I'm going to shell into that container, and from there, if I run the find command, you can see the output comes out in the terminal. So in essence, Falco detects threats in milliseconds, because system calls are generated in real time and we can interpret them. We get an output, and it tells us: warning, grep private key or password activity found, plus all the associated metadata. If you're wondering where all that metadata is coming from, we can go back to this view here and look at the rules. So what was it again? It was something to do with private keys. Yes, searching for private keys or passwords. And in here, you can, if you want, read the rule logic that's actually in the rules feed, which shows you, for instance, what it's being detected on.
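For anyone following along at home, that trigger can be reproduced locally with something like the following. This is a stand-in: the /tmp path and the dummy key file are illustrative assumptions, and in the demo the find runs inside the exec'd container rather than on the host.

```shell
# Stand-in for the demo trigger: create a dummy key file, then run the
# kind of search that a "search private keys or passwords" style Falco
# rule watches for. In the demo this runs inside the container, e.g.:
#   kubectl exec -it <pod> -- find /root -name "id_rsa"
mkdir -p /tmp/demo-keys
touch /tmp/demo-keys/id_rsa
find /tmp/demo-keys -name 'id_rsa'
```

On a cluster with Falco watching syscalls, it's the spawned find process that generates the event, so the warning shows up in the `kubectl logs -f` window almost immediately.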
So it's looking for spawned processes, anything that's listed as a grep command. We have a list of grep commands, and we have a list of private keys or passwords. So we say in this scenario: if there's a find operation, and it was a spawned process, and it matches anything from those grep commands and private keys, and it obviously has to be an id_rsa or id_dsa or id_ecdsa, if it matches that granular context, then we'll trigger a detection, and the output you saw in the other terminal should be all of the output fields listed here. So, because it's a system call, we can take any granular activity relevant from the Linux kernel. When we go back to that view, you can see, for instance, what command was executed, what the container image was, and what its status was. For instance, it was a CentOS pod. We get the context from Kubernetes, such as that it was created in the default namespace. So much context can be grabbed, but this is the rule logic. Now, you can probably tell already, all I've done is triggered one command, and I got a detection. But is this scalable? That's exactly what I was wondering. When I think about logs, or system calls, both, I think across a whole system the amount of stuff you're looking at is so huge. So how is that performant, or how is it scalable? And that's exactly it. So even with this exact command I was running, yeah, I could kind of address it by saying, okay, show me every case, not just a specific grep output. That way, for every other command that's generated, let's say I go up here and run the command three or four times, we then see all those other outputs in the other view. But that's still not really scalable. Really, all I'm doing in this scenario is tailing the output of those commands I ran.
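To make the rule logic concrete, here is a simplified sketch of the shape of such a rule, written to a local file so you can read it. This is not the exact upstream rule from the Falco rules feed, which is more thorough; the list contents and condition here are trimmed-down assumptions for illustration (spawned_process is a real default macro in the upstream feed).

```shell
# A simplified sketch of a "search for private keys" style Falco rule.
# The real rule in the falcosecurity rules feed is more complete; the
# list and condition below are illustrative only.
cat <<'EOF' > /tmp/private-key-rule-sketch.yaml
- list: grep_binaries
  items: [grep, egrep, fgrep, find]

- rule: Search Private Keys or Passwords (sketch)
  desc: Detect processes grepping or searching for private keys or passwords
  condition: >
    spawned_process and
    proc.name in (grep_binaries) and
    (proc.args contains "id_rsa" or
     proc.args contains "id_dsa" or
     proc.args contains "id_ecdsa")
  output: >
    Grep private keys or passwords activity found
    (user=%user.name command=%proc.cmdline container=%container.name
    image=%container.image.repository)
  priority: WARNING
EOF
```

The output line is where all the metadata in the Sidekick UI comes from: each `%field` is filled in from the syscall and Kubernetes context at detection time.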
So with Falco Sidekick, if we go to the Sidekick UI, if you were to natively rely on Falco Sidekick, you would notice I now have five or six outputs for those five or six arbitrary commands I ran, which was the grep action. Will you describe this view and what I'm looking at, exactly? Yeah, perfect. So in its simplest state, if I go up through my repository, you can see how I installed Falco: I said, look, install Falco via Helm from the falcosecurity repository. I've also set tty to true, which flushes the output in real time, so I get the events as they happen, and you can see I've set a bunch of other flags. This proves that Falco Sidekick is natively part of the falcosecurity repository. And in here, there are a few things I can do. The first one, as I said: well, I'm going to turn on Falco Sidekick, because by default it's not enabled. I've used a feature flag to turn that on, because, again, not every organization may wish to use it, but we've proven already that there's a genuine need for Falco Sidekick. Then I want that visualization element, so I've said turn on the web user interface, which is what we're looking at in the UI. For this, it needs a backend that's going to store the events that we're triggering. If I want a historical view of all the events generated over time, that's a much more scalable alternative to just tailing, to looking at events in real time. In that case, I'd set up a Redis backend as my storage. Oh, actually, in this case I've turned off storage. So I do have all the events in the UI, but if I were to shut down this cluster, I would lose that activity. That's fine for just this demo.
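Put together, the install being described looks roughly like this. The value names match the falcosecurity Helm chart at the time of writing, but double-check the chart's README before copying, since chart values do change, and add your own custom rules file if you have one.

```shell
# Sketch of the Helm install with Sidekick and its web UI enabled.
# Value names follow the falcosecurity/falco chart; verify against the
# current chart README before relying on them.
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set tty=true \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true
```

With `falcosidekick.webui.enabled=true`, the chart also stands up the UI's storage backend alongside it, which is the Redis piece mentioned above.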
And then finally, when we talk about webhook events we want to send to third-party endpoints, you can see I'm posting to another local element called Falco Talon, which we'll discuss in a short while, but you could set a webhook to forward events to, as we mentioned earlier, Slack, or infrastructure-as-a-service alternatives, or functions as a service. There are over 60 possible integrations for Falco Sidekick, so we can come to all those later. This last bit, falcoctl, isn't so important. It's really just about turning on the different rule feeds that may or may not be enabled by default: for instance, the stable rules you get by default, but there are also rules in incubation, sandbox, and deprecated. And we do this lifecycle management to ensure there's no spike in false positives out of the box. So yeah, that's where we're at so far. Here's how you configure it. It's a single command, and I get Falco installed, I get a nice UI, and if I want to expose it, you see... this is probably hard to read, but all I'm doing here is basically saying: kubectl port-forward the Falco Sidekick UI in the falco namespace on port 2802. And in my case, look, I don't even have TLS verification. So if I wanted to cut the connection, I go back to the UI and it will not work; it'll be gone. But if I go back and start the port forward again, I should be good to resume my session. Cool. We have a question in chat. Okay. When using a Helm chart to install Falco on Kubernetes, is it necessary to configure it after installation using the Helm chart? This includes pre-configuring Falco and any required customization of the configuration. Oh, I guess we can do that right now. So if I exit out of this pod, you can see that I have Falco and Sidekick running. If I was to do helm uninstall falco -n falco, that should remove all components associated with Falco.
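The port-forward itself is one line. The service name below follows the chart's naming for a release called falco; confirm with `kubectl get svc -n falco` if yours differs.

```shell
# Expose the Sidekick UI locally on its default port, 2802.
# The service name assumes the Helm release is called "falco".
kubectl port-forward svc/falco-falcosidekick-ui -n falco 2802:2802
# then open http://localhost:2802 (the UI's default login is admin/admin)
```

Killing the port-forward is what "cuts the connection" in the demo; restarting it resumes the session against the same UI pod.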
So when I get pods, you know, look, I have Talon there. It's a different component; we'll come to that in a while. But I've installed everything to do with Falco and Sidekick, and you can see that even the port forward has crashed. So the question was, do you need to do any pre-configuration? No, not really. What I've done, in my case, is I have just this custom rules file, and the reason I did that is I like to build my own rules, and I like to load them in when I install the program. So as long as I'm pointing to the directory where I have my custom rules, then everything in this command will work out of the box, the same for you as it would for me. And if you don't want to create your own custom rules, and you want to use just what's in these three repositories, which are all accessible via falcoctl, then yeah, there's no need for you to pre-configure anything. I don't need to install Falco, or modify Falco, before installing Sidekick. I could actually just grab this command and go back in here. Now, I will need to make sure I'm in the same directory as the custom rules that I mentioned. So if I go back a directory or two... I think in here, maybe just one directory back... you can see I'm in a directory that has custom rules. So when I run this helm install, and then run kubectl get pods -A -w to watch, you can see it installs all the components: Falco, Sidekick, the lot. And that's pretty cool. And so Suresh Kumar, part of their question, it sounds like, is: do you have to use the Helm chart? If you install it with Helm, do you have to configure it with Helm, or can you configure it another way? Is that how I interpret that question? Yeah, sorry. One of the conclusions I jumped to here, I guess because it's a CNCF session, is I'm using Helm because Kubernetes is the environment I'm presenting right now. However, if you were to go to the Falco Sidekick documentation... let's go here. No, this is for Talon.
If I go to Falco Sidekick... I guess this is probably the Sidekick page... you'll notice when you go through the docs that not every scenario is going to use Helm. Some people run it just using Docker. So you could go on a standalone Linux VM and run both Falco and Sidekick that way, and you can see here where you can do it manually: you wget the package, you make it executable, and then you run it from a specified location. Yeah, there are a bunch of ways to install it. The reason I used Helm is that Helm is the standard, I guess the most common way people deploy a workload in Kubernetes, and it just made sense in my mind to use the Helm approach for deploying all my custom things in a single command. But of course, you can do it in different ways, and one of the examples is via Docker, if you want, or systemd. Awesome, thank you. No problem. So, so far in the session, what we've covered is triggering a detection. We've talked around custom rules briefly. That's not so important to our session, but if you want to load in your own rules, you can do that. So what we want to show is: where is this Sidekick UI useful? As I mentioned, if I want to start up a port forward, I just run a single command. Now, when I go to the UI, I have all my historical events showing up here. Okay, see, because I didn't set up a Redis backend, I've lost all my events; I have no persistent storage to hold them. So let's generate some new events. If we go back in here, we want to create a privileged workload. I'm just going to go back to our repository, and we'll create the privileged workload. Now, again, all of these commands you can run yourself; there's no issue with trying to simulate any of these scenarios. So if I say apply dash f... okay, here's my workload. I just called it dodgy, and privileged is set to true.
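For the non-Kubernetes route, the manual install is along these lines. The release version in the URL is a placeholder assumption; pick the latest from the falcosidekick GitHub releases page.

```shell
# Standalone install sketch for a Linux VM (no Kubernetes involved).
# The version below is a placeholder; check the releases page for the
# current one, and point -c at your own config file.
wget https://github.com/falcosecurity/falcosidekick/releases/download/2.28.0/falcosidekick_2.28.0_linux_amd64.tar.gz
tar -xzf falcosidekick_2.28.0_linux_amd64.tar.gz
chmod +x falcosidekick
./falcosidekick -c /etc/falcosidekick/config.yaml

# Or, via Docker (Sidekick listens on port 2801 by default):
docker run -d -p 2801:2801 falcosecurity/falcosidekick
```

Either way, you then point Falco's json_output/http_output settings at the Sidekick address so events flow in.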
So if you can avoid creating workloads as privileged, that's probably the best approach. So just like that, if I click apply, you can see... oh, the workload was already created. So I'll delete it from our environment. What I want to show is an example of Sidekick being used as the response engine. Of course, there are so many different endpoints we can configure it to. We could configure it with Slack, or functions as a service; we mentioned all those. But I've set up four tabs to show a native use case for Falco, rather than you configuring your own complex function-as-a-service automation scripting. In this tab, what I'm looking at is kubectl get events in the default namespace, and I'm watching it. So I'm watching, in real time, all events that are generated in the default namespace. If I run that again, you can see I'm watching everything that happens in this namespace. So if I create a new workload... and I probably want to do this again here... I'll clear, and I'll show all the pods in the default namespace that have labels, and watch that as well. Right now there are no pods in the default namespace. So what I want to do is create a privileged workload. When I hit enter, two things happen: it pulls the image, and it creates the workload. We know the workload is being created, and I'm watching the events coming from that namespace, and I can see the pod went through its stages of pending and container creating before running. It took about two seconds to create the workload, and it came with two labels associated with the pod. Now, I have a response engine called Falco Talon. It's still very much in its early stage, but out of the box it's designed to interpret events that are coming from Falco Sidekick. So as you can see in this simplistic graph, an event is triggered in Falco, and Falco Sidekick works as a forwarder. It forwards the event to the Talon endpoint.
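The "dodgy" workload from the demo would look something like this. The names, image, and labels are assumptions for illustration; the only part that matters for the detection is `privileged: true`.

```shell
# A minimal privileged pod manifest, written locally; apply it with
# kubectl to reproduce the demo. Names and image are illustrative.
cat <<'EOF' > /tmp/dodgy-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dodgy-pod
  namespace: default
  labels:
    app: dodgy
spec:
  containers:
    - name: dodgy
      image: centos:8
      command: ["sleep", "infinity"]
      securityContext:
        privileged: true   # what Falco's privileged-container rule keys on
EOF
# kubectl apply -f /tmp/dodgy-pod.yaml
```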
Now, of course, you could natively just plug Talon into Falco, but what I want to prove is that, using this webhook approach, we can send Falco events to any arbitrary third-party endpoint, as long as it's got a webhook address it can receive on. So we'll show Talon in its glory here by doing different response actions. We have documentation for these actions: things like labeling workloads, or gracefully terminating a workload. So, again, if we see a suspicious action, we want Sidekick to forward an event to an endpoint, which is Talon, which will terminate the workload. Or we could enforce, for instance, Kubernetes network policies. We could shell into a workload. We can run custom scripting, which is so cool, get log output... there are so many different options. We can delete Kubernetes resources, all as response actions. Now, I don't want to focus too much on Talon, because today's session is about Falco Sidekick, but we want to prove how these webhooks play out in real time. So I have a newly created workload. If everyone's following so far, we've shown that the workload was created. It says killing, but let's just check. If I say kubectl get pods -A... yep, I still have the pod in the default namespace. So I want to exec into that container. Let's just check our script to make sure we're running the right commands. So if I say exec... yeah, okay, cool. We're bashing into that container. So here I'm going to run the exec command. What happened here is we see the container now has a third label, which is set to suspicious equals true. What happened was, in real time, Falco triggered an event, which was a terminal shell into a container. So if we go back to the UI, we can see the last event that was triggered was Terminal shell in container. We get all the lovely context. We know it happened as an execve. It was in the default namespace. We know it was against that pod we mentioned a while ago, the dodgy pod.
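Talon's configuration is rule-driven as well. As a very rough sketch of the idea, and not the exact schema, a "label the pod on Terminal shell in container" rule might look like the following. Talon is early-stage and its rule format has evolved, so treat the field names here as assumptions and check the Talon documentation.

```shell
# Rough sketch of a Falco Talon response rule: when Falco's
# "Terminal shell in container" rule fires, label the offending pod.
# Field names are assumptions based on Talon's docs at the time;
# the project is early-stage, so verify against the current schema.
cat <<'EOF' > /tmp/talon-rules-sketch.yaml
- rule: Label suspicious shell
  match:
    rules:
      - Terminal shell in container
  actions:
    - actionner: kubernetes:label
      parameters:
        labels:
          suspicious: "true"
EOF
```

Swapping the actionner (for example to a terminate or network-policy one) is how the later, harder responses in the demo are wired up.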
And if you want me to zoom in and make this easier to read, you can see it here. I've got all that metadata, that context. We know it was a bash command. All that happened instantly. Before that, all we did was launch the privileged container; now we've actually terminal-shelled into the privileged container. And we know it's privileged because we can see the user login of minus one. So, going back to the UI, we know that there was this bash command, that action taken against it, but we haven't killed it. All we've done is label it, to prove you can have a response action via Falco Talon, which was to perform a native Kubernetes operation to apply a label, and it was a successful operation. So now we want to prove where it can be used to mitigate or prevent a threat. In here, I'm in just a standard CentOS pod that we've execed into. Now let's take action. I'm going to go in here. We have a crypto miner binary that we're able to pull, so what I want to do is download the package. This is just a wget command, or curl, sorry, against the tarball. I've downloaded it. So if we give it a second... ah, I know why this worked, or didn't work. If we go here and I say kubectl get networkpolicies: I've actually isolated all network traffic already. So I'll say kubectl delete networkpolicy, and I'm also going to remove the pod, because I want to show how this operation works in a second. I think that worked. Yep. And you can see it was a native Kubernetes network policy; it's now deleted. So if I run that download command, now it downloads, because a network policy was blocking that connection a second ago. So right now I have my tarball, and I want to do the same thing again, which was to show... oh yeah, this time I will watch for a network policy. So remember, we mentioned there's no network policy, in the same way there were no weird labels being assigned by default. I want to go in here, into my repo.
We want to unzip the package for the crypto mining binary, and then we want to cd into that xmrig directory, which has an XMRig binary in it. Running that binary should do one thing. First of all, we can see it would trigger a rule, and all the rules I have backlink to the visual UI where you can check them out, which is pretty cool. So for the first one, I want to run XMRig pointing at a suspicious endpoint, specifying my miner wallet, to perform crypto mining with the XMRig binary. So when we go in here and run the crypto mining binary, you can see it's running, and nothing happens. Well, actually, maybe it has. If I check here... oh yeah, this was it. I don't have an action specific to this first one, because if I go in here, we know the rule was triggered: we see Detect outbound connection to common miner pool and port. However, this could lead to a false positive. Even though it is considered a critical priority, let's say I killed the pod every time we saw a specific port being used; that would be a bad operation to perform. You don't want to have response actions on every single rule, because that way you would crash legitimate applications and block workload communication that you need in production. What we want to do is perform response actions on things that we know for absolute certain are crypto mining operations. Now, it could have been when the mining binary is detected; we could have run an automation script to remove it. We could have, for instance, isolated the connection based on those common miner pools and ports. However, I'm worried that a false positive detection could occur. So, going back to the repository: instead, there's another operation that is commonly used when it comes to crypto mining, which is where they run stratum+tcp, or, you know, there are different variants of it.
It could be stratum2+tcp, but stratum is a protocol only used for crypto mining operations. So in this case, if I run this command, it will enforce the network policy. If I go back here, I'm still watching for Kubernetes network policies, and I'm still in the container. If I hit this command... yeah, you can see the network policy was enforced. And just like that, I now have a Kubernetes network policy called dodgy bot. And what's beautiful about this is we detected something we know is absolutely only going to be associated with crypto mining, the stratum+tcp operation, and a network policy was enforced within a second, without any prior knowledge of how to create Kubernetes network policies. And just like that, you can see it was enforced by Talon, not by something I was doing via custom scripting. So if I go here, the operations are all going to be connection refused, and then operation canceled, because the network policy blocks that communication. If I say kubectl get networkpolicy... I guess it's going to be called this one... and I'll say dash o yaml to see the output in YAML. You can see that it's basically blocking everything for anything that has the matching label of the app that was flagged in the Falco detection, and it's only allowing certain addresses within our range. Everything else is going to get blocked. So it's a pretty extreme approach, and it might not be a scalable one right now for some organizations. Similarly, it is a Kubernetes network policy, and I'm aware that although it's the default operation, some of you might choose to use Cilium or Calico. We thought about that when we were building the response engine. So if we go back to our docs, there are options, for instance, to create a Calico network policy instead of the Kubernetes one, which will give you a broader scope of control over the traffic you want to allow or deny.
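The generated policy dumped with -o yaml has roughly this shape. This is a hand-written stand-in, not Talon's exact output: the policy name, label selector, and allowed CIDR are assumptions.

```shell
# Hand-written stand-in for the NetworkPolicy Talon created: select the
# flagged pod by label and drop all egress except an allowed range.
# Name, labels, and CIDR are illustrative assumptions.
cat <<'EOF' > /tmp/dodgy-bot-netpol.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dodgy-bot
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: dodgy          # matches the label from the Falco event
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8   # assumed in-cluster range; all else is dropped
EOF
# kubectl apply -f /tmp/dodgy-bot-netpol.yaml
```

Because only Egress is listed in policyTypes with a single allowed ipBlock, every other outbound connection from the selected pod, including the stratum miner traffic, gets refused.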
But are there any questions on this so far? I'd like to recap what I saw to make sure I caught it. So what happened was: dodgy pod. You have a Falco rule in place so that if you see any pod run this certain command that you know is associated with crypto mining, it's going to trigger a network policy that prevents traffic from that pod from talking to any other pods. And it happens immediately. Cool. So then, I guess this is maybe splitting hairs, but why would you have a network policy as opposed to just killing the pod? That's a very good point. So if you had a more granular network policy, you may choose to just block the fully qualified domain name, port, or IP associated with that suspicious C2 connection. That way, in maintaining stability, the application doesn't crash in production and cause a loss of operations. That's one of the reasons a lot of organizations focus on zone-based architectures and strict network policy enforcement: to ensure that production applications communicate only with what they should communicate with, and anything else is forbidden via network policy. That's one example of the operations. But yeah, the last resort might just be to terminate it: this is a really serious thing I'm concerned about, so let's terminate it, send a notification via Falco Sidekick to our operations team, and then they can focus on, okay, let's investigate what happened, and then redeploy our workload. I'd rather err on the side of safety than allow a data loss operation to play out. So that's a great question, and that would be one of the reasons. Cool. And then we have a lot more activity from Suresh. Thank you for being active, Suresh. Suresh has a lot to say and ask about Helm and Falco. So: in general, official GitHub repositories serving as Helm chart repositories are open source and susceptible to hacking attempts, where malicious actors inject viruses to target Kubernetes, potentially impacting business operations.
And in response, utilizing Falco to monitor and mitigate such attacks via Helm charts on the Kubernetes system can enhance security measures. And then Suresh Kumar goes on to say: in this scenario, I'm employing Helm charts to deploy various open source software on Kubernetes, including Istio, Kata, Karpenter, AppDynamics, cert-manager, cloud provider secret stores, and so on. My objective is to reduce vulnerabilities and thwart potential hacker attacks on the Kubernetes environment. Can Falco provide assistance in defending against such threats? Yeah. Well, for vulnerability analysis, you're better off looking at the tools that are designed for that. We have, for instance, Aqua Security; they build Trivy for detecting whether there are known vulnerabilities in these open source projects. That said, it's only a known vulnerability once it's disclosed. So if no one identifies it, there's not much those tools can do, and that's where Falco comes in useful. If we're seeing, for instance, Istio communicating with something it shouldn't, then yeah, that's a brilliant use case for Falco. If we go back to our dashboard here, and you just isolate on something you think is unusual, we could see, for instance, okay, there's a spike in unexpected UDP traffic. So if we define what we expect from our applications, if we properly architect what Istio should be doing, what Falco should be doing, what all of our tools actually should communicate with, the expected behavior, then we can certainly isolate down to something like unexpected UDP traffic and then go back to our events to investigate. That makes me think of... in your demo, when you were talking about, like, here's a suspicious system call, but there are reasons it could be used,
So I'm not gonna make a rule around that, because I might cause false positives. Hearing you talk through that made me think, ah, it's probably actually a hard problem to solve: what level of security do you want, and how do you know exactly what to block, so you're not blocking too much or too little? Yeah, it's a tough line to draw, because you don't want people to be so afraid of the volume of detections that they disable workload monitoring. That's what needs to be fine-tuned. And I think that's also a really good use case for Falco. So if I looked at this dashboard, for sure, if I remove that filtered context for unexpected UDP and just look at everything, I would look at these visualizations within the Falco Sidekick UI and say to myself, well, why are there a lot of outbound connections to common mining pools and ports? There's two scenarios. Either there actually is a mining operation, and that's something we need to take action on, or we're getting false positives, in which case: do we need to fine-tune it? Do we need to better understand why a workload is communicating with those? Because it might actually, genuinely, be a compromised workload. But if you are seeing things like unexpected UDP traffic and it's a legitimate endpoint, then what you want to do is go back in here. If I was to exit out of this workload and maybe cd back a few directories to that custom rules file I had. Oh yeah, no, I'm not in the right one. If I go cd Desktop and then cd into Falco. Falco, cd, sorry, what was my directory? Oh yeah, I think it was called CNCF. Oh, what have I done? If I was to go back, oh yeah, sorry, cd webinar. In here, if I was to cat that, sorry, vi into that custom rules file, you can see that, for instance, you have macros that you can define and lists that you can define.
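To make the macro-and-list pattern concrete, a custom rules file of the kind being described might look roughly like this. This is a sketch, not the actual demo file: all names, IPs, and the output string are illustrative.

```yaml
# Sketch of a custom Falco rules file using a list and a macro.
# Names and IP addresses here are made up for illustration.
- list: c2_server_ip_list
  items: [198.51.100.23, 203.0.113.77]

# An outbound IP connection: connect or send-style syscalls over IPv4/IPv6.
- macro: outbound_connection
  condition: (evt.type in (connect, sendto, sendmsg) and fd.type in (ipv4, ipv6))

# The rule itself just combines the macro with the list.
- rule: Outbound Connection to C2 Server
  desc: Detect traffic from a container to a known command-and-control IP
  condition: outbound_connection and fd.sip in (c2_server_ip_list)
  output: "Outbound C2 connection (command=%proc.cmdline connection=%fd.name)"
  priority: CRITICAL
  tags: [network, mitre_command_and_control]
```

Keeping the IPs in the list rather than inline in the rule is what makes this tunable: you can grow or prune the list without ever touching the rule's logic.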
So if I was to go into my custom rule file, and we were getting a lot of detections to a known IP, you can choose to say, okay, I expect that workload to communicate with that IP address. Do I exclude it from future detections? Do I fine-tune the context? Because again, you can look for things like: is the process name not equal to something? So you can use that Boolean AND logic. So my point is, that's where Falco comes in really handy. You don't want to do it based purely on system call IDs, because that's not scalable. Instead, you create these compartmentalized macros. So for instance, if I create something that sounds simple, like outbound connections to C2 servers, you don't want to just list IP addresses in the rule. Instead, you want to have something that says, well, what is an outbound connection? So we say: it is an outbound connection, that's a macro; it is matched as definitely an IP communication; and that IP is listed in another object called a C2 server IP list, which is actually not a macro, it's a list. So the macro here is an operation, and it can be complex: it can be looking at event types like connect operations, or the type is listed in sendto or sendmsg. And you can see where I'm going with that. You have TCP context, different ranges that should be excluded. And then the list is something like the IPs, which in my use case is simple, it's just two IP addresses. But in a more scalable setup, you might have a full IP feed. And you wouldn't want to write that in the rule. You'd write it in a separate object, which is a list, and then reference the list in the rule. So if that's a helpful answer, what I'm trying to say is Falco is really good at letting you define and control complex lists to ensure a lower rate of false positives. And that's where the fact that Falco is open source, it's a graduated CNCF project, is it not?
Where the community has built these macros, and common use cases have already been defined. And as a later adopter, you have these at your fingertips to add into your own organization. So, community is awesome in general, and in this case too. Now we have a quick comment: Anderson K says that they're learning a ton and asks how often we have these sessions. And we also have a comment that your answers are superb, thank you. So, some love in the chat for you. Cloud Native Live is a show that streams about once a week, on Tuesdays and/or Wednesdays. And you can find upcoming episodes if you go to the Cloud Native Computing Foundation, the CNCF YouTube channel, and click on the live tab; you can see what episodes are coming up. Cool. And yeah, just a follow-up on that: if the content's useful, I sent on a short while ago that CNCF blog on specifically this response engine. That's something you can maybe share with the group, because what we do want is to get people more engaged with these response actions, you know, demystifying responding to threats in real time with Falco, and showing that it's not limited purely to detecting threats. But one thing I didn't get around to covering, so I'll do it now, is terminating the pod gracefully. We talked about network policy. I'm gonna go back in here and I'm going to delete that network policy. So: kubectl delete networkpolicy, the one called dodgy-pod. Notice how it took the name from the pod itself. So now, delete the network policy for dodgy, and it's gone. Just like that, if we check the state, we now know there's no more network policy. So all communications to the workload are allowed again. So if I go back to the GitHub repo: we want to gracefully terminate the workload. So we scroll down a little further, and you see I have an option here for Kubernetes terminate.
Now, this example is really simple. We mentioned earlier in my list that I had these outbound connections to C2 servers. So I know that IP address is malicious. Let's say it's something more complex, like a Tor IP feed. And for sure, I don't want my workload communicating through something like a Tor relay, because that would definitely be associated with operations such as anonymity. You know, they want to use the Tor network to exfiltrate data to a C2, but they don't want me finding out what that endpoint is that they're sending my personal data to, things like passwords, sensitive credentials. So if I were to terminal shell into, whoop, wrong one. So if I were to exec into that dodgy pod that we created earlier, and it should still be running, you can see now I'm in; you can see I performed the relabel operation again. If I check for the pod, we can see it's still running, it's got the label that we expected, and it's been running for 20 minutes now. So what we want to do is just connect to that C2 address. So if I curl that IP address, we know that IP address was listed somewhere here; I put it earlier under the C2 server IP list. So if I was to send a curl, sorry, my bad, if I was to curl that address, which is here, and we send that, you can see: look, killing operation, instantly. So the container, the pod, was terminated, and you can see the status change. So you might be asking yourself, actually, I'd love to know how you're configuring all this. We were able to do network policy enforcement, which is usually a hard thing for organizations to get their heads around, and we do it instantly in Falco through Sidekick automation actions. Killing pods, we're able to do instantly, labeling workloads, and many more operations. So if I was to go here, I can cd into falco-talon, deployment, helm. In here, I have a file called rules.yaml. So if I vi into the rules.yaml file. Will you zoom in, please?
Oh, yes, absolutely. So this should be a little easier to read, and the audience can let me know if they're having an issue reading it; I can even change the color contrast to something a little nicer, lighter on the eyes. I think that made it a little harder to read. Oh, right. The contrast, oh, that's dark. There it is, that's it. That's perfect. Yeah. So you can see, because it's a native technology... I could have gone down the route of talking about functions as a service or something else, and of course you can, and I can share documentation on how you would use alternative solutions for automating threat prevention. But in this case, all you have to do is say something like: action, terminate pod. The actioner, which we've already configured through Falco Talon, works with Kubernetes-native abstractions. So in this case, it's Kubernetes terminate; that means graceful termination. If it's Kubernetes labelize, that's to add labels. If it's Kubernetes network policy, enforce a network policy. So these are easy to specify through actioners; the action is just the unique name assigned to the actioner, and then the parameters say what you're doing with it. Now for terminate, there aren't really many parameters. You could say, what is the grace period? Like, wait two seconds and then terminate. But for labelize, it's more specific. So you would say, okay, I want to enforce a label of suspicious equals true. That's what we enforced earlier. For the outbound connection one, you can specify the output fields to say: I will apply this to any pod that is not in kube-system, because for sure I don't want to enforce a network policy that might break communication in kube-system, because that might break my whole cluster. So you can see these rules are really short. They're really tight. I don't have to write a whole lot of complex code.
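Sketched out, the rules being shown on screen might look something like the following. The exact schema and field names vary between Falco Talon versions, so treat this as an approximation of the shape rather than the definitive format; the action names are made up.

```yaml
# Approximate Falco Talon rules, based on the actioners described above.
# Field names and structure may differ across Talon versions.
- action: label-pod-suspicious
  actionner: kubernetes:labelize
  parameters:
    labels:
      suspicious: "true"

- action: terminate-pod
  actionner: kubernetes:terminate
  parameters:
    grace_period_seconds: 2   # brief grace period before termination

# Match on the out-of-the-box Falco rule and apply the labeling action.
- rule: label-terminal-shell-pods
  match:
    rules:
      - Terminal shell in container
  actions:
    - action: label-pod-suspicious
```

The point stands either way: each rule is just a Falco rule name to match on plus a named action, with no imperative code.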
It's actually considered a no-code alternative. So in my case, when I wanted to terminal shell into a container: every time the Falco rule, so we say rule, the Falco rule, which was Terminal shell in container, if that rule is triggered, which comes out of the box, and it's not in the kube-system namespace or in the Falco namespace, we will enforce that label. And the label was to labelize with what we mentioned at the top, suspicious equals true. So that is the rule, and the action, or response, is basically: enforce that label. And for the crypto mining operation, when we want to enforce a network policy, we did something similar. We specify the action, which we mentioned earlier was the network policy enforcement. And then again, it will only apply to something if it's not in kube-system or in Falco. Oh yeah, sorry, I think there was a question. I have a question. That is: are these actions Talon actions, or are they Sidekick actions? Like, would you also put go-to-Slack in here? Yeah. So that is a great point. So if I was to go back to the documentation here for Talon: yes, that rules YAML file we were mentioning was Falco Talon, but how it all worked was, Sidekick was able to grab the Falco event, interpret it, and send it to that webhook endpoint, where it was enforced in real time. And where did you configure that? Great question. So when we go back to our GitHub repo: all I had to do when I deployed Falco was say, for Falco Sidekick, my webhook address, the endpoint I'm sending to, is Falco Talon on port 2803. So it's that simple: just add a one-line webhook config to say, when you install Falco and the Sidekick UI, forward all your event data to a designated webhook endpoint. And by sending it to Falco Talon, you just let Falco Talon do all the rest of the work for you.
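That one-line webhook configuration, passed through the Falco Helm chart, might look roughly like this. The key names are assumed from the chart's falcosidekick section, and the Talon service name is a placeholder; the port 2803 is the one mentioned in the demo.

```yaml
# Sketch of Helm values wiring Falco -> Sidekick -> Talon.
# Service name "falco-talon" is illustrative; match it to your deployment.
falcosidekick:
  enabled: true
  webui:
    enabled: true          # the Sidekick UI shown in this session
  config:
    webhook:
      address: "http://falco-talon:2803"
```

Everything downstream of that single address is Talon's job; Falco and Sidekick don't need to know anything about the response logic.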
Because we've seen from here, it's not that complex. You know, if I wanted to terminate a workload, I just said: use the action terminate pod, which is graceful termination, when the Falco rule, outbound connection to C2 server, is triggered. So we're showing a few things here. Falco is very powerful in what it does, but it's not scalable alone. With Falco Sidekick, I now have all these events for things like outbound connection to C2 server. And since, when I deployed via Helm, I could just specify my webhook address, I streamed all those events to the designated endpoint, where Falco Talon was able to respond and kill those workloads, which we see in the output here. And I hope... So with Sidekick, if you wanted to alert your team Slack channel that there's suspicious mining activity, or if you wanted to add more observability around that suspicious mining activity, you would do that at that level. And then Talon is the enforcement we're talking about now. Exactly. Like, what you don't want to do is spend a lot of time, especially if you're making lots of changes in Kubernetes, scaling workloads up and down, having to keep configuring webhook addresses. You want it to be out of the box when you deploy your workloads. So having a feature flag to say: I'm going to deploy Falco, Falco Sidekick, and while I'm at it, configure the webhook address to a designated endpoint, and it's not just Talon. It could be any one of the, what did we list them as here, 60-plus options. That's really powerful. So again... I have a question that I think I know the answer to, but these aren't mutually exclusive, are they? You can do several of them at once, right? Oh, that's another good point, yeah. So yeah, of course, a lot of them can be used for different use cases, and a lot of people will use several of them. That'll be more common than not. So for instance, with Prometheus, you can perform more than one operation at once.
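Since the outputs aren't mutually exclusive, the same Sidekick config can fan out to several endpoints at once. A hedged sketch, again using assumed Helm chart keys; the Slack URL and service name are placeholders:

```yaml
# Sketch: one Sidekick instance forwarding to multiple outputs at once.
falcosidekick:
  enabled: true
  config:
    # 1. Forward events to Talon for automated response actions.
    webhook:
      address: "http://falco-talon:2803"
    # 2. Simultaneously notify the team in Slack.
    slack:
      webhookurl: "https://hooks.slack.com/services/XXXX/YYYY/ZZZZ"
      minimumpriority: "warning"
    # 3. Sidekick also exposes a /metrics endpoint that Prometheus
    #    can scrape for both event counts and Sidekick's own health.
```

One event stream in, many consumers out: response automation, chat alerts, and metrics, all from the same deployment.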
You can say: stream the events to Prometheus, but at the same time, collect the monitoring stats of Falco via Falco Sidekick. So it does both things. And then a lot of organizations will say, okay, I'm using Prometheus, I'm just going to use that as the example because it's a very popular open source project, to do all the data visualization and metric scraping to show us the health of Falco and Sidekick and all the events that are coming from Falco. But while I'm at it, I also want it to go into Grafana for visualization, and I need to perform threat response, which generally people would think of as functions as a service, or serverless operations. So if I get an AWS CloudTrail event, which we mentioned earlier, we have the ability to detect activity in AWS CloudTrail, I get the event; now I want to remediate that threat. And while Talon might not be the best choice for AWS, because it only responds with Kubernetes actions, I could then configure an external endpoint, such as Tekton or whatever it may be, Cloud Run, Cloud Functions, so that I can then kill the threat in Google Cloud. So there are so many different endpoint solutions, there's no point in me talking through them all on this call; it would take too many hours. The simple answer is yes. There is a question in chat I'd like to get to: could Falco Sidekick do some other operation, like filtering or deduplication, before forwarding alerts to other tools? It's funny you say that, and yes, you can. If I go here, I think if I look at events... I'm not sure if it's in this article or another, that's the only downside, I have a few different pages, but yes, you can strip fields from the output events if you want to, to reduce the noise of the output, or you can add additional context through Falco Sidekick. These are two powerful use cases for it. So yeah, it does do filtering before events hit the third-party endpoint.
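On that filtering-and-enrichment point, two Sidekick config settings worth knowing are custom fields, which add context to every forwarded event, and per-output minimum priority, which drops low-severity events before a given output sees them. A sketch, with illustrative values:

```yaml
# Sketch of Sidekick-level enrichment and filtering (Sidekick config form).
# Field values are illustrative, not from the demo environment.
customfields:
  cluster: "prod-eu-west"       # appended to every forwarded event
slack:
  webhookurl: "https://hooks.slack.com/services/XXXX/YYYY/ZZZZ"
  minimumpriority: "critical"   # only critical events ever reach Slack
```

So the noisy firehose can go to one endpoint for analysis while a human-facing channel only ever sees the serious stuff.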
Although I can't find it here exactly, wrong repo, that is definitely one of the use cases, and I can send that on. That's a great question. All right, we have just a few minutes left. What else do you want to teach us in these final minutes? Yeah, so we've gone through, as I say, this is all completely reproducible, everything from deploying the workload to deploying the third-party Talon to receive those Falco Sidekick outputs. We've gone through the legacy approach, which is just watching stdout to check the output of those pods. But what I would tell people is: go to this repository, it's my Falco Sidekick Talon repo for the CNCF, and I can send that on, or anyone who gets the recording can go to this address. I've put in both the Falco rules and the associated actions in Talon. So whether or not you choose to use Talon, you can look up any of the rules I mentioned here and try to reproduce them with the commands I also specified in the GitHub repository. There are a bunch of additional scenarios I've provided, such as running a pod as the root user, whatever you want to test out, but they're not so critical; I think they're just additional scenarios. But what I did want to leave as a kind of go-home note is that Falco Sidekick gives you a time-series view of everything that's going on. So some people think: I have to use Prometheus and Grafana. If there's not a designated reason why everything has to be pumped in there, this can just be your dedicated security view. You know, if I wanted to look at all my events over the last 24 hours, I can do that. If I want to prioritize a shorter period, I can say, look, let's look at the last two hours of data, and then you get a two-hour window.
If I wanted to look at it by, say, a one-hour window, you get the point: I can filter it down and get a more granular, by-the-minute view of a security incident and pin it to a specific time. So even if some third-party tool told you an incident played out in your environment, I can jump into Falco Sidekick, pin it to a specific minute window, and then zoom in on that to see what the critical incidents were that played out in that time. You can see at the moment I'm filtering by critical, but if you have hundreds of hosts, you can filter that way too. You can filter by specific rules, tags. So if you're trying to meet regulatory compliance, this just makes life a lot easier. If I remove that critical context and I just want to look at a MITRE ATT&CK context, I can say: just look for T1059. And suddenly I have my outputs that are just related to that tactic or technique. Yeah, but if there are any other questions... Will you take that GitHub repo link and put it in the private chat, and then I'll put it into the chat on YouTube? Of course. And thank you. Oh, here we are. And then we have a question from Suresh Kumar. So let me get this GitHub link. Suresh says: how does Falco impact the performance and resource utilization of Kubernetes clusters? Have you encountered any scalability issues with Falco in larger Kubernetes deployments? Yeah, that's a brilliant question. It's really hard to get exact numbers from exact customer environments, because you'd have to know, for instance, how many nodes they're running, how many rules they're running, and naturally, what kind of activity would be triggering those rules. To answer the question simply: Falco has very low resource utilization, and I'm not just saying that for the sake of saying it; it's designed with that in mind. There was a CNCF article published recently enough, and it's part of Falco's commitment to sustainability and the environment.
And it was really about how Falco limits overhead; we specifically configure Falco for low overhead. And that also plays into the environmental aspect: fewer resources, better for the environment. That's just one of the stories. But to give additional context: enterprise tools, I can't list all of them, but say, for instance, Sysdig, the company I work for, built an enterprise suite based on the open source Falco project. So there are organizations running hundreds of Falco rules that are being evaluated in real time, again with limited overhead. There are ways to reduce overhead, and there are obviously ways to increase overhead, such as not fine-tuning rules; again, false positive detections will lead to a large detection set. But if rules are configured correctly, generally speaking, it shouldn't have a significant overhead, because we also acknowledge we are only one aspect of your stack. Your focus in Kubernetes should be on deploying applications that scale well, and we shouldn't be hindering that performance. So although I can't give exact numbers, if I looked at it in my own environment, I'd be using less than 1% of CPU utilization. So that's, yeah. Cool. And we have another question. This one is: which Falco engine, kernel module or eBPF, would you use in a production environment? Oh, that's a really good question. It's up to you. So if I was to look at Falco eBPF versus kernel module, I guess, choosing a Falco driver: where you start, there are still, to this day, three options. So there's the eBPF probe, there's the modern eBPF probe, and then there's your standard kernel module. There are actually other ones too; this is what shows why open source is really useful. There was, what is it called again... Google also created a runtime environment for Kubernetes, and I've forgotten the name of it, which is a shame, but we also have a specific plugin for that.
So in that sandboxed environment, which is heavily secured, we also have a plugin that works for it. The kernel module, by default, is the one that comes out of the box. You can run it as the eBPF probe; there's limited to no performance difference. But this article goes into a good point; Kris Nova put it together, and it was about why you would choose the other ones. So for instance, not all kernels, especially older kernel versions, will even support eBPF. So there's no point in us pushing the eBPF probe on everyone, because that's just not going to be an option. But then there are organizations that wish to use eBPF because they see it as a more secure option. Then you can choose the modern eBPF probe, and you get the same detection engine, the same rule set, the same plugins; everything works the same way. So my point is, it's going to come down to those simple questions: what exactly is your kernel? What are you using in your environment? And oh, it's a shame I can't think of the name; if I looked up Falco, Google runtime, I guess... what was the runtime environment I'm thinking of? Or the Falco blog, maybe, because it's an interesting use case, yeah. We're running out of time and we have one more question, so I want to grab it. We also have lots of nice words, lots of thank-yous and great-presentation words. So thank you for that affirmation; it makes us both feel really good. But the last question we have time for is: are you utilizing any custom Falco rules tailored to the environment or application workloads? Say that question again, sorry. Are you utilizing any custom Falco rules tailored to the environment or application workloads? Or it might be a good opportunity to talk about rules in general, and community rules versus custom rules. Yeah, that's a really good point. So like I mentioned earlier when I was deploying this, we have our own CTL solution called falcoctl, and with it I, for instance, enabled incubating rules.
I enabled sandbox rules too. And yeah, those are the two I enabled. So we have a maturity matrix now. So if we looked up, like, Falco maturity matrix... yeah, we should have this publicly documented: the adoption of Falco rules. So it's a big, long read, but the idea is, we know that not all rules, as time moves on, are going to be useful for your environment, or they might require a large amount of fine-tuning. So we shouldn't be forcing that on people out of the box; it gives them a bad experience. So we created the rules maturity framework as a way to show that the project has evolved over time, and this is the best step forward. So going back to this view, you can see some of the rules are stable. So these are the good ones. We know our engineers have given them the thumbs up and signed them off as legitimate, safe rules to run in your environment. Incubating are ones that we hope will get to the point of being stable over time, but may require additional work. Sandbox are kind of: we've tested them, you can try them out yourself, but they may not be up to scratch. And deprecated are ones we're actually moving away from. And there's good reason for that. Sometimes you create a rule, and a good example of deprecation might be drift. Do I have drift in here? Maybe it's under sandbox, or maybe I'll just go out of here and type in drift. But we have different versions of drift rules. And you can see these ones, okay, they're sandbox, but they're disabled by default. So for your specific environment, you may choose to have, for example... determining drift would be considered any time we change the operation of the container, so a chmod operation, or open-create, which is creating something new in a container. These are all examples of container drift. And what I mean by that is: a container, whatever it was originally deployed as, is what it should be.
And if someone terminal shells into a container and makes changes, that has drifted away from the intended design of the container. So over time, we end up creating new rules; I think there's one called drop and execute. And if you go into the description, it actually says: look, this is to help reduce noise by applying specific environment knowledge. We've decided to put forward this drop-and-execute-new-binary-in-container rule as a kind of replacement for those old, previous rules, which we're deprecating, moving away from. So yes, you'll notice, as you go through the rules feed, that we've made this maturity matrix to make sure you're getting the best experience of Falco rules out of the box. But if you do like something that was there before, we're not just going to chuck it in the bin. Instead, you can enable it and fine-tune it for your use case. Love that. Okay, I'm going to stop sharing now, but we did not get to all of the questions. So if someone wants to reach you, or if someone who's watching the recording wants to reach you, what's the best way to get in touch and ask follow-up questions? Yeah, it's a great question. I'm part of the Kubernetes Slack. So for anyone out there who uses Slack, that's the best way of reaching out to the Falco team. In the Kubernetes Slack, there's a channel called falco. So you can go there; I'll be more than happy to answer questions. Or you can message me directly in Slack, it's the same name, Nigel Douglas. So I'm more than happy to answer any questions that come up. Awesome. Thank you so much. And thanks, everyone, for joining today's Cloud Native Live. It was great to have you here, Nigel. I so appreciate you sharing your time and expertise with us, and learning about Falco Sidekick was awesome. And the chat today, you are all really awesome too. Thank you for the interaction and for the questions.
So here at Cloud Native Live, we bring you the latest in Cloud Native Code on Tuesdays and or Wednesdays at this time. And thanks for joining us today. Our next episode is gonna be Wednesday of next week. Thanks to everyone who's here today. Thanks to everyone who watches the recording. Goodbye.