Welcome to today's CNCF webinar, Kubernetes Runtime Security with Falco and Sysdig. We'd like to welcome our presenter today, Jorge Salamero, Director of Product and Technical Marketing at Sysdig. Jorge, take it away.

Thank you, Taylor. So, first of all, a little bit about myself, so you know who is presenting today. As Taylor said, my name is Jorge Salamero. I run technical and product marketing here at Sysdig. I used to speak at different DevOps and development conferences, and I was a developer myself in the past. Now I work mostly on containers and Kubernetes. I'm one of the people behind some of the Falco integrations, and Falco runtime security is what we are going to be talking about a lot today. You can follow more of my work on GitHub and Twitter as well.

But before we get started, I want to give you a little bit of context about what Sysdig is, as an open source project and as a company, and what exactly its relationship with Falco is. Sysdig started in 2013 as an open source project for Linux kernel tracing with container support, production ready. You can understand it as an evolution of tcpdump and Wireshark, from the network into the entire operating system: all the system calls happening in the kernel. We can use that to gain visibility into what's happening inside containers, and sysdig had Kubernetes support. That's how the project started. A year later, a company was started by the people who created the project. Sysdig Monitor was a commercial product for Kubernetes monitoring, performance and availability monitoring. A few years later came Sysdig Secure, largely using the same technology, helping implement security by looking at what's happening inside the containers. You can find plenty of open source projects that were born at Sysdig or that Sysdig has contributed to: sysdig, sysdig-inspect, Falco, and we have also contributed to the eBPF and Prometheus projects. But how does Falco fit into this picture?
Well, Falcon was originally started by Cystic, the company. But it's currently run independently by our community. And it's under the CNCF umbrella since last year. Right now it's a sandbox project, but hopefully it will be in the incubation phase. And as I said, I'm going open proposal to make the change. Falcon leverage is some of the Cystic open source libraries as we will see later as we explore a little bit deeper on the Falcon architecture. Well, we need to answer why do we need runtime security to understand why we are talking today about desktop. Typically, when we are approaching implementing security in cloud native stocks, the first hard we get to, it comes to our minds is like, all right, let's do vulnerability scanning, typically, either from our container registries or CICD pipeline. Then let's set up some kind of like user identification and authorization system. So I can set up the permissions on my Kubernetes cluster. But is that enough? Can we make sure and then security properly when even if we implement all these things, we need to make sure that these layers is top from happening all the bad things we don't want to happen, basically. And this is what runtime security is about. Actually, when I explain runtime security, I like to split it in four blocks or phases. Number one step, it's the prevention or the enforcement. This is basically when we set up all the permissions and communities on what you can do and what you cannot do. The next step is the detection or the audit. This basically tells us if those barriers that were affected or something went through. The third step, it's having the opportunity to block those attacks, like basically if we were implementing a firewall, but again against the entire system, not just never. Step number four, sometimes ignore or miss, but it's still very important. It's being able to audit and record everything so we can do effective incident response and forensics. 
So: prevention, detection, blocking, and incident response and forensics. When we talk about prevention, as I was saying before, this is basically who can do what within Kubernetes. The nice thing here is that Kubernetes provides a bunch of native controls to define that. Admission controllers, for example, allow us to hook into the deployment step in Kubernetes, when we schedule a pod, and we can define different policies to say whether this thing can be deployed in Kubernetes or not. We can write our own admission controllers, we can use admission webhooks, or we can use some more advanced frameworks like Open Policy Agent that help us do that.

Another layer that we can use for prevention is RBAC. RBAC is basically Kubernetes permissions: how we define users and the roles that they have, so we define what they can do, what they can access, what they can modify, which resources they can change in our Kubernetes cluster. Another option is using network policies. A network policy is again another Kubernetes resource, one that allows us to define firewall or network access rules for our Kubernetes services.

Another very interesting resource is called pod security policies. Pod security policies allow us to define the security context, the security configuration, of every pod: whether it can be a privileged pod, whether it has a read-only root filesystem. They also allow us to define and enforce runtime security profiles using seccomp, SELinux or AppArmor. Seccomp allows us to define a sandbox by allowing only the system calls that a container is permitted to execute. Other frameworks like SELinux or AppArmor are slightly more advanced and let you define what a process can do at runtime with a more expressive language, so we can say this process can access these files, or it can open these connections, just to give you a few examples. But prevention is not enough.
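To make the pod security policy idea concrete, here is a minimal sketch, assuming the `policy/v1beta1` API available at the time; the policy name and the allowed volume list are illustrative:

```yaml
# Hypothetical example: a restrictive PodSecurityPolicy that forbids
# privileged pods, forces a read-only root filesystem, and applies the
# runtime/default seccomp profile.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example        # illustrative name
  annotations:
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false               # no privileged pods
  readOnlyRootFilesystem: true    # the root filesystem cannot be written
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot        # containers must not run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                        # only these volume types are allowed
    - configMap
    - secret
    - emptyDir
```

A policy like this would then be bound to users or service accounts through RBAC, which is how these prevention mechanisms tie together.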
We need that second step or phase within runtime security, which is detection. Detection helps answer what happens when a control fails. It's basically our last line of defense. We need to have it for catastrophes; we need it to tell a story when the unexpected happens, when we either fail to configure those enforcement mechanisms properly, or the attacker finds gaps or holes to bypass those mechanisms. We will rely on these detections very few times, but it's very critical when we need it. Also, this is very useful for our teams: detection can be used to validate that the enforcement mechanisms are actually working, and we can also use detection to make sure that those policies, those enforcement mechanisms, don't break my applications. So it's not exactly a straight line; detection actually allows us to improve our enforcement.

There are many use cases for detection. For example, even if we have image scanning in our CI/CD pipeline, unpatched vulnerabilities, or vulnerabilities that don't have a public CVE, won't be caught by those mechanisms. Detection is going to allow us to monitor our containers for anomalous behavior, which is going to be the way to catch anyone trying to exploit those zero-day vulnerabilities. It can also be used to detect insecure configurations, leaked or insecure credentials, and internal threats, like someone doing kubectl exec into a pod and starting to make changes. It's also going to be very important for compliance and audit: as I was mentioning before, I want to audit any changes at runtime across my infrastructure.

There are a bunch of different approaches to doing runtime detection or gaining visibility. One approach is using LD_PRELOAD. LD_PRELOAD allows us to hook into all the library calls and see what each of our processes is doing.
The consequence of using LD_PRELOAD is that this mechanism can actually change your application's behavior, so from a security perspective that's something you might not be ready to trade off. Ptrace is another option. The limitation of ptrace is that you can only trace or monitor a single process ID, and it will also be capturing every system call, so it can have a significant performance impact. Again, it also changes, or can change, behavior, so that's something to consider from a security perspective.

Sidecars in Kubernetes pods are another approach. You can have a pod that monitors another pod; namespaces are shared between those two pods, so I can access the process, network and storage namespaces, and from one pod I can get visibility into the other. This is in principle a good approach. The limitations here are basically the instrumentation overhead, the complexity, and that the visibility you get is limited to the scope of the pod you are monitoring.

A fourth approach is a kernel-based approach, leveraging either a kernel module or an eBPF probe to get visibility. It's a better approach to get very close to total system visibility, as we will be capturing all the system calls, and we can use that to get visibility into all containers and every single process running on your hosts. One of the benefits is that we can capture this activity asynchronously, so it has a very low performance impact. Both the kernel module and the eBPF probe require your kernel headers to be built. The difference between them is that the eBPF probe runs in a safe mode inside the eBPF virtual machine, so it can be considered more secure. We wrote a blog post where we discuss both options and the benefits of eBPF, so you can check that out if you want more information. Now let's talk about how Falco leverages these.
Falco is considered the Kubernetes runtime security project; it's basically a detection engine for any kind of anomalous activity that might be happening on your hosts or in your containers. Falco leverages a set of rules, following a syntax we'll see in a moment, to define what is considered anomalous and what is considered safe. It leverages some open source libraries, libscap and libsinsp, coming from the sysdig open source project, to get visibility, and it has native Kubernetes support. This means that every system call is tagged with the Kubernetes context: namespace, deployment, DaemonSet, pod, etc. So we know from which application or microservice each activity is coming. Falco can also hook into the Kubernetes API server audit log, so we can get information about what's happening at the orchestration layer. We'll see some examples in a second.

The problems that Falco solves are things like: are my hosts and containers doing something they shouldn't? For example, is there any unexpected process, like a PostgreSQL container spawning something different, or a process doing something like installing a new package or changing configuration at runtime inside the container? Or, looking at network activity, for example, an NGINX container that starts a new listening port or makes an unexpected network connection. We can also look at the orchestration activity; for example, we can answer whether a Kubernetes user did a kubectl exec or kubectl attach into a shell in a privileged container and started making changes. These are the kinds of questions you're going to be able to answer with Falco.

The Falco architecture can be understood following this diagram. We have the kernel module or eBPF probe, from which we get every system call, so we can see what's happening in the system. We leverage the libscap and libsinsp libraries to organize and reconstruct that activity and send it to the filtering engine.
All that information is combined with the Kubernetes metadata and labels, and also with all the events coming from the Kubernetes API server. We match all that activity against the Falco rules, and if anything matches, we determine that yes, there was some unexpected behavior, and we can trigger notifications across a number of different mechanisms: syslog, a file output, standard output; we can execute commands, we can trigger webhooks, or, if we want to extend things a little bit further, we have a gRPC output that uses TLS, so it's fully secure.

Then we have a bunch of side projects to Falco that you can use to send notifications to other places, like PagerDuty, OpsGenie, Datadog, Elasticsearch, AWS SNS or Google Pub/Sub, using falcosidekick. We can also control Falco using the client libraries, and you can write your own. We also have a Prometheus exporter that allows you to monitor and get metrics on all the events being triggered.

If we look a bit at how Falco is deployed: Falco is typically packaged into a container, and you deploy it to your cluster using a DaemonSet. From that container, hooking into the host kernel using eBPF, we gain visibility into any process, whether running on the host or inside any container runtime or technology: Docker, CRI-O, you name it.

If we look deeper at how Falco works, the kernel module or the eBPF probe pushes all the system calls into a ring buffer shared with user space. This buffer is basically circular: we keep filling it with new system calls, and the oldest entries are dropped at the end. With all this stream of information, we have these libraries that control the reads and writes on this buffer, we reconstruct all the activity, and we have a kind of state engine that we can use to match against our filtering rules. At the end, Falco is just a command line tool on top of these libraries that handles all of that.
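To illustrate the DaemonSet deployment just described, here is a minimal sketch. The namespace, image tag and volume list are illustrative and incomplete (real deployments also mount the container runtime socket, and the falcosecurity Helm chart automates all of this); `FALCO_BPF_PROBE` set to an empty value is the documented way to select the eBPF probe instead of the kernel module:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: falco
  namespace: falco                        # illustrative namespace
spec:
  selector:
    matchLabels:
      app: falco
  template:
    metadata:
      labels:
        app: falco
    spec:
      serviceAccountName: falco
      containers:
        - name: falco
          image: falcosecurity/falco:latest   # pin a real version in production
          securityContext:
            privileged: true              # needed to load the kernel module / eBPF probe
          env:
            - name: FALCO_BPF_PROBE       # empty value: use the default eBPF probe path
              value: ""
          volumeMounts:
            - name: proc-fs               # host /proc, to observe host processes
              mountPath: /host/proc
              readOnly: true
            - name: dev-fs
              mountPath: /host/dev
      volumes:
        - name: proc-fs
          hostPath:
            path: /proc
        - name: dev-fs
          hostPath:
            path: /dev
```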
And if you let me go even a little bit deeper and show you some of the peculiarities of Falco's eBPF implementation, there is something that makes Falco and the sysdig libraries different. eBPF programs normally use map data structures to handle all the communication between the eBPF program running in the kernel and user space. In sysdig and Falco, there is an additional ring buffer, the one we showed before, that provides a high-throughput channel between the kernel and user space. So the eBPF map structures are just used for signalling and control, and the ring buffer is used to move all the data into user space, where we can mix it with the container and Kubernetes context, so we can make a little bit more sense of it.

You might be wondering how these rules actually work. I mentioned before that they are something very similar to tcpdump filters: you have a name for the rule, a description, and then the condition, which is basically the filter. Here, we want to detect whether any of my Node.js containers run any process which is not the Node.js binary, so we are going to write a rule like this. We are looking for event type execve; this is the system call we are looking for, because we want to catch someone or some container executing a process. We want to look at all the system calls within this scope: my Kubernetes deployment, called my-node-app. And we want to detect whether any process is not called node. If anything happens that is different from this, we trigger the rule, sending this output. Here we can add some placeholders to get additional information, like the username that spawned the process, the process name and command line args, and which container this is coming from.
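The rule walked through above could be sketched roughly like this; the deployment name `my-node-app` is just the example used in this talk:

```yaml
- rule: Unexpected process in Node.js container
  desc: Detect a process other than the node binary inside the my-node-app deployment
  condition: >
    evt.type = execve and
    k8s.deployment.name = my-node-app and
    proc.name != node
  output: >
    Unexpected process in Node.js container
    (user=%user.name process=%proc.name cmdline=%proc.cmdline
     container=%container.name)
  priority: WARNING
```

The `%user.name`, `%proc.cmdline` and `%container.name` fields are the placeholders mentioned above: they are expanded with the event's context when the alert fires.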
We can use many filters to create these rules: filters for processes, for users, for file system activity, for the network, but also to define which Kubernetes scope they come from, including pods, replication controllers, services, namespaces, replica sets, deployments, etc. You can find all of these documented in the Falco documentation.

But Falco can also hook into the Kubernetes audit events. So, for example, I'm able to detect if someone is creating or modifying a config map that has private credentials inside, rather than using a secret. The way I would do that is by writing a rule like this. This rule is slightly more advanced, because we are using macros to simplify its condition. The rule is going to look for config maps, then, in our Kubernetes audit events, for any modification on those config maps, and then check whether those modifications contain any private credentials. So "configmap" we define as a Kubernetes resource which is a config map. A "modify" macro is defined as any of the create, update or patch verbs. And "contains private credentials" is a macro that basically looks at the payload of the config map object for, for example, these different strings. This is basically how you would write these rules.

But you might be wondering: all right, these are just two examples, but tell me, what are the typical scenarios? This is something we covered in the Sysdig Container Usage Report, where we looked at real production data from Falco and Sysdig Secure, and at which alerts were triggered most often. Examples include containers writing below /etc or below /root, launching a privileged container, containers that launch with sensitive mounts or that write files under a binary directory, or just running a shell directly in the container. These are examples of the top runtime policy violations.
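The config map rule and its macros, described above, could be sketched like this, loosely adapted from the Falco default rules for Kubernetes audit events; the exact credential strings checked are illustrative:

```yaml
- macro: configmap
  condition: ka.target.resource = configmaps

- macro: modify
  condition: (ka.verb in (create, update, patch))

- macro: contains_private_credentials
  condition: >
    (ka.req.configmap.obj contains "aws_access_key_id" or
     ka.req.configmap.obj contains "password" or
     ka.req.configmap.obj contains "BEGIN RSA PRIVATE KEY")

- rule: Create/Modify Configmap With Private Credentials
  desc: Detect creating or modifying a config map that contains a private credential
  condition: configmap and modify and contains_private_credentials
  output: >
    K8s configmap with private credential
    (user=%ka.user.name verb=%ka.verb configmap=%ka.req.configmap.name)
  priority: WARNING
  source: k8s_audit
```

Note the `source: k8s_audit` field: it tells Falco to evaluate this rule against the audit event stream rather than against system calls.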
We are going to find Falco rules created out of the box, available as part of the default configuration, to detect all these things. When people deploy Falco, these are the typical examples, the typical use cases they use it for, and we actually have detection rules out of the box for them. So people check for best practices when running containers: we detect someone updating packages at runtime, modifying binary directories or configuration, reading sensitive files, containers spawning unexpected processes or changing the namespace they execute in, running privileged containers, using sensitive mounts, or running shells.

This is also useful for compliance. We can do file integrity monitoring, we can monitor the launch of privileged containers or changes to Kubernetes configuration like config maps and role changes, and rules like these map, basically, to PCI or NIST regulatory frameworks. Falco can also be used to detect specific CVEs; some of them have been seen in Kubernetes environments, like the kubectl cp vulnerability or the runc breakout. The Falco community also provides default rule sets to monitor and inspect the behavior of very typical cloud native stacks, like Rook, MongoDB, PostgreSQL, the Kubernetes control plane itself, NGINX, Elasticsearch, etc. You can find many of these rules on the Cloud Native Security Hub, which aggregates detection rules for Falco and other projects. So we help you implement Kubernetes security best practices.

All these alerts that you get from Falco are usually consumed by forwarding them into a SIEM system. You can connect Falco to different SIEM systems; if you don't have one, you can use, for example, Kibana: you can forward your Falco events to Elasticsearch and then use Kibana to create dashboards about what's happening in your containers. When I was explaining runtime security before, I mentioned that the third step was actually blocking.
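For the Elasticsearch and Kibana setup just mentioned, the wiring could look like this sketch: a falco.yaml fragment that emits JSON events to falcosidekick over HTTP, and a falcosidekick configuration fragment that forwards them on to Elasticsearch. The hostnames, port and index name are illustrative:

```yaml
# falco.yaml fragment: emit JSON and POST each event to falcosidekick
json_output: true
http_output:
  enabled: true
  url: http://falcosidekick:2801/         # illustrative service address

---
# falcosidekick configuration fragment
elasticsearch:
  hostport: http://elasticsearch:9200     # illustrative endpoint
  index: falco                            # point your Kibana dashboards at this index
  minimumpriority: warning                # drop events below this priority
```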
We prevent users or services from doing things they shouldn't; then we detect activity. If we detect any unexpected activity, we can respond by blocking that unexpected behavior. This is what a response engine is about: we can automatically trigger reactions to Falco events to block these attacks or this unexpected behavior. This basically means executing playbooks, and we do that using functions as a service. We have implemented the response engine for different technologies, using open source functions-as-a-service frameworks or using Google Cloud or AWS. Examples of actions you can execute are: tainting a node with NoSchedule, so no other pods are scheduled on that node in case there was a container breakout; isolating the pod using a network policy; deleting the offending pod; scaling the deployment down to zero; triggering a forensics capture; or sending a notification to different channels, like Slack.

Different organizations are already using Falco in production, from Frame.io and Shopify to Sumo Logic, or Sysdig itself. You can read a more comprehensive list in the ADOPTERS file in the falcosecurity repository, and I encourage you, if you're using Falco yourself, to open a PR and tell us your story about how you're using Falco in production.

If you want to get more involved in the Falco community, of course the first place to start is the Falco.org website. You can find interesting blog posts and more content on Falco on the Falco.org blog, but also on the Sysdig blog, where we write a lot about Falco. There is, of course, the GitHub organization; we also have a Slack community and docs; and then the Security Hub I mentioned before, where you'll be able to discover Falco rules, but also contribute rules that you might be writing yourself.
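Going back to the response actions for a moment: as one example, "isolate the pod using a network policy" could mean a playbook labels the offending pod and applies a deny-all policy like this sketch (names, namespace and label are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: production          # illustrative namespace
spec:
  podSelector:
    matchLabels:
      quarantine: "true"         # the playbook adds this label to the offending pod
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules are listed, so all traffic to and from
  # the selected pods is denied.
```

Because the policy selects on a label, the same quarantine policy can be created once and then applied to any pod simply by labeling it.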
Now, I would like to talk a little bit more about how Sysdig extends the Falco functionality in some of its products. We saw what runtime detection is, but this is just a small part of your entire security approach. When we are talking about implementing security in your containers or your cloud native stack, I like to split it, again, into three phases: the build, run and respond phases. As you build and deploy your containers, you might be doing image scanning and configuration validation. As the containers start running in production, you do detection of unexpected behavior, but you also need to create that policy and maintain it, maybe find mechanisms to build those policies or rules automatically, or to respond by blocking. Then, in case something unexpected happens, you need to be prepared to respond using incident response mechanisms, forensics analysis, and an audit of all the activity. Of course, compliance is not a one-step process in this lifecycle; it needs to be implemented across the entire lifecycle. And Sysdig Secure can help you with that.

Something to highlight about the Sysdig platform is that it leverages open source components to build all these functionalities. You'll find Anchore Engine for image scanning, Prometheus for monitoring, Falco for runtime detection, and sysdig for deep visibility for forensics and troubleshooting. We use these components to build a platform that adds scale, workflows, and Kubernetes and cloud context around all this data. Across all these different use cases, we see that it's actually the DevOps teams who take care of security now in Kubernetes and containers. Security is now part of the DevOps process. We call this DevSecOps, or secure DevOps, and this is where the Sysdig platform helps you.
And now let me show you just three or four examples of how you can extend Falco to make it easier to use in production environments. One functionality we have here is the Falco rules editor that you will find in Sysdig Secure. The editor allows you to edit Falco rules directly from the UI, and to build your policies from the UI or from APIs, where you can define the name and the description of the rule, but also the scope, and you can have multiple rules. From here, you can also configure the different blocking or response actions, like stopping the container, pausing the container, or taking a capture file.

You can also find in Sysdig Secure the Falco rules library, which allows you to pull in different rules or policies to create your runtime policy, built on best practices and different compliance frameworks, or attack frameworks like the MITRE ATT&CK framework. So you can build your runtime policy from a number of examples and default out-of-the-box rules that you find here.

Another good example of extending Falco is the tuning functionality. We have seen that if you misconfigure your runtime policy, it might trigger a lot of false positives; or if you deploy a new application, or a new version of your applications, things might have changed, so you need to fine-tune your rules. The Falco tuning functionality looks at all the events being fired in your system and at what needs to be changed or fine-tuned, so you reduce the number of rules that are being triggered all the time. Another example is profiling. We see that some users don't want to spend time manually creating all their runtime rules and policies.
So profiling basically uses that visibility to learn how containers behave, looking at the network connections, processes, file system activity and system calls, and then generates those runtime rules, that runtime policy, automatically for you. Another example is the policy advisor. The policy advisor looks at the running containers, generates the most restrictive pod security policy for each running container, and then validates it against the running behavior, so we make sure that your pod security policy doesn't break your running applications. Pod security policies, as we mentioned before, can be used as an enforcement mechanism in your Kubernetes cluster: we define what the pods can and cannot do, and we block anomalous behavior before it actually happens.

Last but not least, we mentioned how this detection, this runtime data, is very important to tell the story of what happened. This is audit, incident response and forensics: being able to correlate orchestration-layer activity, like a user or a service account doing a kubectl exec into a pod and running a number of commands, is something that you can do with sysdig.

If you like all this, you can find Sysdig and some members of the Falco community at some upcoming events. We are running a number of Star Wars premieres; if you want to come with us to the cinema and enjoy Star Wars, you can sign up through that link. Next year, you'll find us at RSA, KubeCon and Red Hat Summit, just to mention a few events. And if you want to learn more about this, don't forget to check out the Falco.org website. I also wanted to include here a link to the Container Usage Report and the webinar I quoted before. Don't forget to subscribe to the Sysdig blog. And, well, last week Kubernetes 1.17 was released, and we always like to write a blog post on what's new; that one became very popular.
So again, I would like to use this opportunity to share that link with you. Thanks very much for listening today, and I hope you enjoyed it.

Thanks, Jorge, for a great presentation. We look forward to seeing you again at another CNCF webinar.