All right, so we are on time and I have a lot to cover. First of all, thank you OSS Summit for having me, and thank you everyone for showing up. My name is Pablo and I'll be talking about how to identify and mitigate cloud and container threats. So let's get started.

A little bit about me: I'm in DevRel, developer relations, at Sysdig. I've been in open source for more than 15 years now. I'm a community organizer with meetups, conferences and so on, but education is really my thing: training, delivering talks, creating content. That's what I really like to do.

So a little bit about you. Put your hands up if you use containers. Okay, that's great, that's what I expected. If you use Kubernetes? Good. If you consider yourself a developer? Wow. If you consider yourself an ops or DevOps person? Come on, you're a developer or you're DevOps, you need to choose. That's true. And put your hands up if you consider yourself a security person. Okay. Do you know what eBPF is? Okay, could you come here and explain it? And do you know what Falco is? All right, just a few.

So I'm going to be talking a little bit about eBPF, security and Falco, but also about the general idea of securing containers. I'll start with images: minimal images, vulnerability management, image signing and the registry. Then I'll talk about runtime security, and I'll run a demo to give you a hands-on idea of how it goes and how to detect runtime threats. And then I'll do some closing remarks. All right, we're good to go.

When I think about containers, and I think most people do the same, I think about the container running, right?
You think about the runtime part. But for the container to be running, it needs an image to run, and that image needs to come from a registry, which means it needs to be published to the registry. At some point you need to build that image to be able to publish it, and you need source code to build that image and, in most cases, a base image to start from. And the big problem is that all of them are attack vectors. So if you're just thinking about runtime, you have a problem. If you're just thinking about build time, you have a problem. If you're just thinking about your registry, you have a problem. You need to think about the thing holistically. That's what I'm going to try to cover, one step at a time.

So let's start with the source code. You're a developer, you have a repository, you push your code there. What's really important is that access to that repository is controlled, and you should think about least privilege: make sure that only the people who should have access to it have access, and that they have exactly what they need, not more, not less. You need to think about credentials: you don't want credentials leaked into your repositories as people push code. You need to think about source code scanning, and just yesterday we were talking about that here on stage with Andrew and Matt Jarvis. You need to be thinking about the vulnerabilities you have within your code, but also the vulnerabilities that could be injected into your code on its way from your machine to the repository. And finally, third-party dependencies: what are you using to run your code or to put your application together?
Are there any vulnerabilities in there? Is there anything you want to look into? You need to make sure that everything is sound. So at this stage you want to catch the malicious code before the build.

And then it goes into build. What are we worried about now? Same thing: least privilege. You just want to give access to the services and people that need it, and you just want to give them enough rights to do exactly what they need to do, not more, not less. You want to think about a trusted base image, and not just getting one from a public repository. I'm going to go deeper into that in a bit. You want to think about CI/CD malware injection: between the source code in your repository and the build, the attacker could basically be there in the middle and inject something that's going to end up in your image, and so end up in your runtime. And finally, crypto mining on build machines. Once attackers get access to either the repository or the build pipeline, they can change things in there. Sometimes they just put a crypto miner on the jobs you're running, and they're mining while you're doing your builds. Those are things that actually happen right now in the real world. So you want to catch the malicious code before publishing it, and you want to make sure attackers are not exploiting your infrastructure at this stage.

Then we go into the image registry. The build is done and we want to store it somewhere. Again, what do we want to think about? Least privilege principles and registry misconfiguration. In some cases people set up the registry as public because, oh, right now it can be public, and later on someone who's not aware of that publishes an image that should be in a private repository. What is the problem?
There could be data in that image, which should not happen but does, and attackers can just start exfiltrating it or pulling it from there. That's not going to be good, right? Registry break-in: attackers can break into your registry and get access to private images they should not have access to. So configure and secure your registry.

On the image itself, we can talk about the vulnerabilities that we have in the image. I'll have more slides later to focus specifically on that, but the things we're going to be looking into are these. Minimal images: the image should not be bloated, should not have extra packages, should not have things that are not going to be used at runtime. Distroless: we've been talking a lot about that lately. Also, after you have the image and you know that you have vulnerabilities, how do you prioritize them? How do you look into the vulnerabilities that are actually going to have a huge impact on your environment, instead of wasting time on vulnerabilities that are not even exploitable? And finally, the image configuration. Again, the least privilege principle: you don't want to give root privileges to your image within the configuration if you don't need to. What we know happens a lot is that you, as a developer or a DevOps person in the dev environment, just say, okay, let me put root here, it's going to be easier, I need all those packages in there. Later on you forget to remove them and to do the hygiene on that image, and it ends in a bad story. So minimize the image attack surface. That's the idea of this point.

Okay, now we get into runtime, and now it gets a bit more complicated. You did all the steps that I discussed before, but you're out there in the open with attackers.
They're going to exploit everything they can. One of the common attack vectors is public malicious images. If you don't trust the base image you're getting, make sure you start using ones you do trust, because there are a lot of images out there that are just malicious. In some cases they do typosquatting: they just change two letters of the name of the image and you don't even notice, because our brains are too smart for that, and voilà, you now have a malicious image with a backdoor or crypto mining running in there.

Container escape: from the container itself, if they get access to it, they start escaping and escalating privileges. Vulnerable host: you're running containers, but they run within a host, so you need to take a look at the host itself, because if there is a vulnerability there, there's a chance attackers are going to exploit that as well. Zero-day vulnerabilities: there's not a lot we can do about those, after all they're zero-day, a completely new vulnerability that no one knew about, but there are ways of speeding up your remediation. On top of that, social engineering attacks, credentials being stolen, all the things that we hear and talk about in our daily lives.

So catch the suspicious behavior after the other measures are put in place. Always remember, security is about layers, right? It's not about one solution or the other. It's about making sure that your environment is harder to break into than your neighbor's environment.
That's basically the notion of being secure.

Good, so now let's go into each of those topics, or at least the most important ones, starting with a minimal image. You need a base image to run from, and it's really important that you start with a minimal one. Here's an example of a Node image that has 15 critical vulnerabilities and 153 high vulnerabilities within the image. If you get the slim version of that image, it goes down to one critical vulnerability and three high vulnerabilities. It's a huge difference. And maybe none of those are exploitable or even in use, so attackers could not make use of them, but still, why would you increase your attack surface? So use a library of base images from a trusted source and make sure that you only have what you need.

Bloated images are very common, and this is very similar to the previous topic. This is the minimal size and this is the actual size. If you take a standard UBI image, it has 37.5 megabytes. If you look at an Alpine one, that's 5.7. That's a 6.5 times size difference, and that's exactly what I tried to mirror with the font and the cube there. Everything extra in there is just useless, just potential material for attackers, and that's what you want to avoid. So if you're not using it, just get rid of it. Developers create the image, they need some packages, it's easier for development, and then they just don't do the hygiene. That's very, very important.

Attacker tools, that's the second thing. After the attacker gets access to an environment, to a host or to a container, they need tools to keep digging, to keep exploiting your environment. Tools like curl, for downloading scripts.
They're going to use vi and nano to edit those scripts on the fly; apk and npm, package managers, to download new tools they might use; tar and unzip, because that's usually how they get files; ifconfig, netcat. All those things are tools that might be in the image and that are just going to help the attacker. They love those tools. If they are there, that's amazing, they're going to be very happy. However, if you don't have them there, if you have minimal images, you're just going to break their heart. They get in there and there's not much they can do. They cannot even open a file to edit it. They cannot download a file from an external source, which is the script they want to run. They can still exploit things, but it makes it way harder for them. So by removing those tools you're basically saying: you know those attackers that don't know much but could still break things? No. Now you need people who really know how to go into the low-level code stuff and really do things well, and there are not a lot of them out there.

Good. In terms of vulnerabilities and images, the way you build your images and the way you use the layers is also very important. Here's an example. We have a base image in version 1.1. There is a CVE, a vulnerability that we found and want to fix. That same vulnerability could be in another layer, it doesn't really matter. The thing is, by the time you discover a vulnerability and associate it with one of the layers, all you need to care about is fixing that specific layer. You don't need to go and change everything. You fix that layer, you send out the patch, and that's it: it's fixed without having to worry. That also helps you pinpoint things when someone says, oh, there's a vulnerability in Log4j, where is that? The layers let you trace it all the way back.
Okay, that's actually just in the base image. We don't need to change any code within the layers that we have. Okay, so far so good.

So let's move into vulnerability management. We have vulnerabilities in images, that's a fact. Actually, 87 percent of container images have a high or critical vulnerability. But that's not really a problem in itself: many of those vulnerabilities cannot be exploited, and the exploitable ones are what we're looking for. Of those high or critical vulnerabilities, only 71% have a fix available. Which is okay, we can fix a bunch of them, but are you going to fix them all? That's a lot of work. Which of those packages are actually in use? Which of them are actually loaded into memory, so attackers can exploit them? And which of them are actually exploitable, meaning there's an exploit out there that people can use given the way your environment is configured? If you do the proper homework and look into those, you end up with only 2% of those vulnerabilities that actually matter. So instead of having your team completely swamped with, I don't know, a thousand vulnerabilities to fix and look into, you reduce that to 20. That's what you want to do. You want to prioritize, and you want to use runtime insights. You want to be able to see what's running in your container, to have an agent in there that collects data, so that later on you can optimize your team and the work that you do.

One way of thinking about this in-use vulnerability prioritization is like this. Sorry, it's a bit small. In the image registry we have an image, and that image has three vulnerabilities that we know about. When we go into runtime and we have our container running, only one of those three is actually loaded into memory. So there is only one vulnerability in there. So what you want to ask yourself is: what is the real risk?
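The funnel described here (high or critical severity, package in use at runtime, exploit available) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not how any particular scanner works: the `Vulnerability` fields, the `prioritize` function, the package names and the CVE identifiers are all made up for the example. The sample data mirrors the slide: three known vulnerabilities in the image, only one in a package that is loaded at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    # All fields are hypothetical; real scanners report richer data.
    cve_id: str          # placeholder identifier, not a real CVE
    severity: str        # "critical", "high", "medium" or "low"
    package: str         # package the vulnerability was found in
    fix_available: bool  # a patched version exists
    exploit_known: bool  # a working exploit is known to exist


def prioritize(vulns, packages_in_use):
    """Keep only high/critical vulnerabilities that sit in a package
    loaded at runtime and that have a known exploit. Vulnerabilities
    with a fix available are listed first."""
    urgent = [
        v for v in vulns
        if v.severity in {"critical", "high"}
        and v.package in packages_in_use
        and v.exploit_known
    ]
    return sorted(urgent, key=lambda v: (not v.fix_available, v.cve_id))


# Three vulnerabilities known for the image in the registry (made-up data).
IMAGE_VULNS = [
    Vulnerability("CVE-EXAMPLE-0001", "critical", "libfoo", True, True),
    Vulnerability("CVE-EXAMPLE-0002", "high", "libbar", True, True),
    Vulnerability("CVE-EXAMPLE-0003", "high", "libbaz", False, False),
]

# What a runtime agent reported as actually loaded (assumption).
PACKAGES_IN_USE = {"libfoo"}

if __name__ == "__main__":
    for vuln in prioritize(IMAGE_VULNS, PACKAGES_IN_USE):
        print(vuln.cve_id, vuln.package)
```

The point of the sketch is the order of the filters: the runtime "in use" signal is what shrinks the list, and it has to come from something observing the running container rather than from the image scan alone.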
So do I have vulnerabilities in runtime? Yes, I have plenty of them. Are they in use? Yes, so that's what I start focusing on. Are they exploitable? Yes, so I'm going to really focus on those. Do they have a fix? Yes: how can I fix it? Is it the dev team? Is it the ops team? Who do I need to talk to in order to fix it right away? Or no, they don't have a fix yet, there is no patch for that. Then you're going to talk to your threat team or your security experts about how you can mitigate it. Are you going to write a rule, or tighten a firewall policy, or something like that, to be able to stop that specific vector that you just figured out? And then you mitigate it the way you can. But that's the high priority. That's what we really care about and how we should be spending our time.

Good. Image signing and registry. As I said, attackers try to find any way into your environment. One way is to publish images out in the open, hoping that at some point you might mistype the name of an image, or you might say, okay, let me download this, let me read Stack Overflow, oh, that's the image they say I should use, and then you just put that in your code. At that point you have a malicious image being added to your environment. And they do a few things with those images. There was a study done by Sysdig back in 2022, and they found approximately 700 images out there that were malicious, either based on the IP, on the domain, or because they had embedded secrets. They have different ways of trying to exploit you after you put those images in your environment. Crypto mining is one of them. Embedded secrets, so they can do SSH and so on, and other options. For that you can have trusted images, and image signing is an important step toward using only trusted images, right?
So the main benefits are container image integrity, images that come from a trusted source, and a safe handover from one step to the next within the pipeline. So your developer gets a base image, a trusted one, and starts working. They need a new image and start working on that image. It's a bloated one, because for development that's how you speed things up: you have all the tools in there, you have all the packages, that's great. At some point, I really hope, you're going to do the hygiene: you start trimming down and making sure that your image is actually a minimal one. Then you want to publish it, so you sign it and you say, okay, here is my signed image, and you send it to QA. QA is going to make sure that the only image they use for testing is the signed image from the developers. That's how you make sure that you're testing what you should be testing. And the same thing happens from QA to production: you sign it again and you make sure that the signature is verified when you go into production.

Yes, there's a lot of work here. Yes, it's up to you to accept the risks. I know that teams are not infinite and we have a limited amount of time to work on things, but you need to balance the risks and decide where you're going to invest your time. So there is some work to do here. It's not that hard, and there are tools out there to help you do it. Once you start putting processes in place and getting used to it, it just gets easier.

Good. All right, so I'll pause here, let it sink in, and then go into the next part.

All right, runtime security. Why runtime security? You have multiple layers in security, right?
I just talked about a few of them within containers. But after you have your container running, bad things can still happen, suspicious things can still happen. You might be on your computer, you do a git push, and you have a key in there, and that's a problem. You probably don't want this to happen. Or you might be on your computer again, changing the configuration of your Kubernetes cluster, adding a new pod, and that pod is a privileged pod. Should it be a privileged pod? Maybe yes, and that's fine. But if that was not intended, you probably want to be alerted and know about it. Someone in your company tries to log in to AWS without using MFA. Nowadays that's a big issue, and you should probably be looking into it. Someone is trying to escalate their privileges, either from a container or from a host, just by running a command associated with a CVE. You probably want to be alerted about all that as well.

So things have changed. Instead of just putting up firewalls and not letting people in, now everything is interconnected. So instead of blocking services from talking to each other, what you need is to make sure that you have good security cameras all over the place that can let you know when something bad happens. You want to have a camera looking at the Linux host, looking at those containers, looking at Kubernetes. And they're going to be looking at system calls and collecting those system calls. Everything that happens on a Linux machine goes through the kernel. If you want to access a file, if you want to access a socket to use the network, if you want to access memory, everything goes through the kernel. Having visibility into system calls allows you to really understand what's going on. Within Kubernetes, you might want to look at the Kubernetes logs to see if a privileged pod is being started. In AWS, you might want to look at CloudTrail.
That's where the service logs go. You want to see if maybe someone tried to log in without MFA, or if someone just added an admin role to another account. And after you collect all this data, you want to be able to analyze it. The way we do it is we run each event against a set of rules, and if it matches one of the rules we send an alert: oh, this is suspicious. And you want to be able to send this to a place where you actually have visibility, either a SIEM, a Slack message or PagerDuty. It doesn't matter, that's up to you.

So, runtime security and Falco. Falco is an open-source runtime security solution. It's under the CNCF and you can use it right now: just go download it and start it. It's basically for threat detection across Kubernetes, containers, hosts and the cloud. When I talk about cloud, I'm talking about GitHub, I'm talking about Kubernetes audit logs, I'm talking about AWS CloudTrail and more. When I talk about Kubernetes, containers and hosts, I'm talking about collecting those system calls. Containers are processes, Kubernetes is just on top of all that, and everything goes through system calls. If we tap into the kernel and look at system calls, we have visibility into all those layers. It's an incubation-level project. We applied for graduation almost a year ago, and we are waiting for word from the TOC any day now, probably in the next five days.
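The "run each event against a set of rules and alert on a match" idea mentioned above can be sketched in a few lines of Python. This is a toy model for illustration only: it is not Falco's engine and not Falco's rule syntax. The event field names (`source`, `type`, `container_id`, `proc_name`, `privileged`, `event_name`, `mfa_used`), the rule names and the priorities are all assumptions invented for the example, loosely following the cases described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: str                      # e.g. "NOTICE", "WARNING", "CRITICAL"
    condition: Callable[[dict], bool]  # returns True when the event is suspicious


# Hypothetical rules; the field names are invented for this sketch.
RULES = [
    Rule(
        "Shell spawned in container",
        "NOTICE",
        lambda e: e.get("source") == "syscall"
        and e.get("type") == "execve"
        and e.get("container_id") is not None
        and e.get("proc_name") in {"sh", "bash"},
    ),
    Rule(
        "Privileged pod created",
        "WARNING",
        lambda e: e.get("source") == "k8s_audit" and e.get("privileged") is True,
    ),
    Rule(
        "Console login without MFA",
        "CRITICAL",
        lambda e: e.get("source") == "cloudtrail"
        and e.get("event_name") == "ConsoleLogin"
        and e.get("mfa_used") is False,
    ),
]


def evaluate(event: dict, rules=RULES) -> Optional[dict]:
    """Return an alert for the first rule the event matches, or None
    when nothing matches and the event is simply dropped."""
    for rule in rules:
        if rule.condition(event):
            return {"rule": rule.name, "priority": rule.priority, "event": event}
    return None


if __name__ == "__main__":
    sample = {"source": "cloudtrail", "event_name": "ConsoleLogin", "mfa_used": False}
    print(evaluate(sample))
```

Most events match nothing and are dropped; only a match produces an alert, which is then forwarded to whatever destination you chose.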
Hopefully it's going to be good news.

Falco comes with more than 80 default rules for system calls. If you're using AWS CloudTrail, there are default rules for that. Kubernetes audit logs, there are also default rules for that. And we're going to be looking at crypto mining, executing a shell, mutating binaries, privilege escalation and much more. So Falco can tap into the kernel to get system calls. It can also tap into what we call plugins to get external sources of data, like Kubernetes audit logs and the others I mentioned. And when we output those alerts, there are many ways to do it. There is stdout, there is syslog, the traditional ones. But you can also use HTTPS to send them to Falco's younger brother, Falcosidekick. Falcosidekick has more than 50 different integrations, like Slack, Elasticsearch, Kafka, PagerDuty and so on. You can even do things like: if the priority is greater than or equal to critical, send it to PagerDuty because I want to know right away, otherwise just send it to my data lake or to my SIEM, whatever.

Good. So, going a bit deeper into how Falco actually works at the system call level. We have the kernel here, and we tap into the kernel using a kernel module or an eBPF probe. eBPF and the kernel module allow you to basically hook into the kernel. When there are syscalls, you put a tracepoint in there, you get that system call, and what it does is write to a ring buffer. That's non-blocking, it's not changing the behavior of anything, it's just visibility. We collect that event, that system call, and we write it to the ring buffer. The Falco process running in user space is constantly looking at that ring buffer and reading events from it, those same syscalls that the kernel was writing, and it's matching them against a set of rules and saying: this is not suspicious, drop it. This is not suspicious, drop it. Drop it. Drop it.
Drop it. Oh, this is suspicious: send an alert. An important thing to call out here is the state engine, which allows Falco to enrich the data. Most IDSs, intrusion detection systems, out there don't have it. They don't have visibility into Kubernetes, they don't have visibility into containers. They lack all that metadata that Falco adds. And finally, there is a rich set of rules that we match against.

All right, so time for my demo. Before I actually start, let me just launch the track here, it should take a few seconds. While it's starting, let me explain what I'm going to do. I'm going to talk about Log4Shell. Who here is familiar with Log4Shell? Okay. Who heard the name Log4Shell before my talk? Okay, not a lot of people. So there was a vulnerability in Log4j discovered at the end of 2021. It was introduced in 2013, found in 2021, and largely exploited in 2022. It was crazy, because Log4j is used in many, many, many Java applications out there.

The attack consists of the attacker, that's me; the vulnerable server, that's any Java application running Log4j that is reachable from the network; and the malicious server, that's an LDAP server that I'm going to put there as the attacker. So as an attacker, I'm going to send a malicious request with a payload, and this payload is going to exploit Log4j. Basically, the JNDI lookup here forces Log4j to evaluate this, which forces the Java process to go to that address to fetch a specific Java class. So the vulnerable server sends an LDAP request to my malicious server, which sends back a malicious Java class.
That class is going to be executed. I can even put in this payload what I want to be executed. What I'm going to do is put in a netcat connection back to myself. So I'm going to be here listening with netcat on a port, and then it's just going to open a reverse shell. From this reverse shell I'm going to start poking around the environment, trying to find things in there and do bad things.

So my environment is ready. Let me start. I'm going to need some help from you back there: can you read this at the back there? Perfect, thank you very much. So this is one machine. That's the developer, or the vulnerable server. It just has a Java application running, that's the account portal here. I could even run kubectl logs on the account portal. Yeah, so I can check the logs of the application. I can just open it here. Oh, wow, something bad happened here. Let me just try to... wow. That's a bug I never saw before. That's the problem with live demos. What? The internet is just failing. Let me see. Okay, that's not good. I did this demo more than 10 times now and it never failed me, so that's weird. Anyway.

We have a username here and a password here and I'm just going to log in. "We did not recognize", sorry. And let's see. Okay, so the admin is here, so it's not too broken. As an attacker, I'm going to start the LDAP server, so I'm going to be listening. Then I'm also going to start my netcat. Sorry, this is the vulnerable... this is the malicious server, which is going to be waiting to send the malicious class. This is me listening on netcat for the reverse shell. And this is me generating the payload. Remember that I said I'm going to send a netcat command that basically just opens a shell. So I'm going to get the payload, and hopefully this lovely application is just going to load. Which doesn't look like... oh, it does, perfect. The password doesn't matter. I'm going to log in. Oh, "I don't recognize you". And now if I go to attacker two and run whoami:
Voilà, I just got access with the exploit. If we go back here and look at the logs, that's basically what happened, it's just exploiting the whole thing. So as an attacker, what can I do from here? I can start poking around. Of course, I set things up in advance, but that's what an attacker would do, and I don't have the time to actually play the attacker. So I can try to find secrets, and let's say I found here your Kubernetes cluster configuration, which is something that should not be there, but you would be surprised how many people still do it. Just to show that I'm actually on a different machine: get pods -A, and I don't have access to anything. I'm going to create the config file, so I'm setting up my attacker machine to have full access to the vulnerable cluster. Now I'm in. I'm in your Kubernetes cluster and I can do whatever I want.

What am I going to do? I'm going to open a shell, because if the netcat connection ends, I'm out, so that's just easier for me. Then I'm going to download a crypto mining package and just run the crypto miner. There's not much happening here: it's just the wget and tar, and then running the binary that I just downloaded. That's basically what I did.

So that was me from the attacker perspective. What's happening on the other side? Falco was running all this time. It was running in Kubernetes: kubectl get pods -A. Falco was just running here, and it was running with Falcosidekick. Falcosidekick not only forwards the output, it also gives you a nice UI where you can see what happened. So Falco was running the whole time with the default rules. Let's see what we get. I probably need to zoom out, and I'll try to explain as much as I can, because I know the letters are not that big. So there is only one source, and that was syscall.
We know that. We had notice, critical, error and warning in this proportion. Here are some of the rules that we triggered overall: write below root, terminal shell in container, launch ingress remote file copy tools in container, and so on. If I click on events, I have a list of all the events that happened. Here I only have 10, but I can increase it, so I can see all the steps from the attacker.

The first suspicious thing we did was a netcat remote code execution in container. That was me forcing the vulnerable application to fetch the Java class, and after fetching it, it executed this outbound netcat. So here I have access to the command that was executed, the container ID, which repository that container came from, the Kubernetes namespace, the Kubernetes pod name, all the metadata that you need to be able to react to that. Then we can just keep looking. There was a redirect of standard out, that was from the netcat, and it happened a few times. Then I opened a shell in the container, that was me, already at the end, opening the shell in the container. Then that was me running wget to download the file, so there were a few system calls that happened there. Then there was a write below root, because I just put the file in the same directory. If I had put it in the temp directory as an attacker, it would not have caught it, because it doesn't look at the temp directory. So that's also about the rule, but I don't have the time to show you the rules.

Drop and execute new binary: that's one of the most interesting ones for me. When you finish an image and you run it, that's assumed to be the base layer that's running. I'm not talking about the base image anymore, I'm talking about the layer within the operating system that's running the container, or the CRI, the container runtime, that's running.
There is a flag, which is is_exe_writable, sorry, is_exe_upper_layer, that's associated with a binary. It basically says whether that binary, that file, was in the image when the image was actually used. So was it part of the image that you used for this container? In this case it's not, because I downloaded that binary. If you create a new file, that's a new thing. So that's going to be flagged as critical, saying: hey, this is a container, and someone is running a binary that was not here and was not expected.

So that's a little bit of the demo. It's basically giving you all that visibility. That's Falco and Falcosidekick. I don't have Slack here. It's being recorded, so I'm not going to open Slack, and I forgot to put it up. But this is actually set to send the output to Slack, and if I go to the Slack channel that I have, it's going to be there. It's basically going to say: hey, you've got critical activity that you probably want to take a look at.

All right, going back, just to wrap things up. Oops, not this one. So yes, containers have a large attack surface, and it goes from code to build to store to run. That means it's not only security folks who should take care of container security. It goes all the way back to developers, and having this conversation is really, really important.

So, some takeaways. Make sure you have minimal images. Prioritize the vulnerabilities: if you're spending time looking at vulnerabilities, make sure you prioritize and really go after the ones that are in use at runtime. Trusted images: don't take public images from out in the open, that you don't know, and run them in your production environment. Runtime security, after all those layers, is really, really important, and Falco gives you that visibility into what's happening and what's going on.
There is an increasing importance of least privilege and zero trust. You can see that at every conference you go to there's always a talk about zero trust and least privilege. That's really, really important. It's not a new concept, people have been talking about it for, I don't know, 30 years or more. But now more than ever there are just so many services, so many surfaces, that you need to take care of it urgently. Shift left, as I was saying, is a team collaboration thing, and it matters to everyone. It's the company's name that's at stake, it's the company's data. It's big money that companies have to pay for data leaks and things like that. And yes, eBPF is kernel instrumentation made simple. That's the part where I went a little bit low level: basically, how do we collect all this data from the host, which applies to containers and Kubernetes as well.

The Falco ecosystem, just an overview, and some references. The PDF is already in the schedule, so if you want to download it, those are some references. And we do Falco events. Basically, we are doing workshops around the world. There are a few in the US. I know the ones in Europe: we're going to London, Berlin and Paris in October. I'm going to be in Portugal and I'm trying to organize something in October as well, and we're trying to organize something in Barcelona and Madrid in November. Those are two-hour workshops where we really go into the cloud native security that I didn't touch on here, and we really go into Falco: how it works, with hands-on experience similar to the environment that I had here. You just give it a try, with a lot of labs.

Here's the Falco book, if the subject interested you. It's a very nice book and it's easy to read. I've read it a few times already. It gives you an overview of a lot of interesting things and really goes low level. It's free to download.

Yeah, and that's it from me. Thank you very much.
Thank you for coming, and if you have any questions I'm more than happy to answer all of them. No questions? Either I did a good job or a terrible job.

All right. Yeah. So the question is: Falco basically looks at syscalls and analyzes them. Is there a way to block those syscalls, the suspicious ones? Not within Falco. There are other tools that try to do that. There are pros and cons to both approaches. Basically, if you're stopping things from happening, you need to have high certainty that it really is a bad thing, and when you stop it, that can have a huge influence on other things. Also, system calls are all the way down at the kernel level. If you start messing with things in there, it's very hard to predict what's going to happen with the application, whether you block it or not. So we try to be as fast and efficient as possible: we get the system call, we write it to the ring buffer, and the flow continues. That's the idea within Falco. There are other tools out there that work with policies, where you can try to do it. Yeah, okay. Thank you.