Hi everyone. Thank you for joining. My name is Itay Shakury and I work for Aqua Security on open source security projects, mostly for the cloud native ecosystem. Lately I've been working on a system tracing tool and its integration into a dynamic image scanning product. I wanted to share with you, first of all, the concept of dynamic image scanning the way that we perceive it, and its relationship to other concepts that you are familiar with and maybe are using today. I also wanted to talk about system tracing, how important it is for dynamic image scanning, and how we can leverage it.
So to start the discussion, I would like to first discuss what we have today with container scanning. I'm going to look at Trivy, which is a popular open source container scanning tool. Full disclosure: it's being built by my team at Aqua, but this is just an example; most container scanning tools work the same way. So here I'm asking Trivy to scan this image. It's a Drupal image, which is a popular content management system. Trivy will discover that this image is based on Alpine, and because of that it will look for Alpine packages that were installed using apk. Then it will understand that the software in this image is using PHP, so it will also look for software installed using Composer. And similarly, there are some JavaScript components in there, so Trivy will also look at the yarn file that it found in order to obtain the list of software relating to JavaScript.
The goal here is to compile a list of installed software within the container and, most importantly, the versions, because in the next step we're going to compare this list with a database of known vulnerabilities. This database is the other piece of any container scanning tool. For Trivy, because it's open source, we build the database itself on GitHub and we actually store the database inside of GitHub.
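As a rough illustration of that comparison step, here is a minimal Python sketch. It is not how Trivy is implemented. The `Advisory` type, the `EXAMPLE-0001` identifier, the fixed version and the package list are all made up for the example, and real scanners handle far richer version schemes than dotted integers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical, simplified vulnerability database entry."""
    advisory_id: str
    package: str
    fixed_version: str  # first version that is no longer affected
    severity: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.17.15' into (4, 17, 15). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(installed: dict[str, str],
                    advisories: list[Advisory]) -> list[tuple[str, str, Advisory]]:
    """Cross-reference installed packages with the advisory list."""
    findings = []
    for advisory in advisories:
        version = installed.get(advisory.package)
        if version is None:
            continue  # package is not in the image
        if parse_version(version) < parse_version(advisory.fixed_version):
            findings.append((advisory.package, version, advisory))
    return findings


# Illustrative data only: neither the advisory nor the fixed version is real.
INSTALLED = {"lodash": "4.17.15", "left-pad": "1.3.0"}
ADVISORIES = [Advisory("EXAMPLE-0001", "lodash", "4.17.16", "HIGH")]

if __name__ == "__main__":
    for package, version, advisory in find_vulnerable(INSTALLED, ADVISORIES):
        print(f"{package} {version}: {advisory.advisory_id} ({advisory.severity})")
```

The point of the sketch is only that the scanner never runs anything: it compares two lists, so it can only find what is already in the database.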
Because the Trivy vulnerability database lives on GitHub, it's very easy to go ahead and see what we're doing there. Just for example, we aggregate vulnerability information from different sources, such as NVD, which is a great comprehensive database of vulnerabilities. We also consider different vendors and lists that publish their own security advisories and vulnerabilities. We even go as far as looking inside the code of some projects that we are interested in. In this case you see an example from Alpine, where we know that we can parse some security vulnerabilities from these sources. There are many other sources that go into the Trivy database, but the result is just a simple database of vulnerabilities and the affected software. Then we can go ahead and cross-reference the list of vulnerabilities with the list of software that we found inside of the container, and we see here that lodash, which is a JavaScript dependency library, has a high severity vulnerability because of the 4.17.15 version that we were using. So great, this is a very useful thing that any static image scanning tool can give us: discovering known vulnerabilities.
What else can we learn by statically scanning the image? Another thing is misconfigurations. Vulnerabilities are basically bugs in the software or in its dependencies that need to be fixed upstream. Misconfiguration is not about bugs; it's mostly about improper usage of the software. And if we look out there, we see a lot of examples of misconfigured container images that we could have detected just by scanning the image. For example, people using the default settings for the software and not adapting them for production. People leaving a lot of unnecessarily open ports. TLS settings which were incorrectly configured. People even forget their passwords and keys inside of the code and inside of the containers.
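To make the last of those examples concrete, here is a small Python sketch of how a static scanner might look for leftover secrets in image files. The two regular expressions and the file contents are assumptions chosen for illustration; real tools use many more rules and work on the actual image layers.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately small rule set.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password assignment": re.compile(r"(?i)\b(?:password|passwd|secret)\s*[=:]\s*\S+"),
}


def scan_files(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, rule name) for every file that matches a rule."""
    findings = []
    for path, content in sorted(files.items()):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(content):
                findings.append((path, label))
    return findings


# Made-up file contents standing in for files extracted from an image.
IMAGE_FILES = {
    "/app/config.ini": "db_host = db\npassword = hunter2\n",
    "/app/index.php": "<?php echo 'hello'; ?>\n",
    "/root/.ssh/id_rsa": "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n",
}

if __name__ == "__main__":
    for path, label in scan_files(IMAGE_FILES):
        print(f"{path}: possible {label}")
```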
All of these misconfigurations are examples of things that we could easily have discovered just by looking at the container image itself. So vulnerabilities, misconfigurations, what else can we understand just by analyzing the image at rest? Another thing is malware. The container itself is just a bunch of files, so we can take any number of files from within the container and use traditional anti-malware tools in order to identify those files and see if any of them appears to be malicious. These are all examples of the benefits of static image scanning. Just by looking at the image at rest, we can learn a lot about what this container is up to.
So what do you think? If you are scanning containers today, is this satisfactory to you? Do you feel secure after you scan your container images? My claim here is that you shouldn't, because there is an entire category of risks that simply cannot be detected using static image scanning. One example of that is evasive malware that we have observed in the wild. This is an example of an image in Docker Hub that passed all of the scans. It has no known vulnerabilities, no misconfigurations, no known malware. But when you run it, you see the entry point here on the right. There is a base64 encoded string here that is essentially a malware encoded as characters. And only at runtime will the script unpack this and run it. So if you were to scan this script, you know, it's a string, it's a weird string, but it's not necessarily malicious. Not a lot of static scanning tools will be able to understand what's going on here. But when you run the image, it's very easy to see that it is actually executing a malware.
And this is not a made-up example. Our research team has recently uncovered a big operation by a group that has used and abused Docker Hub in order to distribute malware in seemingly legitimate containers, where the end goal was to run crypto mining tools on your servers.
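Why the encoded string slips past a scanner can be shown with a toy example. The Python sketch below assumes a naive signature-based scanner that looks for known byte patterns. The `EVIL_MARKER` signature, the payload and the entry point text are all invented, and nothing here is executed; the entry point is only treated as bytes to be scanned.

```python
import base64

# Hypothetical signature list of a naive static scanner.
SIGNATURES = [b"EVIL_MARKER"]


def static_scan(data: bytes) -> bool:
    """Return True if any known signature appears in the raw bytes."""
    return any(signature in data for signature in SIGNATURES)


# A harmless stand-in for the real payload.
payload = b"#!/bin/sh\n# EVIL_MARKER\necho hello world\n"

# What actually ships in the image: the payload as a base64 string
# embedded in an entry point script that decodes it at runtime.
encoded = base64.b64encode(payload)
entrypoint = (
    b"#!/bin/sh\n"
    b"echo '" + encoded + b"' | base64 -d > /tmp/x; chmod +x /tmp/x; /tmp/x\n"
)

if __name__ == "__main__":
    print("raw payload flagged:      ", static_scan(payload))
    print("entry point flagged:      ", static_scan(entrypoint))
    print("decoded at runtime flagged:", static_scan(base64.b64decode(encoded)))
```

The signature only becomes visible after decoding, which in the real case happens at runtime, after the static scan has already passed the image.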
You can go ahead and read the blog post if you want, but my point here is that this is real. This happens. Docker Hub has no fault here; they're just storing files. But the fact that we are trusting the source blindly is an open door. There's actually an entire category of risks here that is called supply chain attacks. This is where the hackers will not even target your servers. They will not try to hack you directly. Instead, they will target your supply chain, which you already trust. Think about where you store your source code today. Think about where you build and test and produce artifacts, your CI/CD pipeline. Think about where you store artifacts today and how you deliver those artifacts into production. Each and every one of those links in the chain is a point of interest to the attacker, because if they compromise one of them, most likely their malware will be able to find its way into your servers, because you already trust this pipeline.
So to summarize, static image scanning can tell us a lot about the container, and it's very important to scan our containers. We learn about the known vulnerabilities. We learn about misconfigurations, and maybe even about malware inside of the container. But there's also an entire category of risks that we were overlooking. We've discussed evasive malware. There are also unknown vulnerabilities. There are also more sophisticated attacks that are harder to scan for. My point is that the image itself, at rest, can only tell us so much about what the container will do at runtime. And the best way to understand what the container will do at runtime is to run it. This is where system tracing enters the picture, because system tracing allows us to understand what's happening from the operating system's point of view. This is something that the malware, the software that's running inside of the container, simply cannot evade or escape.
So I would like to show you now a quick demonstration of what it means to detect malicious behavior using system tracing. All right. We will start by looking at an example of a script similar to the one that I showed you in the slides. You see here the very long base64 encoded string. And in the end, there is a command to decode it, to make it executable, and to execute it. And then, just as a disguise, it shows that something else is happening. If we run this script, which is called evasive script, we see hello world. This is being printed by the malware. And we see I'm good, which is the disguise of the script. Now, if the malware didn't naturally print hello world to us, how would we be able to understand that this script hides another executable? We could read it. But if this wasn't a script, if this was an executable, maybe even an obfuscated executable that's harder to manually analyze, then we would be in some kind of trouble.
So this is where tracing comes in. I'm going to use a very popular tool that's called strace. This is a very common Linux toolbox kind of tool. It allows us to trace system calls from the operating system's point of view. I'm going to run the same script, except I'm prefixing it with the command strace. I also want to tell it exactly what to trace, so I will say trace equals execve. execve is the name of the system call that's being used to ask the operating system to execute something. If we do that, we see the same result here, hello world and I'm good in the output, but we also see all of the execve calls. And we see here that this file was executed. Again, this time it was a script, so we already knew that it was in there. But think about a different case where this was a binary, where it wasn't so easy to understand what's going on in there. Actually, let's take a look at an example like this. So here I have another example. It's the same hello world application that was the malware.
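The manual step from the script demo, reading the execve lines to see what was really executed, can be sketched in Python. The sample below is written to resemble the output of something like `strace -f -e trace=execve ./evasive.sh`; it was not captured from the demo, and the script name, paths and process IDs are invented. Treating anything executed from `/tmp` as suspicious is also just an assumption for the example.

```python
from __future__ import annotations

import re

# Matches the first argument of an execve line in strace-style output.
EXECVE_RE = re.compile(r'execve\("([^"]+)"')

# Illustrative sample only, shaped like strace output.
SAMPLE_TRACE = '''\
execve("./evasive.sh", ["./evasive.sh"], 0x7ffc0a1b2c30 /* 24 vars */) = 0
[pid  4242] execve("/usr/bin/base64", ["base64", "-d"], 0x55d0c8e2f010 /* 24 vars */) = 0
[pid  4243] execve("/usr/bin/chmod", ["chmod", "+x", "/tmp/payload"], 0x55d0c8e2f010 /* 24 vars */) = 0
[pid  4244] execve("/tmp/payload", ["/tmp/payload"], 0x55d0c8e2f010 /* 24 vars */) = 0
'''


def executed_paths(trace: str) -> list[str]:
    """List every path passed to execve, in the order it was called."""
    return EXECVE_RE.findall(trace)


def suspicious_paths(paths: list[str], writable_dirs: tuple[str, ...] = ("/tmp/",)) -> list[str]:
    """Flag executions from world-writable locations (assumed heuristic)."""
    return [path for path in paths if path.startswith(writable_dirs)]


if __name__ == "__main__":
    paths = executed_paths(SAMPLE_TRACE)
    print("executed:", paths)
    print("suspicious:", suspicious_paths(paths))
```

This only works because the hidden payload is started through execve; a payload that never goes through execve would not show up in this list at all.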
In this second example, the malware is again the hello world application, but this time I am hiding it within this hello-world.packed binary. It feels similar to the previous example, where we hid it within a script. But this time, it will not be that easy to discover it. First of all, this is a binary. So if we look at the binary itself, yeah, nothing really readable for us here. It's just binary data. This is one thing. Second of all, let's trace it using the same strace command and ask it to trace execve. Remember that in the previous example, we saw that the script was started, then other processes started, and then explicitly the malware itself started. This time, I'm going to see a different result. I only see the execution of the host, the wrapping binary. There is no execve of the malware here, even though we know, because we see the output, that hello world was invoked. And what was executed is not hello world. So how can we still detect it?
This is where behavioral analysis comes into the picture. So far, we've seen an example of how tracing can help us understand what's going on, so we can look for execve and so on. But in this case, we need something more powerful. We need to learn about the behavioral pattern of this technique, and then we can detect it. So I'm going to add a few events to trace here. Not only execve; I want to look also at mmap and mprotect. These two system calls are used to allocate memory and manage its permissions, basically to manage memory for the process. I do this because I know that the packed version is using this technique in order to hide the embedded binary. So this time, we see the same execve of hello-world.packed. This is the entry point. We still don't see any other execve for the embedded malware, but we do see a suspicious pattern here. We see that the process has allocated a memory region and it has actually made it executable.
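The pattern just described, a region that is allocated and then made executable, can be expressed as a small piece of state tracking. The Python sketch below works on strace-style lines that were written for this example, not captured from the demo. The file name and addresses are invented, and matching only on the exact start address of an anonymous writable mapping is a simplifying assumption.

```python
from __future__ import annotations

import re

MMAP_RE = re.compile(r"mmap\((?P<args>[^)]*)\)\s*=\s*(?P<addr>0x[0-9a-f]+)")
MPROTECT_RE = re.compile(r"mprotect\((?P<addr>0x[0-9a-f]+),\s*\d+,\s*(?P<prot>[A-Z_|]+)\)")

# Illustrative traces only.
PACKED_TRACE = '''\
execve("./hello-world.packed", ["./hello-world.packed"], 0x7ffe11223344 /* 24 vars */) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a1c000000
mprotect(0x7f3a1c000000, 8192, PROT_READ|PROT_EXEC) = 0
'''

PLAIN_TRACE = '''\
execve("./hello-world", ["./hello-world"], 0x7ffe55667788 /* 24 vars */) = 0
'''


def find_exec_transitions(trace: str) -> list[str]:
    """Return addresses of anonymous writable regions later made executable."""
    writable_anonymous: set[str] = set()
    alerts: list[str] = []
    for line in trace.splitlines():
        mapped = MMAP_RE.search(line)
        if mapped:
            parts = [part.strip() for part in mapped.group("args").split(",")]
            if len(parts) >= 4:
                prot, flags = parts[2].split("|"), parts[3].split("|")
                if "PROT_WRITE" in prot and "MAP_ANONYMOUS" in flags:
                    writable_anonymous.add(mapped.group("addr"))
            continue
        changed = MPROTECT_RE.search(line)
        if (changed
                and "PROT_EXEC" in changed.group("prot").split("|")
                and changed.group("addr") in writable_anonymous):
            alerts.append(changed.group("addr"))
    return alerts


if __name__ == "__main__":
    print("packed:", find_exec_transitions(PACKED_TRACE))
    print("plain: ", find_exec_transitions(PLAIN_TRACE))
```

A single execve line gives no signal here; it is the sequence of events over time that makes the behavior stand out.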
Making the region executable, as we can see here, is necessary for the process to execute the binary data that it writes into this memory region. And if we compare this to the trace of the regular hello world, we see that it is very, very different. Here we see just execve, and here we see a whole lot of suspicious activity. Okay, so this was about how we can understand what the software that we are running is doing by observing its behavioral patterns. We have done it manually this time, but soon we will see how we can also use more sophisticated tools to help us in this analysis. All right, so let's move back to the slides.
Okay, so we have seen how we can use system tracing to understand the software that's running in the containers. Maybe some of you think this sounds familiar, maybe it rings a bell. We have been using similar techniques in production to prevent certain things from happening or to alert. This is often called runtime security or runtime protection. It's nothing new, right? There are a lot of mature solutions in the market for runtime security. What it means is that we are going to monitor everything that happens in production. We are going to analyze this stream of events, and we are going to look for suspicious patterns. And if we find something, then we are going to alert, or we can even block it from ever happening. Again, this is not new. The company that I work for is one of the leading vendors in this space. So there's a mature market for these kinds of solutions.
But my argument here is that if we are dealing with malware and trying to assess the containers, won't it be even better to make this assessment earlier in the pipeline? Why are we waiting for the containers to be running in production in order to observe their behavior? In other words, what I'm saying here is let's shift left and use the same techniques or technology that we have been using for runtime security, but use them earlier in the pipeline, when we build the container, when we test it, and so on.
Now, this is not simply taking the same product and running it in a different place, because there are different constraints here. We can actually tailor the product to fit its new place in the pipeline, in the pre-production environment. And this is what we call dynamic scanning. This solution will now also run the containers inside of an ephemeral sandbox, right? Because we're not running the containers in production anymore, we need to run them somewhere else. And this is untrusted software, so an ephemeral sandbox sounds like a good solution. We can also use more comprehensive tracing when we do it at this stage. I will explain this point very soon. And we can also automate this into a solution that we can integrate into existing processes. Think that whenever you do a pull request, you can now spin up the ephemeral sandbox and run the container in there. It is heavily, heavily instrumented. You have all of the same heuristics that you've used for runtime security, and even more this time. And you are able to flag this container as suspicious or safe before it reaches production, actually before it even reaches your container registry.
I want to emphasize the fact that this is not the same as runtime security. It is quite similar, but not the same. There are different constraints here, and we need to leverage these different constraints in order to build a solution that is tailored for this new place. In runtime security, in production, we always aspire to minimize the overhead of everything that we add there. But in dynamic scanning, there is no such requirement. These are ad hoc environments that we spin up just for the sake of testing the containers. We can turn the verbosity up to the maximum. We don't really care. We can even collect more kinds of information. For example, we can run tcpdump while we run the container. Why not?
We can collect a lot more information at higher fidelity, and the more data we have, the better decisions we can make. This is just one example of why the constraints are different and why the resulting product might look different. Another example is that in production, we have to make quick decisions. This makes sense, right? Because the runtime protection solution is in the critical path, we don't want to delay the decision-making process too much. But with dynamic scanning, we don't have these constraints. We can take as long as we need in order to make decisions. We can use more complicated algorithms to test or to make the decisions. We can even defer the decisions. We can say that if something specific happens, we flag this container for someone else, some security researcher, to take a deeper look at later on. This is a privilege that we have with dynamic scanning, and we don't have it with runtime security.
Another example of the differences is that in production, the stakes are much, much higher. Everything is high-impact. With dynamic scanning, this is a fake environment. It's not real customer data. We can actually let the malware run to completion and do what it wanted to do. Maybe we will learn something about it that we couldn't have learned in production, because there we had to stop it short. So my point here is just to explain why dynamic scanning is not runtime security. Yes, it is using system tracing. It is even using similar terminology and technologies. But because the constraints are different, we can build two different products, each of which leverages the constraints to its own benefit.
All right. So now I want to explain what I meant earlier when I said advanced tracing. In the demo that we saw earlier, I used strace, a very popular and very effective tool to trace the system calls that a process makes. But we cannot speak about system tracing today without mentioning eBPF. This is not going to be an introduction to eBPF.
There are other sessions that cover eBPF in depth. But I think that eBPF is such an impactful and critical component of system tracing today that we need to mention it. So eBPF is a subsystem in the Linux kernel that allows you to run your own code within the Linux kernel. This is very unique, because if you wanted to do something similar before eBPF, you had to maybe build a kernel module and load it. That means that the code that you write in this kernel module is essentially at the same level as the kernel. It has the same privileges. It has the same blast radius. And when we talk about security products, it is not so easy to convince a security-minded customer to take some arbitrary kernel module and load it into their production environment, which means that it can do everything that it wants to do. eBPF allows us to load our programs into the kernel in a safe way, in an isolated way. That means that the risk is lower, and it also has a lot of different kinds of integrations. One of them is tracing. So we can do everything that we wanted to do with tracing, even more than strace, as I'm going to show very soon, in a safe way and in a more performant way. And this is really why eBPF is so important today.
By the way, the name eBPF doesn't really make sense today. It carries a lot of legacy with it. It originated as a packet filter utility to filter packets in the kernel, it was related to Berkeley's BSD, and then it was extended to do more than this. This is why it's called extended Berkeley Packet Filter. eBPF, not a very meaningful name today. But still, this is the name, and the technology itself is amazing. So at a high level, what it means is that the application is going to run somewhere on the stack, in user space, in a container.
The application is going to make a system call at some point or another, because every resource that it needs to work with, a network call, writing to a file, everything has to go through the operating system. And this is where we're at, because we have injected our eBPF probes into the kernel. We will be waiting there and intercepting those requests from the applications. And at this point of interception, we can do a lot of interesting things. So this is a very, very high level explanation of what eBPF means for us.
And this is what we are using for our container scanning tool. We've actually taken this work and open sourced it, and we have a project on GitHub that is called Tracee. This is basically the engine behind our commercial container scanning offering. It is completely open source under a permissive license, Apache 2.0. We have external contributors. And it is a very interesting project for you to look at if you're interested in eBPF or system tracing. What's special about this project, besides the fact that it provides similar functionality to strace but is safer and a lot more performant, is that it was built for security. So when we instrument a function in the kernel, we can not only report it like strace does; we can do a lot more interesting things with it, which I'm going to show in a second. And we can also trace other things besides system calls. This is something very unique. We can also instrument arbitrary functions within the kernel that we're interested in. This produces a much higher quality of data for us to analyze in order to detect behavioral patterns.
All right. So what we're going to do now is take a look at Tracee and see how it can help us solve a few issues. Some of them we have already started to discuss. You remember that packed binary that we looked at previously: using strace, we were able to manually detect what it was doing.
First of all, I want to show you how the same detection is done using Tracee. So I'm going to run Tracee. Let's do this. All right. So this is Tracee. There are a lot of options here. I'm not going to cover them all. At this point I'm just trying to reach the same result as we have with strace. So I'm telling it to trace execve, I'm telling it to trace mmap, and I'm telling it to trace mprotect. All right. And I'm also going to prepare to run my hello world packed version. Let's start Tracee. Tracee is now registering the eBPF probes, and we can now run the binary. And we see a very similar output to what we have seen before. We see execve for the entry point, and we see the mmap and mprotect system calls. We see the arguments to these calls, in this case that someone requested an executable memory region. So far, pretty similar to strace.
Now I'm going to do the same, but turn on this very interesting flag that's called security events. Oh, security events. And, sorry, what did I miss here? Security alerts. My bad. I'm going to run the same packed executable. And this time, almost the same output, but you can see here that we have a new kind of event: mem_prot_alert. This is basically Tracee telling us, hey, we've noticed that the protection of this memory region has changed to executable. This is a notable event. This is not raw data. This was not collected from the operating system. This is a security insight that Tracee is adding on top of the raw data. Essentially, you can see here a state machine that is tracking the behavioral pattern, and it results in this final alert that the protection changed to executable, which is a trigger point that you can react to. So this is very cool. Tracee is not only surfacing raw data. It is actually producing security insights. But you know, Tracee can do something even better. It can actually collect the evidence for the security incident.
In this case, the evidence is the embedded binary that was run from memory: Tracee can actually capture this memory region into a file that you can later go on and investigate. This is super cool. So I'm going to run the same command. I will tell Tracee to capture memory files, and I'm going to tell it to clear the output directory before it does that. So run this again, and run the packed version again. Almost the same output, except this time, where we see this alert from Tracee, we also see that it is saving data to this file. This is the addition. And we can take a look at this file here. If we look at /tmp/tracee, which is by default the output directory where Tracee saves stuff, we can see that we have this same file here. And the contents of this file are not a memory dump. This is not the entire 16 gigabytes of memory. This is the precise memory region that triggered this alert. So if we take a look inside of this file, it's under this directory, this is the name. This is the binary data. It's not readable for us. But this is the same binary data that you would find if you were to take a look at the original embedded binary, which is the hello world binary. It's basically the same binary data. And you can take this evidence and analyze it, investigate it, and so on.
All right. So some very cool stuff here with Tracee. And by the way, this is what I meant earlier when I said that eBPF is such a game changer for us, or for anyone who is doing system tracing today, because this is actually running our own programs in the kernel. At the point of interception, we can do all of this very cool stuff. We can track this state machine of a behavioral pattern within the kernel. We can copy bits of information into user space just to save them aside. We can do a lot of other cool stuff. And we actually have a lot of interesting things for anyone who is interested in system tracing. We have seen the capture option with the mem option.
Beyond that, we have capturing of executed files and capturing of written files. We have the security alerts that we discussed. We have events which are not only system calls; we also support tracing arbitrary kernel functions. We have container tracing specifically. So there are a lot of very interesting capabilities here for a system tracing tool that is security minded. Tracee is built in the open, and we welcome you to participate and tell us what you think.
And Tracee, I remind you, is the engine for our dynamic scanning solution. This session was about dynamic scanning of containers. So I hope that now it's all starting to tie together for you: how static scanning relates to dynamic scanning, which relates to runtime security, and how system tracing is enabling this dynamic scanning technology. One of my goals here today was to explain why dynamic scanning is distinct from static scanning and from runtime security. Even though it shares some characteristics with them, similar technology to runtime security and similar requirements to static scanning, it is still a piece of its own. And I hope that you find this idea interesting. If you do, then you can start to explore dynamic scanning of your images as well.
So thank you, everyone. This has been great fun. I think we have a few more minutes for questions. If you don't make it this time, you can reach out to me personally over at Twitter, I'm @itaysk. And I wish you a fruitful rest of the conference.