Okay, let's start. Hello, everyone. Thanks for joining this talk. So, today we're going to talk about how to secure containerized applications, with a focus on embedded Linux. My name is Sergio, I'm from Brazil, and I have a small company there called Embedded Labworks. We do consulting and training in Brazil, and I've worked with embedded systems and open source software for some time now. Before I start this talk, I want to quickly do something here in my terminal, a quick demonstration to show you how secure a container is. So, I have here a file called secret. You can see that it is owned by the root user, so only the root user can read this file, right? Me, as a normal user, cannot. Okay, but it happens that my user is inside a group called docker. So, I can do something like this: docker run -it -v, and I'm going to share my complete root file system inside the container. So, now I'm inside the Alpine container. I'm root there. I can go inside that directory, and I can see the secret, right? So, we can see here why we have this talk. That's the motivation. So, let's talk a little bit. I have a lot of content here that I will try to cover in 40 minutes, and possibly I'm going to skip a few things, but I like to document everything, so you can go to the website, download these slides, and see a lot of things there. I'm going to focus here on the most important things that I want to cover in 40 minutes. So, I'm going to start with a quick introduction to containers and security, and then focus on two aspects of security: how to build images in a secure way, and how to run images in a secure way. Those are the two aspects that we're going to cover here. Quick disclaimers: I'm not a security engineer or a researcher, nothing like that. I'm only a software developer that cares about security, as we all should, right? Also, security requires a holistic approach.
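The opening demo isn't captured in the transcript; a sketch of what the commands likely look like (file paths and the Alpine tag are illustrative, and this assumes the user belongs to the docker group):

```shell
# A root-owned file that a normal user cannot read:
$ ls -l secret
-rw------- 1 root root 42 Jan  1 12:00 secret
$ cat secret
cat: secret: Permission denied

# But a user in the "docker" group can bind-mount the host root
# file system into a container and read the file as root there:
$ docker run -it -v /:/host alpine
/ # cat /host/home/sergio/secret
```

Membership in the docker group is effectively root access on the host, which is exactly the point of the demonstration.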
Security is not something where you just enable secure boot and you're secure, right? It really requires working on several different levels, or layers, of the system, and here we are just covering a small part of it: how to secure applications that are based on containers. Also, we're going to use Docker here, but that's not required. The concepts that we're going to learn here apply to other container engines, so don't worry about it. Probably the commands that you're going to see here using Docker you could run on Podman, for example. But the most important thing here is not the how, because to explain the how, I would need two or three days. The important thing here is to understand the why: why we are doing all of this, and what we are doing. The how — for example, how to sign container images, how to check the signature of container images, things like this — we are not going to have much time to explain. What is a container? A container has two main aspects. It's a way to distribute software, right? You take your application with all of its dependencies and create a self-contained image with everything, that you can run anywhere — any system with a compatible kernel and a container runtime can run this image. The second aspect is that containers run isolated from the system. As you already saw, it really depends on how you run the containers, right? But if you design it the right way, you can really isolate the container. That's something that I want to cover here in this presentation. So, the container engine uses some features from the kernel: namespaces, to isolate resources like users and processes; control groups (cgroups), to limit the usage of resources by the containers; pivot_root, a small feature to change the root of the file system; and seccomp, to filter system calls.
Some container engines also use some kind of MAC system — mandatory access control, like SELinux or AppArmor — to control what a process can do in the system. So, that's the main objective of the container engines: take the container images and prepare an isolated runtime environment for them. I don't want to go into detail about why we should use containers in embedded, because that's a discussion that could take some time. As with everything in life, there are trade-offs, right? You have some advantages and some disadvantages, so it really depends. But we can see some advantages here and focus on them. Productivity: you can really focus on your application. You don't need to care much about the infrastructure to run the application. You know that it will run, because you have a kernel that provides the features to containerize applications, and you have the runtime. So, you have everything you need to run your application. You have isolation when you work with containers. You are encouraged to do more modular development, right? You can use the microservices approach to break down your system into several containers, each one with a specific responsibility. You have more control over hardware resources — of course, you can have the same thing without containers, but the container engines usually give you nice features to control that. Updatability is something we may consider, because if we design containers the right way, we have a self-contained, immutable image that you can update at any time, easily. And security, right? Since it provides all of this isolation from the host operating system, you can get security from it. But as we saw, not by default. A container solution is able to provide security, but you need to design it the right way, because in the end, security is, as I mentioned, like an onion, with several layers. Your container image is only one of the layers.
If we look at the complete container infrastructure, we have several different ways that an attacker could exploit the system, right? We have the developers that build the images. We have a CI/CD environment. We have the container registry — that's where the container images are stored. We have the device or system that's running the image, with a host operating system with a complete attack surface that an attacker could try to exploit. So, every block there could be exploitable, right? And our focus here, again, is on that small block: the images. So, this talk is about securing the container images. We're not going to talk about how to secure a container registry or the build infrastructure; we're going to really focus on the image itself. And there are two important security principles here. One of them is usually called economy of mechanism, or KISS — keep it simple, stupid. The simpler the solution, the better, right? Because it's less error-prone: you decrease the attack surface when the solution is simpler. And the second one is the principle of least privilege. You should run a container with only the privileges it needs to do its job. That's really, really important. So, during the examples, we're going to try to containerize a simple application. It's a simple C application that will read the RTC device. It will call open() and ioctl() to read the date and time and print them to the user. As simple as that. Let's see how we can design a container as secure as possible for this simple application. So, part two: securing the image. There are several different mitigation techniques that we can apply to secure the container image. I'm talking here about build time, right? Building secure images. Create a minimal image, as minimal as possible. Only base your image on images that you trust — trust is important here. Run tools on your application.
That's not really related to containers, but it's very important to run linters and static analysis tools on your application, not only during development, but also in a CI environment. Run container scanning tools — that's also very important. And try to make it easily updatable, because in the end, we're going to have problems; that's the nature of software. So, having a good update system is very important. Creating a minimal container image: the idea here is that you should have in your image only what you need to run. So, we want to run that simple application. Do we need a full Debian image to run that application? No, we don't. The idea here is to really build a small image with just enough components to run that small application. And for that, we can use small base images. For example, Alpine is an image that is often used, because it's very small: it has BusyBox inside, and musl instead of glibc. It's very small — a few megabytes. There are also a few projects that are called distroless. Google has a distroless project where they provide distro-less container images. For example, if you want to containerize a Python application, you can use the distroless images from Google, because they have only what you need to run Python scripts. You don't need a full Ubuntu or Debian image for that. Another technique is multi-stage builds — I want to show a little bit of this. One possible way to have the most minimal image for your system is to just take your application, link it statically with the libraries, and that's it. That's one possible solution, with some trade-offs. You could even use a build system for that: today you can use Yocto/OpenEmbedded or Buildroot to create a container image — a small file system with your application. Here I have some examples. As I mentioned, I'm going to be kind of fast here, because we don't have much time. I'm going to explain a little bit of the concepts.
If you want to run those examples, you can, because you can just take those commands and execute them. It's kind of prepared if you want to do it yourself. So, here we have the Dockerfile. For those who don't know, the Dockerfile describes instructions to build a container image, and here we are creating a container based on Debian. Inside it, we install the compiler, because we want to build the application. Then we run GCC to compile the application, and we define the command that we want to run. So, these are small instructions to build this container. And these are the commands to build and run it, building with docker build. Then we can see the size of the container: 150 megabytes, to run one application that has, I don't know, 20 lines of code. It works. Yeah, it works. But you have so many things there, right? The attack surface of this container is so big, with so many vulnerabilities. It doesn't make sense to do it that way. Another approach would be to use a smaller image and a multi-stage build. That's what I'm doing here. I'm building using Alpine — it's much smaller than Debian — and I'm also doing a multi-stage build. So, here we have one stage just to build the application and another stage to build a container with that application. When we run the commands, we can see that in the end we have a smaller image, with just five megabytes. And it works. The third approach would be to statically link the application. Here I'm doing basically the same thing, a multi-stage build. Here I'm building the application, but I'm building it statically, as you can see. So, I'm creating a statically-linked application, and then I'm creating a container image with it from a scratch image — the scratch image is an empty image. In the end, we have a container image of 135 kilobytes. One important thing here is the fact that we are linking against musl, which has a permissive license.
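The Dockerfiles from the slides aren't in the transcript; a sketch of the third approach (multi-stage build, static linking against musl, final image from scratch) might look like this — file names, tags, and flags are illustrative:

```dockerfile
# Stage 1: build a statically-linked binary against musl (Alpine)
FROM alpine:3.16 AS build
RUN apk add --no-cache gcc musl-dev
COPY rtc_read.c .
RUN gcc -static -Os -o rtc_read rtc_read.c

# Stage 2: "scratch" is an empty image — ship only the binary
FROM scratch
COPY --from=build /rtc_read /rtc_read
CMD ["/rtc_read"]
```

Only the second stage ends up in the final image, so the compiler and the whole Alpine userland never reach the device.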
If you statically link with glibc instead, there is a catch: glibc is LGPL, so you might have licensing obligations doing that. The second mitigation technique: create and run images you trust. That's very important. Of course, answering that question might be difficult, right? Should we trust Debian? Should we trust Alpine? But that's the concept. When an image is built, we can reference that image in two forms: by a tag, or by a digest — a hash of the manifest file of that image. It's much more secure to reference images by digest, because when you reference by digest, you really know that you are using that specific image. A tag is like in Git: a tag is a mutable object. You can remove the tag and create another one with the same name pointing to a different reference. That's the same thing here. So, if you are building images from a tag like this, it might not be that secure, because, let's say, the project decides to re-tag 3.16.0 to a different image. You do two builds expecting the same result, but you might get different results because of the tag. Of course, we don't expect people to be re-tagging stuff, but that can happen. So, it's much more secure if you base your images on digests instead of tags, like I'm doing here. That's the only difference compared to the other approach: instead of using a tag, I'm using a digest. Another approach would be to sign and check the signatures of images. There are some frameworks for that. For example, Docker has one called Docker Content Trust, so you can sign and check the signatures of images with it. I'm not going over this — it could take probably a full presentation — but it's nicely documented on the Docker website. I have here some commands that you could run on your machine to create the signing keys and sign the images.
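The Docker Content Trust commands from the slides aren't in the transcript; the basic flow is a single environment variable (registry and image names below are illustrative):

```shell
# With content trust enabled, `docker push` signs the image and
# `docker pull` verifies the signature before running it
$ export DOCKER_CONTENT_TRUST=1

# First push prompts you to create the root and repository signing keys
$ docker push myregistry/rtc-app:1.0

# Pull fails if the image has no valid signature for this tag
$ docker pull myregistry/rtc-app:1.0
```

This is just one scheme; tools like cosign follow a similar push-sign, pull-verify model.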
The idea is that when you push the image to a registry, you sign the image, and then when you pull and run the image, the signature is checked, to make sure that you're really using an image from a trusted source. Again, I'm not going over all of these commands because we don't have much time, but if you are interested, I would suggest you have a look at the Docker website — it explains well how Docker Content Trust works. It's one of the approaches; there are other approaches to sign container images as well. Using a static analysis tool is very important for any project, containerized or not: run a linter, or even enable warnings from the compiler. So, turn on all of the warnings, and make warnings errors, so when you build the application, you can make sure that at least you are doing some checks on the source code before building the application. It's important to integrate this in your build system and in CI, for example, to make sure that every time you build the application, you also run a linter or a static analysis tool. So, in this example, I just integrated Cppcheck in the Dockerfile. Cppcheck is a very small static analysis tool for C and C++ projects, and I'm running it before building the application. Another approach to mitigate the risk of building insecure images is to scan the images. There are tools that were built to scan the source files that generate the image, like Dockerfiles, to see if there are any problems in the Dockerfile itself. And there are tools that scan images: they take an image and scan it to find vulnerabilities, or even trace the image at runtime to try to find problems with it. Here I'm listing a few tools that are able to do this: Trivy from Aqua Security, Grype from Anchore, and Clair from CoreOS — three popular tools that are able to do that. Here I ran a few tools — for example, I ran Trivy on that image that I generated with Debian.
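Scanning an image is a one-liner with these tools; the image name below is illustrative:

```shell
# Trivy inspects the image's package database and reports known CVEs
$ trivy image rtc-debian:latest

# Grype works the same way
$ grype rtc-debian:latest
```

Both can run in CI and fail the build when vulnerabilities above a chosen severity are found.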
And Trivy found 370 security issues, four of them critical. It generates a nice output listing all of the software. So, what Trivy does is open the image, go inside it, and check which package manager the image is using. Then it collects information about all of the installed packages and their versions, checks the CVEs for those packages, and shows you: you're running this software, in this version, which has these vulnerabilities. It will just show you that you have vulnerable software. Now it's on you, right? Should you remove this software? Should you apply a patch to it? I don't know. That's why it's important to have a minimal image: you're going to have fewer packages there, with fewer vulnerabilities. The last thing about the image itself is that we should make it easily updatable, right? One approach for that is to try to develop an immutable container image. An immutable container is an image that doesn't depend on anything else and doesn't keep any kind of internal state or data inside of it. Because you want to update a root file system, right? If the root file system is writable and you update it, you could lose data that you wrote to that file system. That's why update systems that are based on file system images always keep the file systems read-only. And it's kind of the same thing here. So, if we are able to develop an immutable image, that's a good approach: it's going to be easier to update the container images. And there are some tools that we can use. I work on a distribution called TorizonCore that is based on containers, and we have a system to update containers there. There are other solutions, like the Linux microPlatform from Foundries.io, or Balena — container-based distributions that have this kind of update system integrated. Very well. Now I want to talk about securing the container execution.
So, securing the container image at runtime. And again, there are a lot of security mitigations here that we can apply when running container images. The first thing is restricting container privileges. There is a flag, --privileged — some say it's the most insecure flag in the world of computing. You probably don't want to use this flag in production systems. It was created so you could run Docker inside Docker, but sometimes people misuse this flag. And sometimes, because we don't have a good idea of what we need or what we want, we just use --privileged, because we know that with it, we're going to have access to everything and everything will work. But it's a very insecure way to run your container. Sometimes people also think that the --privileged flag means that you want to run the container as root, and that's not the case. You saw me running the Alpine container and accessing that root-owned file without this flag. It's another thing: by default, containers run as root, because by default, the container engine is running as root. You could also be using rootless containers — that's another thing, but it's not common yet. As far as I know, rootless containers are kind of experimental in Docker, and Kubernetes also doesn't use them. So, by default, your container will run as root, and you have the control to change that. That's the important thing. So, we have been running the container with this command, and it is very insecure. Very. Because it's mapping the full /dev, and it's enabling the --privileged flag. So, you are basically a root user inside the container that can do basically everything. Here I'm just showing that with this command I can start a bash, I'm root inside that container, and I have complete access to the /dev directory, because it's bind-mounted inside the container. I can just mount the partition of my real root file system and mess with it — do whatever I want.
I can even chroot into this root file system and have complete access to my host machine. So, --privileged is really bad: it enables all capabilities, it enables access to all devices, and it almost disables AppArmor or SELinux, if you are using them. It's really, really bad. How can we run this container without this flag? Remember, what we want to do here is make our application access that RTC device. We don't need --privileged for that; we need our application to have access to that file. So, we need to map only that file into the container — that's one thing. And the second thing: we need to enable access to that file for our application. By default, when you run a container with Docker, Docker will use a feature from cgroups to disable access to device files, so you don't have access by default. And there is one flag that you can use, the --device flag. The --device flag will not only map that device into our container, but also create a cgroups rule so you can have access to that device file. That's what I'm doing here: I removed the bind mount of the entire /dev directory, I removed the --privileged flag, and I'm just using the --device flag to map the RTC inside the container. And it works. So, we are decreasing the attack surface of the container while the application can still access the RTC. Now, let's talk about users, because we are still running with the root user in the container. And it is recommended to try to not run as root. If you do run as root, you should try to remove all of the privileges from that container — we're going to talk about this. But if you want to run with another user, you can. There is an option that you can use in the Dockerfile, or there is a flag that you can pass to the docker command, to start your container with another user. So you can do that, and your container will run as another user — but in the same namespace as the host OS.
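The run commands described above aren't shown in the transcript; they likely look something like this (the image name is illustrative):

```shell
# Map only the RTC device — no --privileged, no /dev bind mount;
# the --device flag also adds the matching cgroup device rule
$ docker run -it --device /dev/rtc rtc-app

# The same run as an unprivileged user (UID/GID 2000) instead of root
$ docker run -it --device /dev/rtc --user 2000:2000 rtc-app
```

The second form can also be set permanently with a `USER 2000:2000` line in the Dockerfile.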
So, by default, Docker doesn't create a new namespace for a container in terms of users. So, for example, if you run the container like this — here I'm running without any flags — I'm root inside the container. Now, with this command, I'm running with user 2000, group 2000. And then inside the container, I am user 2000. And of course, I cannot ping, because ping requires sending raw packets, which a normal user cannot do. My app doesn't run, because I cannot open the device file. So our app, the way it is, requires root. But let's think about another approach: user namespaces. What we can do is create a user namespace for the container, so the users inside the container are not the same as the users outside the container, in the host OS. This is possible via a feature called user namespaces, and we have a way to run Docker with a different user namespace. So, here is what I'm doing — there is also a nice article in the Docker documentation that explains how you can run Docker in another user namespace — but the basic idea is the following. Here I'm creating a namespace with 65536 users, starting with user ID 100000. So user zero — I mean, root — inside the container will be, outside the container, this user ID. User ID number one inside the container will be this number plus one outside the container. That's the main idea. And then, if an attacker is able to escape the container, outside the container they will not be root. To enable it, you can pass a parameter when you start the Docker daemon, or you can configure Docker via a configuration file. And here I'm running inside the namespace. We can see I'm running and I am root inside the container, right? I can ping, because root can send raw packets, but I cannot access the file. Why? Because the file still has permissions only for the root user in the host OS.
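The user-namespace setup just described, in configuration form — the ranges match the common defaults mentioned in the talk (start at 100000, 65536 IDs), and the exact file contents are a sketch:

```shell
# /etc/docker/daemon.json
#   { "userns-remap": "default" }
#
# /etc/subuid and /etc/subgid define the host-side ID range:
#   dockremap:100000:65536

# Restart the daemon and check: UID 0 inside maps to 100000 outside
$ sudo systemctl restart docker
$ docker run -it alpine sh
/ # id -u            # reports 0 inside the container
```

On the host, `ps` would show the container's shell running as UID 100000, which is why the remapped root can no longer read host files owned by the real root.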
So that proves that, even being root inside the container, I'm not root in the host OS, because in the host OS, I am that user with that 100000 user ID. How can we fix that? There are two approaches. We could change this file — I don't know, put it inside a group that this 100000 user ID belongs to — and then we could access it from the container. That could be one approach. Another approach: let's run as root, but let's drop the capabilities of the root user. Capabilities are a feature that's quite old now. In the past, we had a binary decision — root can do everything, others can do nothing — and then capabilities were created to break down what the privileged user can do into several different capabilities, like managing users, managing the network, et cetera. So, even running as root, what we can do is drop all capabilities, with this option, --cap-drop all. It's really recommended to do this on containers that you are running as root: drop all capabilities, and then just enable the ones that you need for your application. There is a nice tool called Tracee that is able to trace this — it uses eBPF to trace capabilities and lets you know what capabilities your application needs to run. So that's what I'm doing here. If I run the container normally, I have all of the capabilities, but if I drop all capabilities, then I have none. And the application still runs, because it doesn't require any capabilities: it needs to be root to access that device file, but it doesn't require any capabilities for that. So we can drop capabilities, decrease the attack surface, and improve the security of the container. I have just a few minutes, so I'm going to be a little bit fast here and talk about a few other concepts. Restricting syscalls is another important topic. Our application is just doing open() and ioctl(). We don't need to connect to a socket; we don't need to create threads.
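A profile whitelisting roughly that set of system calls, in the JSON format Docker accepts via --security-opt seccomp=, might look like this — the exact list is illustrative, and even a minimal program needs a few extra calls for startup and exit:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["open", "openat", "read", "write", "close",
                "ioctl", "fstat", "exit", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With `defaultAction` set to `SCMP_ACT_ERRNO`, every syscall not listed fails with an error instead of reaching the kernel.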
So, what we can do is restrict the system calls the application can do. There is an infrastructure inside the kernel called seccomp for that, and you can just use it. Actually, Docker already runs with a seccomp profile that disables 44 system calls. You can just take this default seccomp profile and edit it as you need. Usually, people use strace to trace the system calls that the application needs, and with that, you can create this kind of profile. It's a very simple file where you describe all of the system calls that you need, and when you load this profile, the kernel will reject any other system call that your application does. So, here I have done a quick test with the commands. In one command, I'm running with a seccomp profile that has ioctl in it, so the application runs. In the other command, I removed ioctl from the profile, and then you cannot do ioctl anymore. Managing resources is another thing that you can do with a container. The idea here is that you can use some options from Docker to limit the resources a container can use. What does that have to do with security? This prevents a kind of attack that we call denial of service, right? If you limit the resources a container is using, the container will never use more resources than the system can provide. That's the main idea. And here I'm limiting the container to 512 megabytes of RAM. Another thing that we can do — and Docker does this by default using AppArmor — is limiting access to resources in the system. The idea here is that we can use a security module, like SELinux or AppArmor, and then restrict what the application can do. Just an example: here I have a simple AppArmor profile that will basically only make it possible to run the application, and the application will only be able to read the RTC. So, here I'm denying access to all device files, proc files, sysfs files — I could do everything I want here.
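The AppArmor profile from the slide isn't in the transcript; a sketch in that spirit might look like this (profile and binary names are illustrative — note that AppArmor profiles are default-deny, so anything not listed is already refused):

```
profile rtc-app flags=(attach_disconnected) {
  /rtc_read rix,        # the application binary may be read and executed
  /dev/rtc r,           # the RTC device, read-only

  deny /proc/** rwklx,  # explicitly deny proc and sysfs as well
  deny /sys/** rwklx,
}
```

The profile is loaded on the host with apparmor_parser and selected at run time with `--security-opt apparmor=rtc-app`.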
And the application will only be able to do what it needs to do. That's another approach. Securing the network is important when you run a container: it will run in the same network as other containers, because Docker has a default bridge. So, if you care about security, you probably don't want one container to be able to mess with or connect via the network to other containers. If you want to isolate them, you're going to need to create different network bridges for those containers. That's a good approach. Storage is also very important. We try to design immutable images, so mounting the file system read-only is very important — that's one technique that we also use. If you need to write something to the file system, use a tmpfs for that. I want to conclude my talk, because time is almost out. In the end, it really depends on what you need. Security is all about risk management. How far do you want to go in terms of security? Sometimes you're going to have conflicts with usability, with, I don't know, debuggability, anyway. It really depends on how far you want to go, what you want to protect, and what the cost of protecting it is. So, we went from this command, very insecure, right? To this command below, that does a lot of things. In the end, both work, but the second one is much more secure, right? We are isolating the device file: we only have access to the one device file we need. The file system is read-only; for read-write access, we have a tmpfs. We are dropping all capabilities. We are preventing escalation of privileges in the system — it's not really the case for us, because we are running as root, but that would be very important if you're not running as root. We are running with our own seccomp profile, our own AppArmor profile, on our own network, limiting the resources of the software. Is this the most secure container in the world? Probably not — we could do more here — but it's much, much more secure than the first one.
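The final command combining all of those mitigations isn't reproduced in the transcript; it probably resembles something like this (the image, network, and profile names are all illustrative):

```shell
$ docker run -it \
    --device /dev/rtc \
    --read-only \
    --tmpfs /tmp \
    --cap-drop all \
    --security-opt no-new-privileges \
    --security-opt seccomp=rtc-app-seccomp.json \
    --security-opt apparmor=rtc-app \
    --network rtc-app-net \
    --memory 512m \
    rtc-app
```

Each flag maps to one of the mitigations discussed: device isolation, read-only root with a tmpfs for writes, dropped capabilities, blocked privilege escalation, syscall and MAC confinement, a dedicated bridge, and a memory limit.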
And that's it. Questions?