Hi, there. My name is Akihiro Suda. I'm a software engineer working for NTT, and I'm a maintainer of several open source projects, including containerd. This session will give an overview and recent updates of rootless containers. If you have questions, feel free to ask me on the CNCF Slack. This is a pre-recorded session, so I can answer questions at any time during my talk.

So what are rootless containers? Rootless containers means running runc, containerd, flannel, kubelet, and everything without the root privileges. This is useful for protecting the host from potential vulnerabilities and misconfigurations. There are several techniques that may sound similar, but don't be confused. This is not about setting securityContext.runAsUser. This is not about the Kubernetes Enhancement Proposal for supporting user namespaces. Unlike these techniques, rootless containers means running the whole stack, including kubelet, containerd, and runc, without the root. It's not just about running containers as a non-root user.

So why do we need rootless? It's because most runtimes tend to have serious vulnerabilities. There has been really a bunch of vulnerabilities in the past years, in every component, such as runc, containerd, dockerd, and kubelet, and probably more vulnerabilities are to come in the next couple of years. Most of these vulnerabilities could be mitigated if we had rootless containers.

And users often make misconfigurations. They may try setting up Pod Security Policy, Gatekeeper, or other kinds of admission controllers, but setting it up properly isn't straightforward. And some people still expose the TCP ports of system components, such as kubelet and dockerd, to the internet without mutual TLS authentication. Or even if they could manage to set up TLS, sometimes they make mistakes about the private keys, such as exposing the keys as IaaS metadata that is accessible by any container in the cluster.
So rootless containers are useful for mitigating the impact of such vulnerabilities and misconfigurations. Even if the host gets compromised, the attacker won't be able to access other users' files, and won't be able to modify the firmware and the kernel. Also, the attacker won't be able to do ARP spoofing and DNS spoofing. But of course, it's not a panacea. It's not effective against kernel vulnerabilities, DDoS attacks, and cryptomining attacks. And also, there are some caveats about network performance, but we are seeing huge improvements this year. Also, we can't use NFS and block storages. But this is not a huge deal when you can use managed databases or managed object storages, such as Amazon S3.

Let's take a look at the history of rootless containers. It started about eight years ago. It was before I began to work on rootless containers. It was soon adopted by LXC, but wasn't popular until 2018. Also, rootless containers at that time were very different from modern rootless containers. Notably, setting up networking required root privileges at that time. In 2018, BuildKit started to support rootless containers, mostly for building images inside Kubernetes clusters using container-based technology. BuildKit was a game changer. After BuildKit supported rootless mode, Docker, Podman, and CRI-O, all these runtimes also began to support rootless mode. We also implemented rootless mode for Kubernetes, but it's still not upstreamed yet, mostly because we didn't have support for cgroups at that time. But our work is already adopted by k3s.

And in 2019, we gained support for cgroups using cgroup version 2 and systemd. Cgroup version 2 itself has been there for several years, but it wasn't useful for containers until 2019 because it lacked support for the device controller and the freezer. So starting with this year, rootless containers began to support setting up cgroups for limiting memory resources and for limiting CPU resources.
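
As a rough illustration of the cgroup version 2 requirement, here is a minimal Python sketch for checking whether a host uses cgroup version 2 and which controllers a given cgroup directory offers. It relies on the fact that a cgroup version 2 hierarchy exposes a `cgroup.controllers` file. The function names are hypothetical, and the default path `/sys/fs/cgroup` is an assumption about where the hierarchy is mounted; this is not code taken from any of the runtimes.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like a cgroup v2 (unified) hierarchy.

    Assumption: the hierarchy is mounted at `root`. A cgroup v2 directory
    contains a `cgroup.controllers` file; a cgroup v1 mount point does not.
    """
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def available_controllers(cgroup_dir="/sys/fs/cgroup"):
    """Return the controllers listed in `cgroup.controllers` of a directory.

    Returns an empty list when the file is missing or unreadable, which is
    what happens on a cgroup v1 host.
    """
    path = os.path.join(cgroup_dir, "cgroup.controllers")
    try:
        with open(path) as f:
            return f.read().split()
    except OSError:
        return []


if __name__ == "__main__":
    print("cgroup v2:", is_cgroup_v2())
    print("controllers:", available_controllers())
```

For a rootless setup, the interesting directory is the one delegated to the non-root user rather than the root of the hierarchy, so in practice you would pass that directory to `available_controllers` and check that `memory`, `cpu`, and `pids` are listed.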
And this year, we are seeing faster networking with a new kernel feature called seccomp addfd.

Let's take a look at examples of rootless containers. For example, Docker has been officially supporting rootless mode since version 19.03. It was experimental in 19.03, but it's going to be GA in 20.10. This new version also comes with notable updates for cgroups and FUSE-OverlayFS. Rootless Docker can be easily installed by running a script from https://get.docker.com/rootless. And you can run the docker command with the DOCKER_HOST environment variable like this: unix:///run/user/UID/docker.sock. And if you run the pstree command, you can see that all processes, including containerd and dockerd, as well as containers, are running without the root.

So next is Usernetes. Usernetes is our Kubernetes distribution that doesn't require the root. Usernetes supports multi-node networking using flannel and VXLAN. It provides a demo of a multi-node cluster as a Docker Compose stack, so you can easily try it. For CRI runtimes, we support both containerd and CRI-O, and these runtimes can be mixed up together in a single cluster. And you can see that all processes, including containerd, flannel, and Kubernetes, are running as a non-root user in this pstree screen.

So next is k3s, which is a CNCF sandbox project focusing on edge computing. k3s also supports rootless mode by incorporating Usernetes patches ahead of the Kubernetes upstream. k3s uses containerd as the CRI runtime.

So next is BuildKit, a container image builder built with containerd technology, and also adopted by docker build. BuildKit can be executed in several ways, such as a part of dockerd, or as a standalone daemon, or as a Kubernetes Pod, a Kubernetes Job, or as a Tekton task. To run BuildKit inside Kubernetes, you don't need to set securityContext.privileged.
But you might need to specify securityContext.seccompProfile and AppArmor annotations to allow calling several system calls, such as unshare and mount.

So the last topic is how rootless containers work. It uses several kernel features, but amongst these features, the most important one is user namespaces. User namespaces is a kernel feature that maps a non-root user to a fake root user with UID 0. It's not a real root, but it's enough to run containers. It also sets up UIDs called subordinate UIDs to use multiple UIDs other than 0.

By using user namespaces, a user can also create mount namespaces. But the user cannot mount most of the file systems. Especially, the user cannot mount overlayfs, cannot mount NFS, and cannot mount block storages at all. But starting with kernel 4.18, FUSE file systems such as FUSE-OverlayFS can be mounted. So you don't need to care about overlayfs.

So next is network namespaces. A user can also create network namespaces with user namespaces, but cannot create virtual Ethernet pairs for internet connectivity. So instead of virtual Ethernet pairs, we need to use slirp, which translates Ethernet packets into socket system calls. This is slow, but we are seeing huge improvements this year. I will talk about this topic later.

With regard to cgroups, we didn't have support for cgroup version 1. That means on cgroup version 1 hosts, we couldn't set up a memory limit, CPU limit, and PID limit. But we can use cgroup version 2. Fedora has already switched the default to version 2 recently. Probably other distributions will follow soon.

So the next topic is seccomp user notification, which was merged in kernel 5.0. It's a new way to hook system calls in the user space. This is similar to ptrace, but it's significantly faster. This can be used for emulating subordinate UIDs without the /etc/subuid file.
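
To make the subordinate UID mapping concrete, here is a minimal Python sketch of how a UID inside the user namespace corresponds to a UID on the host. It is an illustration under stated assumptions, not the code any runtime actually uses: it assumes the usual `name:start:count` layout of `/etc/subuid`, the helper names are hypothetical, and the user `alice`, host UID 1001, and range 100000:65536 are made-up example values.

```python
from typing import List, NamedTuple, Optional, Tuple


class IdMap(NamedTuple):
    inside: int   # first UID as seen inside the user namespace
    outside: int  # first UID as seen on the host
    count: int    # number of consecutive UIDs covered


def parse_subuid(text: str, user: str) -> List[Tuple[int, int]]:
    """Return the (start, count) ranges for `user` from /etc/subuid-style text."""
    ranges = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, start, count = fields
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def build_rootless_map(host_uid: int, sub_ranges: List[Tuple[int, int]]) -> List[IdMap]:
    """Map UID 0 in the namespace to the user's own UID (the "fake root"),
    then lay the subordinate ranges out consecutively starting at UID 1."""
    maps = [IdMap(0, host_uid, 1)]
    next_inside = 1
    for start, count in sub_ranges:
        maps.append(IdMap(next_inside, start, count))
        next_inside += count
    return maps


def to_host_uid(container_uid: int, maps: List[IdMap]) -> Optional[int]:
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for m in maps:
        if m.inside <= container_uid < m.inside + m.count:
            return m.outside + (container_uid - m.inside)
    return None


if __name__ == "__main__":
    example = "alice:100000:65536\n"  # hypothetical /etc/subuid content
    maps = build_rootless_map(1001, parse_subuid(example, "alice"))
    print(to_host_uid(0, maps))  # root inside the namespace is alice on the host
    print(to_host_uid(1, maps))  # first subordinate UID
```

With these example values, UID 0 inside the namespace is just the unprivileged user on the host, which is why a process that escapes the container still cannot touch other users' files.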
And kernel 5.9 added support for SECCOMP_IOCTL_NOTIF_ADDFD, which allows injecting file descriptors from the host into containers. This can be used for eliminating the throughput overhead of slirp.

Let's recap my talk. Rootless containers can protect the host from potential vulnerabilities and misconfigurations. It's already adopted by lots of projects, such as BuildKit, Docker, containerd, Podman, CRI-O, and k3s. It's also being proposed to the Kubernetes upstream. There are some drawbacks, but these drawbacks are being significantly improved using seccomp user notification. Here are some resources on the internet, such as https://rootlesscontaine.rs. If you have questions, feel free to ask me on the CNCF Slack. That's all of my talk. Thanks.