Yeah, you need to press it, probably hard. If you press hard... let's try it. Okay, and just flip it. All right, next up we're going to have Akihiro and Giuseppe talking about rootless Kubernetes.

Hi, thank you for coming to our talk. In this talk we will show how we can run Kubernetes and its dependencies, such as CRI and OCI runtimes, as an unprivileged user on the host.

I'm Akihiro Suda. I'm a software engineer at NTT Corporation. I'm a maintainer of Moby, which was formerly known as Docker Engine. I'm also a maintainer of BuildKit, containerd, and some small containerd-related projects.

And I am Giuseppe Scrivano. I work for Red Hat. I work on different projects related to containers, but my main focus is on the projects listed there.

Let's start with a demo. On this node my user name is "user" and my UID is 1000. It's not the root user; I don't even have sudo. On this host I'm running everything as an unprivileged user. You can see that dockerd, the Docker daemon, is running as an unprivileged user. You can also see that the kubelet is running as an unprivileged user, and even flanneld is running as an unprivileged user on the host, to provide multi-node networking using VXLAN.

My cluster is composed of containerd nodes, Docker nodes, and CRI-O nodes, and all of these nodes now support rootless execution. I'm running some nginx pods on this cluster, and we have multi-node networking as well. So I run a shell in the first nginx pod, and I can do a wget against an nginx container on another node, like 10.5.80.3. Like this. It works.

So, an introduction to rootless Kubernetes. When we refer to rootless Kubernetes, we mean that we run everything as an unprivileged user. It's not just about running containers as an unprivileged user, so please do not confuse it with SecurityContext.runAsUser, which just runs the container process as an unprivileged user, or with node-level user namespace support, which is going to be added in Kubernetes 1.14.
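As an aside, the kind of check shown in the demo, verifying that the daemons really run unprivileged, can be done with plain id and ps. A sketch (process names are from the demo; PIDs and output below are illustrative, not captured from the real session):

```console
$ id -un && id -u
user
1000
$ ps -o user:10,pid,comm -C dockerd,kubelet,flanneld
USER              PID COMMAND
user              812 dockerd
user              954 kubelet
user             1071 flanneld
```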
So by "everything" we literally mean everything: we run OCI runtimes such as runc, and CRI runtimes such as containerd or CRI-O, as an unprivileged user. Of course, we run the kubelet, kube-proxy, kube-apiserver, and kube-scheduler as an unprivileged user as well.

The motivation for rootless Kubernetes is to mitigate potential vulnerabilities in OCI runtimes, CRI runtimes, and Kubernetes itself. Actually, we have had a whole bunch of vulnerabilities in Kubernetes. Rootless Kubernetes is also useful for users of shared machines, such as HPC clusters, to run Kubernetes without the risk of breaking other users' environments. Rootless Kubernetes is also useful for running Kubernetes on top of existing Kubernetes clusters.

So, Kubernetes has had a lot of vulnerabilities. For example, two years ago a couple of vulnerabilities were found that allowed a malicious container to access the host file system via bugs related to volumes. Recently there was a git CVE that affected Kubernetes via gitRepo volumes: a malicious repo could execute an arbitrary binary as root on the host when the repo was cloned. Also, last year there was a serious vulnerability in Kubernetes that allowed a malicious API client to gain cluster-admin privileges and hijack root privileges on the nodes.

And just a couple of weeks ago, we found and analyzed a serious Minikube breakout issue. On Minikube, a malicious container could gain write access to procfs and sysfs, because on Minikube the host root file system lives in the initrd. So you can gain write access with a few CLI commands and mount commands, and then you can execute any command as root on the host via the procfs kernel.core_pattern or the sysfs uevent_helper.

So, how does rootless Kubernetes work? Namespaces are the kernel feature that allows us to have containers.
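The user namespace just mentioned can be seen first-hand by any unprivileged user (assuming util-linux's unshare is installed; output is illustrative):

```console
$ id -u
1000
$ unshare --user --map-root-user sh   # create a user namespace, map ourselves to root
# id -u
0
# cat /proc/self/uid_map              # only one ID is mapped: 0 inside = 1000 outside
         0       1000          1
```

Note that only the user's own ID is mapped here, which is the single-ID limitation that makes images requiring multiple users problematic.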
Namespaces give a process a different view of the system. The most important one for enabling the rootless case is the user namespace. It basically gives a process the illusion of running with a different UID and GID than the ones it is in reality running with for the kernel.

An unprivileged user can create a user namespace, but it can only map itself inside of the new namespace. This is not enough for us for running images that require multiple users. The user in the namespace can have UID zero and full capabilities, but these are restricted by the kernel, so in general the same limitations that it had when running on the host will still apply inside the namespace.

For allowing multiple UIDs, we use two tools distributed by shadow-utils: newuidmap and newgidmap. These allow our unprivileged user to map multiple IDs inside of our user namespace. The configuration for the multiple IDs is done through the /etc/subuid file: for user ID 1000 we allocate around 65,000 additional users. So when we create the user namespace, we can see that there are two ranges of users: the first one is the user itself, which is mapped to root inside of the user namespace, and then each additional ID specified in the configuration file is also added to the user namespace.

With recent kernels, an unprivileged user can also create network namespaces along with user namespaces. So an unprivileged user can create iptables rules, configure VXLAN, and also isolate abstract UNIX sockets. But this wasn't very useful on its own, because an unprivileged user still cannot set up virtual Ethernet (veth) pairs across the host and the namespace.
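The /etc/subuid bookkeeping described above is just user:start:count triples. A minimal sketch of the arithmetic, using a hypothetical entry rather than reading the real file:

```shell
# Hypothetical /etc/subuid entry (real entries are written by root, one per user):
entry="user:100000:65536"        # user "user" owns subordinate UIDs 100000..165535

start=$(echo "$entry" | cut -d: -f2)
count=$(echo "$entry" | cut -d: -f3)

# newuidmap uses such a range for the additional uid_map line:
# inside-UIDs 1..count map to host-UIDs start..start+count-1
echo "subordinate UID range: ${start}..$((start + count - 1))"
```

With an entry like this, the uid_map inside the namespace ends up with the two ranges described above: 0 1000 1 for the user itself, plus 1 100000 65536 for the additional IDs.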
Because of that, users couldn't connect to the internet. But this is not a problem, because instead of a veth pair we can use a usermode network stack, called slirp. There are a couple of slirp implementations, such as VDE plug, VPNKit, and slirp4netns. Among these three implementations, slirp4netns is the fastest: especially when the MTU is about 64 kilobytes, slirp4netns can reach more than 9 Gbps of throughput. This is still slow compared to the usual veth pair, which can reach more than 50 Gbps, but we consider 9 Gbps enough for a lot of use cases. slirp4netns is the fastest because it avoids copying the packets across the namespaces. We also plan to add more optimizations.

We can also set up multi-node networking. Currently VXLAN is known to work. VXLAN encapsulates Ethernet packets in UDP packets, so it can provide connectivity across rootless containers on different nodes. Other protocols should work as well, except ones that require access to raw Ethernet, so GRE is not likely to work.

The container being run needs to have its storage configured, and it needs to live somewhere. Many of the file systems that are used for containers running as root are not usable for rootless. Ubuntu allows overlayfs in a user namespace, but this is still not supported upstream, as there are some security concerns about opening it up to unprivileged users. Btrfs can also be used by an unprivileged user, but it requires being configured beforehand, and device mapper is completely unusable for rootless.

The simplest solution is just to extract the image as it is: for each container you duplicate the entire image. This is known as the VFS backend, and it works, but we lose all the advantages of the deduplication we have with overlay. This can be improved using reflinks on file systems that support them, so that at least the same files share the same payload. But we still need to create the inodes, and this has a cost if you have many containers.

A nice feature that was added in Linux 4.18 is that it's possible to use FUSE file systems in a user namespace. So we implemented, basically, overlay in user space: it's fuse-overlayfs, a FUSE implementation of overlay. It brings in all the advantages of using overlay: we have the same deduplication model based on layers, and it's very fast to set up a container because we basically only need to create an empty directory. But it also brings extra complexity. It's a new implementation, it had a few issues in the last months, and in general I consider it a temporary solution until overlay from the kernel is usable. So this was just to not stop us from using overlay.

Cgroups are still the biggest problem we are having. Cgroup v1 is not considered safe for being used by unprivileged users; its delegation model is not considered safe. Cgroup v2 solves these issues and can be used by unprivileged users as well, but its adoption is still blocked in the current OCI tools: it's lacking some features that are still used by the runtimes.

The next topic is the implementation status in Kubernetes. Actually, kube-apiserver, kube-controller-manager, and kube-scheduler don't need any patch, but the kubelet and kube-proxy need to be patched, because currently cgroups need to be disabled, and some of the sysctl calls need to be disabled as well. We are planning to propose our patch set to Kubernetes upstream soon.

With regard to CRI runtimes, both CRI-O and containerd already support a rootless mode. It's not supported by Docker at the moment, but Docker version 19.03
is very likely to support rootless mode. With regard to CNI plugins, flannel VXLAN is known to work without any modification. We also plan to work on kubeadm integration as well.

And we provide Usernetes, which is an experimental binary distribution of rootless Kubernetes that can be installed under the user's home directory, without sudo. You can just download a binary archive from github.com/rootless-containers/usernetes, unpack the archive, run run.sh, and then you can use kubectl, without sudo.

We also provide a docker-compose.yaml for demonstrating a multi-node cluster of rootless Kubernetes. The cluster is composed of a Docker node, a CRI-O node, and a containerd node, and flannel VXLAN is enabled by default. But this cluster is just at proof-of-concept status; in particular, TLS is not enabled in this cluster. So we welcome contributions. We also plan to provide Kubernetes YAML for deploying Usernetes on top of an existing Kubernetes cluster.

Any questions? Thank you.

Q: What if you want to run one Kubernetes pod with full privileges? Is it possible, or is it prevented by Usernetes? Like, to run Kubernetes as a user and Docker as a user and so on, except for one pod with privileges?

A: No, because it's running as an unprivileged user. Running as root inside still means that, for the kernel, you're running as your unprivileged user. So when we run Usernetes, there is no access to any root capabilities; everything is restricted by the unprivileged user namespace.

Q: Thank you. We provide Kubernetes pods to developers, but we forbid sudo access. How easy is this to replicate, and can this be used to give developers something like sudo access, for example for sudo apt-get install?

A: You don't need sudo when the host is properly configured. Basically, the host just needs to have an /etc/subuid file.
This file needs to be configured by the real root on the host. If this file is configured, you don't need any sudo: you just unpack the archive as an unprivileged user, and you can just run Kubernetes as an unprivileged user.

I want to add one thing. We tried really hard to not require any root privilege for running Usernetes. The only exception, as I showed before, is setting up the multiple IDs, and this is something an unprivileged user just can't do. So that's the only case where we require a setuid binary. Any other questions?

4.18.

Q: So, I saw in the demo that you were doing wget to an IP, I guess a node IP, and I guess that there was a service configured there listening on port 80. I wonder how you did that without having root for using iptables.

A: They are using the unprivileged network we configured as an unprivileged user. It's on top of the network namespaces. The demo that was shown was running in a virtual network, so all of it was running as unprivileged.

I guess that's all. Thanks for coming.