All right, so the next topic is Linux container internals, by Great Ume Uwe. Uwe Uwe, yeah. Yes. He's a DevOps and Linux engineer with a passion for open source and communities. So you can talk to Great about Linux, everything DevOps, everything cloud. And when he's not into all of this, he's out there boxing and reading random stuff. So over to you, Great.

Okay. Thank you very much for having me. My name is Great, and I'll be speaking about Linux container internals, that is, the features that make containers possible. Let me just share my screen. Trying to do that now. So can you see my screen? My screen is up now, right? And I don't know if you can hear me. Yes, it is. Okay.

So I'll be talking about the Linux container internals: what really makes containers possible and how they really work. So, meet me. My name is Great, @0xgreat on Twitter, and I do SRE and DevOps at TCI. TCI is a company based in the UK, and we basically help organizations move to cloud native and also build products for them. I'm a Linux fanboy, pretty much. As far as I remember, I've been using Linux since about the age of nine; we had Linux computers at home. I talk about Linux and containers, I love chicken and chips, I also like night walks, I'm generally funny and I read sometimes. So that's that.

Just to lighten the mood: this is how I walk around, and as soon as you say hi to me, I smile. So in case you see me anywhere and I'm frowning, don't hesitate to say hi. I'm always smiling.

So, just a disclaimer so that we can set it straight: containers don't run on Docker. That is a misconception. Docker is just one of several container engines that interact with container runtimes.
These are container runtimes like containerd, runc and the rest of them, which in turn ask the kernel to set up containers. Other container engines you can find around are CRI-O and Podman.

So the outline is pretty much this: containers; then we'll look at the building blocks, which are cgroups, namespaces and copy-on-write; then we'll talk a little about container runtimes, Docker, runc, systemd-nspawn; and then we'll do a little demo.

So what are containers? I'm sure you have heard what containers are. Containers are a form of operating system virtualization. This is not full virtualization, because containers depend on the host system's kernel, the Linux kernel, to be set up. Containers are also a form of isolation, because containers are isolated from their host system. They have their own PIDs, that is process IDs, they have their own network and all that.

Containers help you package an application together with its dependencies so that you can run it across environments. So containers come with portability: you can run them in different environments with no need to start installing dependencies. Let's say, for example, you're on macOS and I build a Python application and give you the code, and I want you to run this application. Instead of you installing Python and installing libraries like, let me say, pandas and the rest, all of that is already done. It's already packaged into a container, so you can just run the container and everything will run perfectly.

One thing to take note of here is that containers run as processes on the operating system. This is what we should take note of, because I won't be talking much about containers as such.
I won't be talking about containers from a higher level. I'll be talking from a lower level, the operating system level: how containers are run on the system. So I'll be talking about containers as processes.

So we have control groups. Control groups, or cgroups, are a special mechanism provided by the Linux kernel which allows us to allocate resources like CPU, memory, devices and network to a group or set of processes. Like I said, containers run as processes, so cgroups allow us to limit the memory of a container, limit the CPU of a container, and all that. As for the reason for cgroups in the first place: cgroups were built as a security feature. It was actually built as a honeypot for attackers.

So we have some cgroup subsystems: memory, pids, cpuset, freezer, blkio. These subsystems are what you can control, the amount of each resource you can assign to a process. I'll just talk about a few of them. blkio sets limits on reads from and writes to block devices. net_cls allows you to mark network packets from the tasks in a group. cpu uses the scheduler to provide cgroup tasks access to the processor resources. And pids sets a limit on the number of processes in a group, so we can limit the number of processes that can run in a container. cgroup subsystems like pids make that possible.

So, for namespaces: where cgroups say, okay, I'm going to limit what you can use, namespaces limit what you can see. I can give an example with Kubernetes namespaces. Let's say we have a namespace called dev and another namespace called prod.
Kubernetes objects like replica sets, secrets and the rest of them can be grouped into a namespace, while another namespace, let's say prod, has its own objects too. These objects can't really access each other. Let me say prod cannot access a secret in dev, because there's a form of isolation between them. So just take note: cgroups limit what you can use, in quantity, and namespaces limit what you can see.

We have different types of namespaces. We have the mount namespace, mnt, which controls mount points. Upon creation of a container, the mounts in the current mount namespace are copied to the new namespace, but mount points created afterwards are not propagated between those namespaces.

We also have the PID namespace. It provides processes with an independent set of process IDs. The PID namespace is what makes containers think that they have this form of isolation. For example, let's say we have an Nginx Docker container. I won't mention Docker; let's say we have an Nginx container. Nginx might be running with a PID of four or five or whatever inside the container, but outside that container, on the host system, it will be running with a different PID. This is what fools containers into thinking that they have control of the whole system. It is one of the features of namespaces that enables that.

The network namespace virtualizes the network stack. Upon creation, a network namespace contains only a loopback interface. So once you create a container, by default it's only the loopback interface that you are going to have. I'm going to show an example of how we can use network namespaces in the demo.

We can also look at the user namespace, which provides privilege isolation and user identification segregation, so the UID and the GID.
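As a rough illustration of the namespace types just listed, here is a minimal Python sketch that reads the namespace links the kernel exposes under /proc/<pid>/ns. The helper names (parse_ns_link, namespaces_of, shared_namespaces) are made up for this example and are not from the talk. The idea it shows: two processes are in the same namespace of a given type exactly when their links carry the same inode number.

```python
import os
import re

# A namespace link target looks like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Return {link name: inode} for one process, read from /proc/<pid>/ns.

    Reading another user's process (for example PID 1) usually needs root.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(first, second):
    """Names of the namespaces that two {name: inode} maps have in common."""
    return sorted(
        name for name in first if name in second and first[name] == second[name]
    )


if __name__ == "__main__":
    # Only meaningful on Linux; skipped quietly elsewhere.
    if os.path.isdir("/proc/self/ns"):
        for name, inode in namespaces_of().items():
            print(f"{name:20} {inode}")
```

Comparing namespaces_of() for a process inside a container with the same call for a process on the host would, under these assumptions, show different inodes for pid, net and mnt.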
The user namespace gives you this kind of security feature. I think it came into the Linux kernel fairly recently, at 3.8. We also have IPC. I don't even know if anybody cares about IPC, inter-process communication. This namespace was also added to the Linux kernel fairly recently, to isolate inter-process communication.

clone is one of the system calls. It is actually the system call in the Linux kernel that enables us to use namespaces. You can look at the code in the Linux kernel and try to get an understanding of what clone does. This is how a process calls clone; look for the int clone function there. It takes the child's stack, you pass some flags to it, and the parent TID.

Next we have copy-on-write. Copy-on-write is a really, really complex topic. It could even have its own separate presentation. Copy-on-write is an optimization strategy. You may have noticed that when you pull a Docker image and then pull another one, sometimes it tells you that a layer already exists, right? Copy-on-write is what makes this smart: it shares those files between containers. There are different file systems that copy-on-write uses. We have AUFS, Btrfs, VFS and device mapper. You don't really have to worry about this, because Docker will just use the most suitable one. Docker is intelligent enough to do that.

For other stuff, we also have capabilities. Capabilities enable you to say, okay, I want this capability, CAP_SYS_TIME, on this container. And we also have SELinux, Security-Enhanced Linux.
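To make the point about clone and its flags a bit more concrete, here is a small Python sketch of how namespace requests are expressed as a bit mask. The numeric values are the CLONE_NEW* constants from the Linux headers; the dictionary and function names are invented for this example, and the sketch only builds and decodes the mask rather than calling clone.

```python
# CLONE_NEW* flag values as defined in the Linux headers (linux/sched.h).
NAMESPACE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS: new mount namespace
    "uts": 0x04000000,   # CLONE_NEWUTS: new hostname/domain namespace
    "ipc": 0x08000000,   # CLONE_NEWIPC: new IPC namespace
    "user": 0x10000000,  # CLONE_NEWUSER: new user namespace
    "pid": 0x20000000,   # CLONE_NEWPID: new PID namespace
    "net": 0x40000000,   # CLONE_NEWNET: new network namespace
}


def clone_flags(kinds):
    """OR together the flags for the requested namespace types."""
    flags = 0
    for kind in kinds:
        flags |= NAMESPACE_FLAGS[kind]
    return flags


def namespaces_in(flags):
    """Decode a flag mask back into a sorted list of namespace types."""
    return sorted(kind for kind, bit in NAMESPACE_FLAGS.items() if flags & bit)


if __name__ == "__main__":
    flags = clone_flags(["pid", "net", "mnt"])
    print(hex(flags), namespaces_in(flags))
```

A container runtime passes a mask like this to clone (or to unshare) so that the new process starts in fresh namespaces. On Python 3.12 and later, os.unshare accepts the same kind of mask, though it typically needs root or a user namespace to succeed.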
Most sysadmins use Security-Enhanced Linux, SELinux, so that they have more control over who can access a container. So that's a part you have to look out for.

And then we have the container runtimes. Like I said, we were going to come to this. We have Docker Engine, containerd, OpenVZ. I duplicated one there, sorry about that. Docker Engine actually uses containerd and runc for container creation. What actually made Docker stand out was that they improved the developer experience of using containers. Before, containers were something you'd most likely only see with sysadmins and the like, but Docker really made this easy with their whole family of tools. That's why it's called an engine.

We also have LXC. LXC has been around for a long time. In fact, there's no record of LXC in the Linux kernel. You wouldn't even see anything about containers in the Linux kernel, because the "container" you might see there is a very different thing from these containers. It's a big difference. So LXC was one of the container technologies that we had, and we also have OpenVZ, which was also one of the container technologies that existed.

So we're going to go ahead with the demo. For this demo, we'll be looking at creating a network namespace. Then we can also test copy-on-write with unionfs. I'm trying to share my screen now. I'm trying to share my terminal. I can't find my terminal. I don't know why it isn't opening again. Okay, it's open now. So can you see my terminal? I just need a bit of confirmation. Yes. Hello? Yes. Okay, sure. So we'll just go ahead now. Just trying to adjust it a bit.

So we'll try to look at the namespaces that exist on my PC. Let's do lsns. lsns will come up with the namespaces that exist on my system. So we have the time namespace.
We have the cgroup namespace. We have the PID namespace, UTS, IPC and the rest of them. Browsers like Chrome and Brave will most likely have a lot of PID namespaces, and they'll also have network namespaces and the rest of them.

Let me try to alias this, so I don't have to type it every time. So I will say sudo ls on /proc. We're going to check a process. So if I do this... yes, oops. We're going to check which namespaces my first process belongs to, that is the init process. Let's see which namespaces it belongs to. So that was pretty simple, sudo ls -l. We can see that this process belongs to a cgroup namespace, an IPC namespace, different types of namespaces. This can be checked for all processes.

Then we'll move on to the network namespace. So say ip netns add namespace1. Then say ip netns add namespace2. So we have two namespaces now. Say ip netns list, so let's just list the namespaces. We have namespace2 and namespace1.

Now let's create a pair of virtual ethernet devices. So say ip link add (I always forget to add sudo, don't mind me) veth1 type veth peer name veth2. So we just added a virtual ethernet device pair. If I do ifconfig, you can see the virtual ethernet devices are created.

Now let's link each device we created to a namespace. So ip link set veth1 netns namespace1, to link veth1 to namespace1. Then link veth2 to namespace2. So are we pretty correct here?

Let's try to bring up the devices and assign IP addresses to them. So ip netns exec namespace1 ip link set dev veth1 up. Sudo again, sorry. Then ip netns exec namespace1 ip address add, so I'm adding an IP to it, 192.168.1.1/24, then dev veth1. Sudo again.
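The sequence typed in the demo, together with the matching steps for the second namespace, can be sketched as a small Python helper that only builds the ip commands and prints them. The function name veth_pair_commands is invented for this example; the namespace names, device names and the 192.168.1.1/24 address follow the talk, while 192.168.1.2/24 for the second side is inferred from the address pinged later. Nothing is executed, because running these commands needs root.

```python
import shlex


def veth_pair_commands(ns_a, ns_b, if_a, if_b, addr_a, addr_b):
    """Build the ip commands that join two network namespaces with a veth pair."""
    commands = [
        ["ip", "netns", "add", ns_a],
        ["ip", "netns", "add", ns_b],
        # One call creates both ends of the virtual ethernet cable.
        ["ip", "link", "add", if_a, "type", "veth", "peer", "name", if_b],
    ]
    for namespace, device, address in ((ns_a, if_a, addr_a), (ns_b, if_b, addr_b)):
        # Move this end into its namespace, then configure it from inside.
        commands.append(["ip", "link", "set", device, "netns", namespace])
        commands.append(
            ["ip", "netns", "exec", namespace, "ip", "link", "set", "dev", device, "up"]
        )
        commands.append(
            ["ip", "netns", "exec", namespace, "ip", "address", "add", address, "dev", device]
        )
    return commands


if __name__ == "__main__":
    plan = veth_pair_commands(
        "namespace1", "namespace2", "veth1", "veth2",
        "192.168.1.1/24", "192.168.1.2/24",
    )
    for command in plan:
        # Printed for review only; prefix with sudo to run them by hand.
        print(shlex.join(command))
```

Keeping the commands as argument lists rather than one shell string means they could later be handed to subprocess.run without any quoting problems.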
I think it's high time I ran the same thing for namespace2.

Now let's verify the connectivity between the two namespaces, as enabled by the virtual ethernet devices. Since we have linked them, from namespace1 we should be able to ping the second one, and from the second namespace we should be able to ping the first. So this is sudo ip netns exec namespace1 ping, let's do about five pings, 192.168.1.2. So it's working: it's reaching the second namespace and pinging it. We should be able to do the same from the second one to ping this one. So it works.

I'll just go ahead and delete these namespaces. ip netns delete namespace1. Then ip address list: I no longer have them. This is something that most of these container runtimes do under the hood. They create network namespaces and the rest of them so that you can get this interoperability between your containers and provide network access to your containers.

Then on to the second demo. This won't take time. It's just to show how these container runtimes also do copy-on-write. So let's say mkdir kcd1. Let's touch some files inside kcd1: hello1, and I'll touch hello2. Then we'll make another directory, mkdir kcd2, and touch files inside it as well: in kcd2 you can say hello3 and hello4. Then we'll make a union of these two directories. Let's say mkdir kcd-union. Then say unionfs with the directories kcd1 and kcd2, into kcd-union. So for ls kcd-union, you can see that these files have been unioned together into one namespace. Under the hood, this is what container runtimes try to implement.

I'd also say you should look into Bocker. I don't know if I can type that, okay.
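As a toy model of the union from the second demo, the following Python sketch merges read-only lower layers and sends every write to a separate upper layer, which is the copy-on-write idea in miniature. The class UnionView and its methods are hypothetical names for this illustration; real union file systems work on directories inside the kernel or through FUSE and are considerably more involved.

```python
class UnionView:
    """Merged view over read-only lower layers plus one writable upper layer.

    Each layer is a plain dict mapping a file name to its contents.
    """

    def __init__(self, lower_layers):
        self.lower = list(lower_layers)  # shared, never modified here
        self.upper = {}                  # private to this view

    def listdir(self):
        """Names visible in the merged view, from every layer."""
        names = set(self.upper)
        for layer in self.lower:
            names.update(layer)
        return sorted(names)

    def read(self, name):
        """Return contents from the upper layer first, then the lower layers in order."""
        if name in self.upper:
            return self.upper[name]
        for layer in self.lower:
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def write(self, name, data):
        """Writes land in the upper layer only, so lower layers stay shareable."""
        self.upper[name] = data


if __name__ == "__main__":
    kcd1 = {"hello1": b"", "hello2": b""}
    kcd2 = {"hello3": b"", "hello4": b""}
    union = UnionView([kcd1, kcd2])
    print(union.listdir())
    union.write("hello1", b"changed in the union only")
    print(union.read("hello1"), kcd1["hello1"])
```

Because the lower layers are never touched, several views (think several containers) can share the same layers and each keep only its own changes, which is the sharing behaviour the talk attributes to copy-on-write.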
Please look into Bocker. Bocker is actually an implementation of Docker in Bash. You can try to read the source code, and you will see most of how these container runtimes use these low-level features of the Linux kernel to create a container.

So that's that for my talk. If you have any questions, you can just leave them in the chat. Yeah, I think Sadiq has done that. If you have any questions on these, you can reach out to me or you can just write them in the chat. It was nice speaking with you and I hope you have a great day. Thanks.

Awesome, that was a great presentation. We don't have any questions yet in the chat. Well, thank you so much for that, Great. Oh, I think Anita is having some network issues. Internet, internet, well, I used to suffer that too. So here is Great's Twitter handle, @0xgreat. The guy is so great that his Twitter handle is in leet code itself. So you can reach out to him on Twitter and share whatever questions you have, and he'll be glad to answer. But if you still have any now, you can drop them in the chat, and he will be available on the YouTube channel to answer them in real time.

Okay, someone just asked how cgroups and namespaces are used by containers. Okay. Like I said, I don't know if you came in late, but cgroups are what limit the resources that containers can use, right? There are subsystems for cgroups: you have pids, the net ones and the rest of them. So what cgroups do is limit the system resources that these containers can use. Like I said, cgroups limit what you can use and namespaces limit what you can see.
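To make the "limits what you can use" half concrete, here is a minimal Python sketch of how resource limits map onto cgroup interface files. It assumes the unified cgroup v2 hierarchy (files memory.max, pids.max, cpu.max and cgroup.procs), whereas the subsystems named in the talk (blkio, net_cls, freezer) are the older v1 names. The function names and the example path are invented for this illustration, and apply_limits is shown but not called because it needs root.

```python
import os


def limit_settings(memory_bytes, max_pids, cpu_fraction, period_us=100000):
    """Map human-level limits to cgroup v2 interface files and their values."""
    quota_us = int(cpu_fraction * period_us)
    return {
        "memory.max": str(memory_bytes),       # hard memory limit in bytes
        "pids.max": str(max_pids),             # maximum number of processes
        "cpu.max": f"{quota_us} {period_us}",  # CPU time quota per period
    }


def apply_limits(cgroup_dir, settings, pid):
    """Create the group, write the limits, then move one process into it.

    Needs root, and the controllers must already be enabled for this
    subtree (via cgroup.subtree_control in the parent group).
    """
    os.makedirs(cgroup_dir, exist_ok=True)
    for filename, value in settings.items():
        with open(os.path.join(cgroup_dir, filename), "w") as handle:
            handle.write(value)
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as handle:
        handle.write(str(pid))


if __name__ == "__main__":
    # 256 MiB of memory, at most 100 processes, half of one CPU.
    settings = limit_settings(256 * 1024 * 1024, 100, 0.5)
    for filename, value in settings.items():
        print(f"{filename}: {value}")
    # Hypothetical usage, as root:
    # apply_limits("/sys/fs/cgroup/demo", settings, os.getpid())
```

A container runtime does essentially this for the container's main process, so every process the container later starts inherits the same limits.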
Namespaces are what make containers unable to see each other. If, let's say, one container is able to see the processes of another container, that's really bad, because we can have things like race conditions, and it's also a security issue. So namespaces are what bring in that isolation, right? That's how you can run multiple containers on your system: you can run Nginx, you can run BusyBox, you can run a lot of containers without having conflicts. If namespaces didn't exist, things like process IDs would start clashing, and besides being a bit of a security issue, that is really bad. So namespaces are a feature of the Linux kernel that fools containers into thinking, oh, I am in my own system. Containers are not really aware of what is outside them. Containers are only aware of themselves. Namespaces are the kind of feature that enables that.

So I don't really know. Okay, demo. Okay, cgroup. Okay, yeah, I think I did this demo. I don't know if the stream was able to share the demo I did.

Yeah, it shared it. What we will do later is this: the recording of the whole thing will be available subsequently, and we'll also break out each of the sessions so that you can have access to individual ones. So you can check our YouTube channel by Monday or Tuesday next week, and you will have his specific video so you can rewatch the parts where he did the demo.

Okay, okay. That's fine. You can also contact me on Twitter if you have any questions. I'm available to answer your questions.

Awesome. Thank you very much, Great. We have our next session. All right, bye. Bye.