Alright, is everyone here for Linux container internals? Okay, if you're not, we're flying to Boston, so you're stuck on this airplane. This presentation has the wrong names on it, I realize that; those are my partners in crime from Red Hat Summit a while back. This is a lab we've run before, we're running it again here, and we'll run it again at Red Hat Summit in May or June, whenever that is.

So, how many of you have played with and understand Docker? Good, alright, that's good, because this class should go deep. How many of you have played with and understand Kubernetes? Good, that's more than most of the times I've done this.

Now, the way this works: how many of you have laptops? Okay, good, because you'll need one to go along with the labs. Is everyone able to connect to the Wi-Fi? Is there anybody that doesn't have Wi-Fi access? Okay, good; if you didn't raise your hand, I'm assuming you know what you're doing and how to connect.

We're going to attempt three of these four labs. The labs are available online, so however far we get, you can complete them yourselves afterwards, but I'm going to go through the presentations and present some stuff live. We don't have any more handouts? Yeah, we're out — you don't need them, though; it's not critical. Some of you got lab guides. I was hoping to give them to everyone that came, but I had no idea we would get moved to a bigger room and have more people show up, so we didn't have enough. The people that do have one will get stickers for putting in the lab guide, and you can take it home, but honestly, all the drawings and everything are available online, so you don't need it. I'll show you how to get to this presentation, and in fact you'll even be able to watch a recorded version of it if you want, with a system we're going to use called Katacoda. So not to worry, you will be okay.

My guess is we're going to end up getting through three of these labs. I say three of the four because we only have an hour and a half. What we're going to do is a small presentation, then go into a lab, and then we'll walk around and help anybody that has problems. But you shouldn't have problems: it's a dedicated environment with virtual machines configured in a very specific way, so nothing should break, and the tool is really cool and really easy to use. How many of you have done the tutorials on kubernetes.io, in the Kubernetes docs? It's the same system they use. It's called Katacoda, and it's very cool: an interactive environment where you can read some text, click on a command, and it types it into a terminal for you, and you can actually run commands on what is essentially a pre-configured virtual machine.

So before I jump into the presentation, is there anything I missed? Any questions? Say that one more time? Yes, the lab guide is available online. I'll give you all the links too.
And also, I guess the easiest way is probably — here, I'll just bring this up. Not that I'm trying to get more Twitter followers, but if you want to follow me, I will post all the links there. I have no idea why this is doing this. Everyone has my phone number now, if you recorded that. Alright, come on. My handle is FatherLinux. Everything that we're going to do here is available online: the lab guide, and the Katacoda environment. You will not be able to get stickers online; you can make your own stickers if you want, but I will not be sending people stickers. They're expensive.

I'll show you the Katacoda environment real quick. It looks like this: it's under my profile, katacoda.com/fatherlinux, under the Introduction to OpenShift course. There are the four labs that we're going to be going through — one, two, three, and four. These are the actual interactive container-internals labs we'll do today. Does that answer your question? Hopefully. Okay, cool. Any other questions? Oh, you want to see it bigger? Yeah — how do we do that? It won't enlarge the URL. Let's do this; I got an idea. How about that? Alright, so we'll add the links in here as we go, for anybody that needs to type them in. And as we figure out things that maybe aren't documented that well, I will too. Okay, that didn't work. There we go.

Alright, if we don't have any other questions, I'm going to jump into presenting. I want to try to get through at least three of these. Oh, actually — these slide links are probably not right, and they're older, so I will send out the correct ones afterwards. I'm going to jump right into the actual presentation.

We start with architecture in this lab: an overview, resetting what you understand and know about containers. Then we move into a single-host toolchain — what a single container host looks like. Then a multi-host environment, and then a distributed-systems environment where you're actually troubleshooting things like you would in production. It's a progressive setup: we reset what you know, dig into the way container internals really work and the way containers get created, and then how to troubleshoot them. The idea is that this should help you build better container images and run containers in a better way. Then we'll get a feel for the lay of the land of what the toolchain looks like, because it's actually a lot more complex than people think. You run a single command in Docker and you think it's easy; then when you go to architect your own environment, you have to learn a lot more than you think you do. Honestly, a lot of it is just bridging gaps — that's kind of the whole idea of this presentation.

So first and foremost, and you'll see this again in the lab as you go through it: the internet is wrong.
If you look at all of these architectural drawings, they almost all show a blue line — this slide is not in color, but there will be a blue line — that says Docker, and it shows containers running on top of Docker. And it's incorrect. Containers do not run on Docker. Docker is a daemon — an API daemon that accepts requests from a user. It translates those requests — actually, it talks to a bunch of other daemons, which we'll get into deeper — and eventually this comes down to a system call, clone, instead of fork or exec, that creates another process on the Linux system.

Most people see one of two drawings. They see the blue line and go, oh well, if I just run Docker, then my Docker containers will just run on it. The other drawing you'll always see is Docker running on a Linux system with the containers on top, but you will never see the other processes running side by side — no other user-space daemons, no regular processes next to the containerized processes. In reality they're all equal; they're all just user-space processes. So those are the two main ways all of these drawings are wrong, and they all lead to the conclusion that I can just run Docker and the containers run on Docker. They do not. Docker is essentially a daemon that makes it really easy to run containers, but it's not actually what runs them.

At the end of the day, containers are really two things. When they're not running, they're Linux files: when you pull them down and export them, they're nothing more than tar files. When they're running, they're nothing more than processes that just happen to be, as I always say, fancy. They're fancy processes, sandboxed so that they have the illusion — the extra isolation — of running on their own system. But they're not.

And so what you usually interact with are user-space libraries that create these containers, and every runtime — Docker, rkt, CRI-O, runc — has its own definition of what a container is. For example, if you're running Docker and rkt on the same host and you do a process list in Docker and a process list in rkt, they will not show you each other's containers, because those are user-space definitions that each of them holds and keeps track of. At the end of the day, they're all using the same Lego blocks in the kernel to create a container. So "container" is this word that we use, but it's not really a thing in the kernel; it's defined at a higher level: you have a Docker container running, or a CRI-O container, or a runc container, or a runV container — which is actually a VM — et cetera, et cetera.

So this drawing is really important to understand. The user comes in, and normally you would type a command at the shell and a process gets created, right? It could be created with cgroups or SELinux applied, or maybe not, depending on which of those things are enabled.
But with a containerized process, you typically come in through one of these libraries: LXC, systemd-nspawn, libcontainer, or libvirt. And this is if you're a programmer, right — you would write some code that talks to one of these libraries, and that creates a container. The typical technologies used are namespaces, cgroups, and SELinux. Cgroups and SELinux have been used for a long time; what really changed with Docker is easy access to the namespaces, using the clone system call instead of a fork or an exec. In a normal shell, when you type a command and hit enter, it either execs into another thing or forks into another process. Does everyone understand basic Unix internals of fork and exec? If you need more, I will answer any questions.

Now, going a little deeper into what most people have probably been exposed to: when Docker got famous four and a half years ago, it made it really easy to create a container, and so we all started to internalize this concept of, oh, I just use a container. But really, this is what it is. I don't have my glasses on — what did I do with my glasses? Oh, there they are. Alright. I don't have a laser pointer, but if you look at this big box, you'll see it's dockerd, containerd, and runc. It's really three different processes firing off to actually create a container. Dockerd is a daemon. Containerd is a daemon that runs under it. And runc is actually just a process that gets created and actually creates the container. Runc is what talks to the kernel: it does the clone syscall, which creates the namespaces, and runc is also smart enough to go create the cgroups and, depending on how things are configured, the SELinux context, et cetera. So runc is responsible for that communication between user space and the kernel. There are actually two other daemons at work, and most people don't understand that this is what's happening behind the scenes — and all of these things are running in user space.

You'll see containerd — actually, not yet; as of the last I checked, this is a future architecture — the logic for pulling container images is in dockerd right now, but it's moving to containerd if it hasn't already. And runc, all it does is take a config file, config.json, and a directory. You literally pass it a file and a directory and it fires up a container. The runtime — dockerd and containerd, or in the case of CRI-O it's the same thing, it uses a config.json too — basically creates the entries in that config file, and then runc just gets called with the config.json and a directory and creates a container. But I think there's a very small percentage of people who have real clarity about how that works.
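If you want to convince yourself that a container really is just a fancy process, you can drive the same kernel Lego blocks by hand. Here's a minimal sketch using util-linux's unshare, which exercises the same clone/namespace plumbing runc uses — minus the cgroups and SELinux pieces:

```
# Create new PID and mount namespaces and run bash inside them.
sudo unshare --fork --pid --mount-proc bash

# Inside that shell, bash believes it is PID 1, just like a container's init:
ps -ef

# Every process has namespace handles; a "container" simply gets fresh ones:
ls -l /proc/$$/ns
```

Exit that shell and you're back in the host's namespaces; nothing was "running on" anything except the kernel.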
And building upon this even further — in this drawing I still kept dockerd — this is what a typical Kubernetes environment looks like. Now we've added some more user-space daemons. We've added the node, which is responsible for going and talking to dockerd, and we've added a master, which is responsible for talking to the node. And for effect, I've added systemd, to show that systemd actually starts the Docker daemon, the OpenShift node, the OpenShift master, and etcd. Those are the four things you configure on a system to actually start, and that's kind of the entire toolchain. That's what a fully configured Kubernetes environment looks like — or an OpenShift environment in this case, which is just a distribution of Kubernetes — in its full glory. And again, it starts containerized processes, there are other regular processes running on the system, and the containerized processes use these technologies in the kernel while the regular ones don't necessarily. And again, containerd goes and pulls the image — or CRI-O does; I have a drawing where CRI-O replaces dockerd and containerd, imagine those as one box — it uses its own library to go pull images and then expand them onto disk. But that's the full picture. Does that make sense to everyone? Any questions about that?

No — containerd. So in the latest releases of Docker, for sure it has containerd, but they're moving functionality between dockerd and containerd. Dockerd is becoming more of just the API endpoint, and containerd is doing the logic of pulling the container images down, expanding them onto disk, and creating that directory that runc needs to then go create a container. Say that one more time? That's correct: runc is basically libcontainer nowadays.

And the other problem — a lot of the confusion — is that this has changed very quickly over time. In four and a half years this has changed immensely. The original architecture was that Docker just did everything; dockerd did everything. And it was slowly broken out into different pieces. Libcontainer was the first iteration — actually, if I remember correctly, Docker just used the LXC library at first; it didn't even have libcontainer, so it used an existing technology to talk to the kernel to create those containers. Then they broke out the logic and made their own library, because they wanted flexibility — I don't remember all the architectural reasons why they created libcontainer. Then at some point the world changed and we wanted to create what's called the OCI: a standard runtime, a standard way to pull images, a standard way to explode them onto disk, and a standard way to run them. I have some other drawings I won't go deep into, but that's what drove the creation of runc. Docker Inc. separated runc off as its own thing and contributed it to the Open Container Initiative. And now runc is kind of that middleman: it knows how to take that config.json on disk and the directory and turn them into a Linux process in a very specific and particular way that's governed by the OCI standard. That's important because all the runtimes are basically using it now: CRI-O uses runc, Docker uses runc, and I don't know if rkt uses runc — I haven't kept close to it, but I believe it does, or can, anyway. So does that make sense? Any other questions?
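To make the "config.json plus a directory" point concrete, here's a minimal sketch of building an OCI bundle by hand and handing it to runc. It assumes runc is installed and uses centos:7 as a stand-in rootfs; the bundle path and container name are made up:

```
# A bundle is just a rootfs directory plus a config.json sitting next to it.
mkdir -p /tmp/bundle/rootfs
docker export "$(docker create centos:7)" | tar -x -C /tmp/bundle/rootfs

cd /tmp/bundle
runc spec              # generates a default config.json in the current directory

# runc takes exactly these two things -- the config file and the directory --
# and turns them into a namespaced, cgroup-constrained Linux process.
sudo runc run mydemo
```

That's the whole job dockerd and containerd are doing on your behalf: get the layers onto disk as one directory, write the config, and call runc.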
Alright, so now I want to show you how this would look in production. You would have multiple masters. Let me walk through how a user would come in and create a container in a Kubernetes environment that's multi-node. You wouldn't come directly to a node and create a container; you would come to the API. In an OpenShift environment, the way our installer works, it creates an HAProxy that load-balances between multiple masters. Those masters keep track of all the different nodes with etcd — etcd actually keeps track of everything in the Kubernetes environment, and a node just happens to be another object inside of Kubernetes. And then each node looks like this: its own dockerd, its own containerd, its own instances of runc running — multiple ones, one for each containerized process. And some of the nodes would run a registry server — maybe one, maybe multiple, it depends. In this scenario — this is an old drawing — I used NFS; the new ones use Gluster, I think out of the box it uses Gluster now. Either way, it's the same concept: at the end of the day, the registry in OpenShift needs backing storage.

Whether you use OpenShift, which is opinionated about how to create all this stuff and has an installer that does it for you, or regular Kubernetes, this is the stuff you would have to set up yourself to really create a production environment. Kubernetes gives you sort of this piece and this piece, but it doesn't necessarily give you that and that, and it doesn't necessarily give you a way to add and delete nodes — some niceties, basically. And OpenShift does things slightly differently, which we'll get into later. Kubernetes is like Linux in a lot of ways: there are a lot of different distributions of Kubernetes, and where you put config files and things like that can differ between distributions. Red Hat's OpenShift is one of those distributions.

And then finally, I want to walk through what I think is another point of confusion. I tried to do this super-drawing of what is OpenShift versus what is governed by the CNCF, and then the OCI. You'll see that containerd, Fluentd, Kubernetes, and the Technical Oversight Committee — which is a group of people, not a piece of software — are all part of the Cloud Native Computing Foundation. They help govern and manage these projects, and there are a lot of technical people involved in that TOC: some from Red Hat, some from Docker, from CoreOS, from basically every company you've probably heard of, all involved in driving those projects. And then the Open Container Initiative, which is another important thing you should know about, governs the runtime spec and the image spec, and they also release a piece of software called runc, which is an implementation of the runtime spec.
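One quick aside before the spec details. I said a node just happens to be another object inside of Kubernetes, and you can see that for yourself from a master. A small sketch using the OpenShift client — plain kubectl works the same way, and the node name here is made up:

```
# List the nodes the masters know about.
oc get nodes

# A node is just another object in the etcd-backed API,
# with labels, capacity, and status like any other resource.
oc get node node1.example.com -o yaml
```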
And the runtime spec, as I mentioned, is what governs how you take that directory and that config file and turn them into a container. But there's also governance around how you communicate with a registry server, and around the format of the container image as you pull it down — which is actually a bunch of layers; it's not really an image, it's a repository of a bunch of image layers — how you pull that down, and how you expand it on disk and present it to runc, basically. And then OpenShift's upstreams are essentially all of these things: Kibana, Elasticsearch, Open vSwitch — which we use as the underlying piece to create a flat virtual network in an OpenShift environment; again, something the installer does. But there's more to setting up a Kubernetes environment. Getting to this requires more than just Kubernetes: there's the installer, there's configuring the network, there's logging, et cetera. So does that make sense to everyone? Any questions?

Alright, so with that, I'm going to get us into the lab. We're going to do the first lab. For those of you that have the lab guides, you're going to do one quick thing that everybody else won't: put your stickers in. You should know which order those stickers go in, because I just presented the architecture to you, so hopefully you all get it right. I'm going to check as you're going out the door, and if you get it wrong, you won't get a gold star.

So I'm going to give you the URL. This is the one we want: katacoda.com/fatherlinux. If you can just get there, you can click through. You'll end up at this page, then you click on Introduction to OpenShift for Developers, and you'll get this page, where you'll see very obviously labs one, two, three, and four. We're just going to do lab one, and then we'll break. We ran a little longer than I wanted — well, we're about right where I want to be. We're going to give you about 12-ish minutes, maybe 15, to get through that one. Probably 15. We'll do 15; actually, I'll set a timer.

To get into this environment, you have to create a username and password, which is described in the lab guide. But here, let me just show you real quick — I'll log out so you can see, and then I'll start the timer. Here's what will happen if you go to katacoda.com/fatherlinux: you'll go here, it'll show you this, you can click on Intro to OpenShift, and you'll notice it doesn't show the scenario as completed anymore. You click on the scenario, and it brings up this page. You're welcome to just use a throwaway email address and create a password, or you can use your GitHub to sign in. You can sign up for free, and it takes two seconds if you just use GitHub. I used my GitHub because I was already developing in Katacoda and it's pulling stuff from GitHub anyway, so for me it made sense. But you could sign in with anything.
And then once you create an account, it actually gives you access to everything on katacoda.com — they're all free tutorials. We can walk around and help anyone that has problems, but I'm going to start a timer for about 15 minutes for the first lab, and then we'll walk around and help people.

Actually, before I turn you loose, one other thing I want to run through. I'm going to sign in here and show you something. We're going to skip the video on the first page, because I basically just presented that. If you get home and want to do this entire thing, you can watch the video — it's about the same length as what I just presented — and then you hit start after you watch it. It will bring up a virtual machine on this side, and on this side is the lab information you can read through. It will take a while for this to start, but each of these commands, you can just click on it and it automatically puts it into the terminal and types it, so there are no typos or anything. Honestly, this should go pretty quick; these first ones are pretty easy. Although it looks like some of you are already getting in, because this is taking a while. I don't know — I've never seen it take this long.

Say that one more time? Yes — the question is, is OpenShift open source? And yes, like everything else Red Hat does, everything is open source. The upstream of OpenShift is called OpenShift Origin, and OpenShift Origin is built off Kubernetes and all of those other things I showed, like Kibana and all these other projects. Think of it as a distribution that pulls all these tools together, very similar to how Fedora pulls things together. Origin changes quickly, pulls all those things together, and kind of proves it out, makes sure it works; and then our enterprise distribution, OpenShift Container Platform, is built off OpenShift Origin. And this lab here, if it ever comes up, is built off OpenShift Origin. Although now I'm getting worried.

Yours is working? Okay. Has anyone gotten in? Okay, good — so not only mine. If not, just refresh. If I remember right, Katacoda is using Amazon on the back end, so it should try a different VM if this one doesn't work. Red Hat's paying for the cost of it, so I don't care — just create another VM. No, mine's really hanging now. Huh. Let me try this completely from scratch. This is a good stress test for Summit. No, I logged out; I think it'll work now. We'll see — I've had to do this before. It's not working for you either? I can't get in either. You waited about three minutes and it did come through? Okay. I was worried about this: many people connecting at once.

There is some content on the networking, yeah — in a later lab. I think under the single-host toolchain we dig in a little bit. I have more material I want to add around the way Kubernetes does networking, because it's interesting and I think it's important to understand and know, but I haven't added it yet. I will eventually. You can bug me on Twitter and say, hey, go create that content, and under social pressure I will do work. How many of you are still waiting to get in? A decent amount.
Four seconds for you? What? Let me try another one — I just want to see if it's my browser. I know it's not the internet for sure; it's not that. Who's asking for five minutes? Five minutes for what? It was supposed to go till — it's supposed to be an hour and a half; I don't think this is right. Just when I thought the lab couldn't go any worse, we get kicked out. Yeah, that's true — it can always get worse. If there was a fire, like, right here — I mean a big fire; a little one I would run away from — but... oh, here we go. This one got through. So just reload a couple of times if you're still waiting and impatient, because the other one I tried got through. That's what I'm thinking. What? It takes two tries to pull the container? Yeah, that could slow it down, but these are all separate virtual machines in Amazon, so it should work. It shouldn't be broken, but...

Well, I got one through. Either way, I'll show you, for those of you that are not in: basically what we're going to do here is you literally just click on the command, and it types it in. You'll see this is pulling the image down, and it's going to run it. It is definitely going slower — yeah, we're all getting crushed. Apparently having this many people is tough for Katacoda. Yeah, that's high. Probably.

The script is not on the standard path and you found it only here? That's because the setup hasn't completed yet. There's an intro step — step number three — and I fixed that, so it is the right script, and it should be working, because there's a script at the beginning that I copy into /usr/bin. What I'm thinking is, since it's so slow, the back end hasn't finished configuring the VM. Mine is working; yeah, the problem is yours is not done configuring itself. I understand — on the back end it's so slow that some of the steps haven't completed yet. That's what I think is happening. Here, you can see it right here, in intro-openshift, in container internals lab one: this file right here is what configures the VM as it comes up. You see it copies the script into /usr/bin, and it hasn't gotten there yet — even though I put it before all these other steps, so maybe the git clones are not done yet. I don't know what's happening. I was fearful that this would happen.

Can I make Katacoda work better? No, that will not help, because we would have to cut and paste this and connect a bunch of people, and it would be a nightmare — we'd have to describe to everybody how to get in, and they're all on different networks, and who knows what. And two, these commands are configured for only this environment. There's a ton of configuration of this environment on the back end to make this all work; none of these commands would work on a generic OpenShift cluster. So I'm thinking that might be the case — I'm thinking maybe we just, let's have a vote. Would it make more sense for me to just run through the labs and show them to you, as opposed to having everybody try to do them?
Is anyone massively against that? Alright. Say that one more time? No. Yeah, I think the problem is it's so slow that the configuration isn't finished yet, because there are so many of us in it. So let's try this: everyone hit log out — get to the log-out part and click it, because that seems to kill the VMs. This is an experiment. Is everyone logged out? Did you click on log out? Don't just shut your laptop, because the VM will keep running. Alright, now let's see if this works. It worked so beautifully all day as I tested it. This part is so fast. That looks better.

Good, but — so, I have two versions of this lab under my profile. One is a GitHub repo, but you would have to create your own OpenShift environment, and it walks you through the same things. It's the older version; you have to basically set everything up yourself and make it work right, and it's a decent amount of setup to get OpenShift working the way this lab wants it. The Katacoda one, when it works, is beautiful, because it's already configured the right way and you don't have to mess around. Honestly, if you do this by yourself later, it will probably just work, because there won't be fifty of us all connecting at the same time. Although this still is slow. We may be falling back to plan C, which is: I present more material and we discuss it interactively, and then you do the labs later on your own, and if there's anything in the lab you don't understand, feel free to email me or tweet — whatever, basically. Call me too; you have my phone number now. Remember, it's +1 to dial the United States. No collect calls, though, please. Yeah, it's dying. I don't even know what it's doing anymore. It's definitely dead. I'm going to go complain.

Alright, so you know what we'll do? We'll dig into the next lab and talk through this stuff; I have enough material that we can probably fill the time. When does this go till — 5:45? Yeah, that's just about perfect now that we've killed a bunch of time. Container images are the next thing I dig into — really the single-host toolchain and container images.

So I get this question all the time, and I have different versions of this drawing that are easier or more complex. I'm hoping that since you're in a container internals class, most of you are pretty technical and can handle the harder version. At the end of the day, I would argue that in a traditional environment — whether it's actually virtual or not; you can add or remove this bottom piece if you want — most of the time we optimize for agility only in the application, right? And for the lowest dependencies, near the kernel, we optimize for stability. Those are two competing engineering paradigms. Red Hat, for example: Fedora moves very quick — every six months, new version, no problem. RHEL is 10 years, even 13 years, and we backport changes. That's a ton more work, right? Backporting changes.
But the nice part is, every time you run a yum update in RHEL, it works. Not every time in Fedora does it work — not every time in any operating system I've used, actually. There are two that have never burned me: I'm a Red Hat person, but I'll fully admit I've never been burned by SUSE Enterprise Linux either. I've been burned by every other distribution of Linux that exists, because they're mostly optimized for speed, not for stability. In a production environment, typically you would just run something like RHEL or CentOS or SUSE: it moves pretty slowly and backports changes — but historically that only covered the OS, not the application.

In a containerized environment, this is basically what we have in the container: the application and all of its OS dependencies. You'll see this in the lab when you run ldd on binaries and look at all the dependencies — which a lot of people that have forgotten their Unix and Linux probably don't remember — and you'll see how those OS dependencies get bundled into the operating system inside the container image. And in this drawing I show multiple container hosts; they can still run as VMs, it doesn't matter — if you run OpenShift out on AWS, they're still VMs — they could be bare metal or VMs either way.

So here's the part that we're going to dig into. If you think of an operating system as a stack of stuff: there are user programs, and those user programs are linked against libraries and interpreters. If it's a Python script, it relies on an interpreter; if it's a C program, it relies on other libraries. And those libraries bottom out in glibc. A lot of people don't understand this — it's a nuanced thing that I think we've forgotten over time: glibc is the standard set of interfaces that we use for system calls. It defines essentially the set of API calls we can make into the kernel, and those are all documented: file open, file read, fork, exec, all of those. A lot of you, if you've programmed, have probably used these functions, maybe without fully understanding that they're actually part of the interface to the kernel. Those are called system calls; they're special. They're different from higher-level things — like in .NET, where there are functions like get-stream that are not part of the core piece you rely on the kernel for. The system calls are published and documented in glibc, and only the ones in glibc are considered the public interface. There are other undocumented ones — I don't want to call them secret — and those are liable to change; they're not governed by the same kind of stability requirements as glibc, although Linus does beat people up if anyone changes anything in the syscall interface.

A lot of people have forgotten this. It's knowledge that used to be common in the Linux world, or at least among people that come from that background, but we've forgotten it with containers: we just think, oh well, I run a container, it's abstracted, it'll just work. And you're like, well — it depends. When you run a web server — we had a Birds of a Feather discussion two days ago where we talked about how many syscalls a web server uses; I don't know, 20, 30, 50 — it doesn't use the 300-some that are part of this interface. I think I counted in RHEL and it's something like 381.
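You can check both halves of that on any Linux box. A quick sketch — the binary path is just an example, and strace and curl need to be installed:

```
# The userspace half: what a binary drags in from the OS inside the image.
ldd /usr/sbin/httpd

# The kernel half: count the distinct syscalls one HTTP request actually makes.
# The summary table is usually a few dozen calls, nowhere near all ~380.
strace -c -f curl -sf http://localhost/ > /dev/null
```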
So most applications don't use every single system call. But if you think about what an interpreter is — or even a bash script — it can execute any system call. It's what we call a Turing-complete problem. A Turing-complete problem means that until the program is running, you can't know what it will do: it can run interactively. Literally, imagine bash — bash is essentially a Turing-complete problem, because inside of bash you can execute anything you want. Anything the user's imagination can come up with: they can run undocumented syscalls, they can write a little C program that puts the kernel into different modes — you can do anything you want. When all you're running is a web server, that is not Turing complete: there's a finite set of syscalls that that code can make. But when you write a form in that web server that then allows somebody to type commands in, and the web server tries to run those commands, you're back to a Turing-complete problem.

So there's a balance. Most of the time, things just work if you put them in a container, but you do have to be careful about mixing and matching user spaces and kernels. And as workloads expand beyond web servers, you'll end up with the problem that certain things access /proc and /sys and expect things to be in a certain place. If your app is funky — a.k.a. not a web server, say it's HPC — and it needs /sys for something because it's trying to access something in some funky, legacy way, you can end up in scenarios where container images are very incompatible with the container host if you're not running, say, RHEL 7 on RHEL 7. People need to think through this entire stack as they're building their applications.
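One way to see the mix-and-match directly — a tiny sketch showing that the user space comes from the image while the kernel comes from the host:

```
# /etc/redhat-release comes from the image's userspace;
# uname -r reports the *host* kernel the container is actually running on.
docker run --rm centos:7 bash -c 'cat /etc/redhat-release; uname -r'
```

Run that on a Fedora or Ubuntu host and the two lines will disagree — that gap is exactly where the compatibility questions live.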
Now, going further into the actual image side, there's a ton of nuance that, again, I think a lot of people haven't fully captured in their minds. We do a docker run and it just seems super easy, right? You type docker run rhel7 bash, it works its magic, and you don't really think through what's actually happening. The Docker image format — which basically became the OCI image format — has the concept of tags, and there are a bunch of different layers in these repositories. We refer to these as container images — we say "container image" all the time — but it's not an image, it's actually a repository, and it's basically layers and tags. We usually use the tags to represent a version of the software, because that's a natural thing to represent with layers and tags, but that is not necessarily true. We could have two different configurations — configuration A and configuration B — and both could use version 3.0. The concept of using layers and tags as versions of the software inside the container image is purely a de facto standard. It's what most people do; it's not mandated, and it's not part of the image spec in any way, shape, or form. This will come up as you roll this out to developers, and people start to argue about what you should and shouldn't use tags, layers, and repositories for, and how you should break them down. In fact, we had this internal debate at Red Hat when we were building our registry: people were trying to figure out whether rhel7 goes in a separate repository. Actually, I'm getting ahead of myself.

So here's the next piece that adds to the complexity. This is what a URL looks like: you pull registry.access.redhat.com/rhel7/rhel:latest. "latest" points to whatever the current dot-release of RHEL is, rhel7 is the major release and it's part of the namespace, and of course you pull the images from Red Hat's registry. But these are arbitrary definitions — they're not mandated by anything — and you have to remember that when you go to architect your own systems. It seems so easy when you pull it: instinctively you look at the URL and you understand what it means. But it's not intuitive when somebody in your group uses it for something else. I'll give you an example: I go to Docker Hub, and CentOS 5, 6, and 7 are all in the same repository. That seems like insanity to me — like something you should never do. In my opinion, you want separate major releases in different namespaces, because you don't want to run docker run centos and one day it's just CentOS 8 and everything you had breaks, because you had no idea it was going to roll to version 8 on a Tuesday. That's crazy, but it's how it will work right now: the way it's configured on Docker Hub, when CentOS 8 comes out, your stuff is all going to break. So you really have to think this stuff through and really understand what these things mean. It's so easy to use, but it's so hard to design a new system using the same tools — you have to understand it about five times as well as you do when you first start using it. And this is all from internal arguments and debates that we've had.

The next thing, getting deeper with images: these are all different tags that represent different image layers, but something I don't show here is that there can be a bunch of unnamed image layers in between the tags — layer, layer, tag; layer, layer, tag. When you pull, you get all those layers; they're essentially blobs of data, and there's a JSON file called a manifest that you pull down first. The Docker daemon — actually containerd, or whatever library it uses to pull — looks at that manifest and says: to build up all the layers I need for this tag, I need to pull this, this, this, and this. It doesn't even crawl the tree; it just reads the JSON file and pulls them all down, and that's what you're seeing as that little bar goes across. All that magic is happening in the background for you. Then once it gets on disk, it's actually exploded out into a directory, a config file gets created, that gets handed off to runc, and it gets run. Exploding it onto disk uses what's called a graph driver — I'm getting ahead of myself, because that's actually the next thing — which is the translation between all those image layers and a single directory on the system. And all of that happens like magic; you don't even know how it happens, but it happens every time. And it's not the whole history: it's only the path of layers that builds up to the tag you pulled, because there can be dead-end branches — and actually, sadly, I show you that in the lab.
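You can watch that manifest-and-layers machinery yourself. A small sketch, assuming skopeo is installed — any public image works; centos:7 is just an example:

```
# Ask the registry for the manifest: the digest, tags, and list of layer blobs.
skopeo inspect docker://docker.io/library/centos:7

# After a pull, walk the layers the graph driver stitched into one root filesystem.
docker pull centos:7
docker history centos:7
```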
So you can have a tree structure of stuff, and if you pull a tag that's down one branch of that tree, Docker is smart enough — and so are all the libraries that pull images — to pull only the set of layers you need to get to that tag. They don't necessarily pull all the other ones. You can also just pull a layer directly; you don't have to pull a tag. That's another thing people don't realize: you can pass the big layer ID and it will pull only that thing, down to an image — and when you go to run it, it can be completely broken. Tags are a way for you to communicate: if you want version 4.0, call the 4.0 tag; if you want version 4.0.1, use that; and latest always points to whichever the latest one is — which, again, kind of insinuates that we should be using version numbers with these tags, but it's not necessarily the case. And any image layer in between could be a half-baked image that only half works. If you ever look at a Dockerfile, every one of the lines basically builds a layer; so if a Dockerfile adds a user here and copies the software in later, you could pull a layer from the middle where the software isn't even copied to where it needs to be in the image yet. You could pull down a half-broken shell of a container, get inside it with bash, look around, and it won't work. It's a non-functional image — that's very possible with Docker.

Is a tag just tagging a layer that already exists? Yeah — those tags are for specific layers that already exist. The layer and the tag are two different things: every layer has an ID, and a tag is a named ID — one that you expect a human to use. You're essentially communicating: hey, this is the layer I expect you to use. Does that make sense? It's kind of like Git — it's kind of the same thing. You can pull broken things from a Git repo in the exact same way: somebody has made half the commits but needs a few more to get to the point where it works again, and somebody checks out that halfway state. Normally we always pull the latest one, which should point to something that actually works.

Oh, another question? Yes, I can — I can actually demo that with a tool called dockviz; it's in the lab.
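To make the "every Dockerfile line is a layer" point concrete, here's a minimal sketch with made-up names (it assumes an app/ directory in the build context), showing where a half-baked layer comes from:

```
# Each instruction below produces one image layer.
cat > Dockerfile <<'EOF'
FROM centos:7
RUN useradd appuser
COPY app/ /opt/app/
CMD ["/opt/app/run.sh"]
EOF

docker build -t myapp:4.0.1 .

# One row per layer; a layer pulled from above the COPY line
# is a "shell" with a user in it but no application yet.
docker history myapp:4.0.1
```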
Alright, let me repeat the question — it's not a full question, by the way, but I'll repeat what you said and what I think you're asking. You said there are not multiple namespaces, correct? So what you're asking is: since there's only one namespace on, say, Docker Hub, how would you represent other things? That's again one of those de facto-standard things. If you go to docker.io, there's one level of namespace; if you go to Red Hat's registry server, there's one level of namespace. There are a lot of registry servers that allow you to create multiple namespaces internally, which creates a whole other cluster of problems, because now developers — or whoever, architects — may want to create arbitrary meanings for each of those namespaces. And you're right: there's no standard; it's left on your shoulders to figure that out, which can be a pain in the butt. I've run into that.

No — layers are different from namespaces. Having multiple namespaces this way is different from that way. Any Docker registry can have different namespaces; you just can't have multiple levels of namespace, like /rhel/rhel6/rhel6.4/... — you can't keep adding slashes. But you can have different namespaces side by side: redhat.com/rhel7, redhat.com/rhel6. We have that now, and so does every registry. Docker.io has it, but by default they only have the one. If you go to docker.io — hopefully this works; I thought that's how you did it, maybe I'm doing it wrong — hub.docker.com slash centos... I think you need — I'm just going to go to it, because it'll be easier than me farting around. So if you look at the URL, it needs this underscore: that's the default namespace. What that means is, basically: go to the repository and always use that same default repository.

The problem right there is: look at these tags — 6, 6.7, 6.8, 7.1. When this rolls to CentOS 8, you're screwed: you're getting CentOS 8. So in this scenario I really recommend using a tag, because if you don't use a tag, you're in deep, deep doo-doo. Does that make sense to everyone? You should do that anyway, but here you're in real trouble, because CentOS is only compatible within minor versions, not major versions. They could choose to have not just the default namespace but multiple namespaces that define the major releases — but the default namespace is easy to use, good for marketing purposes, things like that.

Sorry — alright, good. Yes, correct. Yes, exactly. And if you do this with Red Hat's registry, you're less screwed — although I would still argue you should never do it, because it's a bad idea, and you will still probably get screwed. But you are less screwed, because we have a rhel7 namespace, so the worst thing that would ever happen with the latest tag is you go from RHEL 7.4 to 7.5. And as most people that have used RHEL know, it's pretty stable: that user space doesn't change that much. There's an ABI/API compatibility guarantee — not a guarantee, I always get yelled at for saying that. Essentially, we publish a document that says we attempt to maintain ABI/API compatibility in all of the group-one (I think they're called) user-space tools. We have different tiers of user-space tools, but all the core ones, like glibc, are very, very stable. So the chances of an app breaking, even if you use the latest tag, are way lower than if you have major versions in the same repository. But this is something to think through yourself: you have to decide how you want to architect it if you're building your own environment. It's more than you think.
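The practical upshot, as a quick sketch — the tag names are illustrative, so check the registry for what actually exists:

```
# Risky: "latest" will silently roll to the next major release some Tuesday.
docker pull centos:latest

# Safer: pin the major version in the tag...
docker pull centos:7

# ...or better, use a registry whose namespace already pins the major release,
# so "latest" can only float across minor versions.
docker pull registry.access.redhat.com/rhel7/rhel:latest
```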
Again, it's so easy to type docker run — but then when you go to build this stuff yourself, if you just willy-nilly build things without thinking, they break in strange ways because you didn't think it through. So I'm just highlighting problems that we've seen, basically. Any other questions?

Alright, so the next thing that always comes up, going deeper into images — this is one I run into with every customer I talk to, they always ask: so now the developer just controls everything, right? And you're like, well, not exactly. If you think about it, a user space in an operating system is a bowl of soup — a stew — and we've always argued about what should be in the soup. Should you have garlic? Should you have more pepper? Should you have salt? Developers and sysadmins argued about this. (I keep hearing this high-pitched noise — I thought it was my thing making it, but it's not.) Basically, user space has always been a collaboration. Think through what configuration management is: automation to help you configure the user space. Think through what yum and RPM are: automation to help you configure the user space, with more and more advanced tooling over time. RPM is kind of the first one — think of RPM as the way the package maintainer transfers knowledge to you: here's how you should install this thing and run it. Then yum manages dependencies — think of it as all the knowledge of what a package requires to install itself. Then think of Ansible as the stuff we want to add on top that actually makes it run the way we want. And there are other tools to build images — Dockerfiles become that. But really, this is all about user-space collaboration and controlling what happens in that user space. Just because you put it in a container does not change that at all. Everything has changed, and nothing has changed.

You still end up with this problem where, historically, middleware people want to do crazy stuff. Every middleware person I know is like: I pulled down a tarball, I ran it, it works. And you're like, okay, that doesn't really make me happy as a sysadmin — that's crazy. And they're like, yeah, I pulled down six different JVMs, they're all running in /usr. And I'm like, why did you put them in /usr? That should go in /opt at a minimum, or in /usr/local or something. Either way, this still happens; even in a container, you're still going to get into this argument. And the app developers are just like, whatever, we have a WAR file, we don't care, right?

So the nice part is that with a container image, you at least are now speaking a standard language. If the operations team says, hey, we're going to pull down the RHEL 7 image and modify it in a certain way — add some stuff, whatever the stuff is: security stuff, scanning tools, whatever they want to add, maybe simple things like libssl — they add that stuff because of the concept of DRY: do not repeat yourself. If you're going to have glibc, you don't want different versions of glibc in every single middleware build. If you have a Ruby image and a Python image and a Perl image, you don't want them all to carry their own copies of the same stuff.
You don't want to install the SSH libraries three different times, with three different versions, with everybody having their own copy if you can avoid it, because that creates massive problems — it's a DRY problem. So now, whatever the operations team builds — maybe they do it with a Dockerfile, maybe with something like Ansible Container, which helps build images in a way very similar to Dockerfiles except using Ansible code (which, again, is a collaboration on changing the user space) — maybe there's a rhel7-corebuild; that's actually what I create in some demos that I do. Maybe that's what the operations team puts in the registry server and says: everyone use this, this is the single source of truth. We've added all the stuff our company wants — all the magical stuff that records when you type commands, or whatever weird stuff your operations team wants that makes your life either harder or easier. Then they hand it off to the middleware team, and the middleware team is like, well, we pull down our tarballs. And you're like, well, make a Dockerfile that pulls down those tarballs. And also — oh, by the way — the fact that you're pulling them from an external web server is not going to fly, so make sure you copy all that stuff locally, blah blah blah. At the end of the day, you get to negotiate that stuff, and at least it's all in a language — Ansible or Dockerfile or whatever you end up using to build that container image. At least now it's codified, and you have a single source of truth.

And then, the next time, maybe you create a standard Java-based web application server image; maybe you create a standard Perl image and a standard Python image. You have your experts in databases or Ruby or Python help craft that image the way you want it, and then everybody that has a Ruby app uses that version of it. You can branch as necessary, and you should use layers as necessary, but you shouldn't think that just because you have layers, the entire problem of collaborating with other people goes away. That doesn't go away magically with containers. You don't just let the end developer build whatever they want into the thing, do whatever they want, and end up with 50,000 different permutations of images in your environment. You don't want that; that's still a bad thing.

And then the next thing people always get confused about: they're like, well, okay, we'll just create 50,000 permutations, let developers do whatever they want — then how do you solve this problem? Say you create 50,000 different images, all with different versions of libssl in them, and then one day the security team says: oh, by the way, we need libssl at version X.Y.Z. And you go: okay, developers, have fun patching your 50,000 different images. That's never going to fly in a production environment. So at some point you hope you have this model set up with the operations team and the different middleware teams — I call Perl and Python middleware, which a lot of people get mad about, Java people especially, but I consider anything that kind of sits on the OS and doesn't do anything yet, that needs an app to run, to be kind of middleware.
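The shape of that model is just chained Dockerfiles. A sketch with hypothetical image names — the core build is the single source of truth everything else extends (the RHEL base needs entitlements to install packages; swap in centos:7 if you want to actually try it):

```
# Operations owns the core build, based on the vendor image.
cat > Dockerfile.core <<'EOF'
FROM registry.access.redhat.com/rhel7/rhel:latest
RUN yum -y install openssl && yum clean all   # shared, patched-once libraries
EOF
docker build -t corp/rhel7-corebuild:1.0.0 -f Dockerfile.core .

# The middleware team extends the core build, never the vendor image directly.
cat > Dockerfile.ruby <<'EOF'
FROM corp/rhel7-corebuild:1.0.0
RUN yum -y install ruby && yum clean all
EOF
docker build -t corp/ruby-middleware:1.0.0 -f Dockerfile.ruby .

# App teams extend the middleware image; patch the core once, rebuild the chain.
```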
Basically, at some point you're gonna want a cascading version of this, where maybe you have 10 different types of middleware, but when you update the core build, all the middleware builds get rebuilt, and all the applications get rebuilt. That is really painful for people to understand, but you have to understand it crystal clear: you have to be ready to rebuild this stuff at any time, and that means cascading builds. Okay, so the question is: would all of the tags get rebuilt if you change the core image? And the answer is no. You would end up building a new tag, and that new tag would be a new version. You wouldn't go back and patch old tags; I mean, you could set up an environment that does that, but it would be insanity, and the chance of making it work 100% is near zero. Imagine a five-year-old application where you try to rebuild every historical version of it on the new image; I can almost guarantee there are timing constraints in there where things get misaligned and won't work. The way I've always done it is to just build the latest again, as a new version. So it'll be 4.0.0-135. You know how Red Hat does build numbers? Bash will be something like 2.6.5-275, or the kernel will be 3.10.something, I don't know what it is right now, dash a build number. I would just use a build number and always roll it forward. And then this gets into the infrastructure that's necessary for CI/CD; you have to think this through before you can get to CI/CD. It is something OpenShift can do, as long as you use BuildConfigs. I have a demo, I was gonna do it in the lab, where I pull down a GitHub repo I've built that shows this working with cascading builds. BuildConfigs are an object in OpenShift, basically another Kubernetes-style object. They have what are called triggers, and they interact with what we call image streams. It's a lot to explain without drawings, but basically image streams are kind of a spider web, and any time an image changes, it can trigger other things to happen; it's a way to do event-driven automation inside of OpenShift. So whenever the core build gets rebuilt, it sends off a trigger that says, hey, the core build has been rebuilt, go rebuild everything that depends on it. After those get built, all the applications built on top of them get built, and you get this cascading wave of image builds. Of course, you'd never want that straight in production; you'd do it in a dev environment, where you rebuild all the images, make sure everything works, have smoke tests, have CI/CD, run the apps, run all their smoke tests, and make sure everything works. But people don't think this through. They think, oh, I'll just move to container images, they'll stay static forever, and it'll be fine. That's not the case. In fact, it's gonna be more painful if you haven't thought through all this, because you'll end up with a bunch of images all stuck in some specific state for a long time. They can end up with security issues, and two years later the developer doesn't even know how to rebuild the thing, and if there are no tests, you're gonna be in a really bad state.
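For reference, the trigger wiring looks roughly like this as a BuildConfig; the names and repo URL are invented, and a real setup also needs the matching image streams defined:

```
cat <<'EOF' | oc apply -f -
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: jvm-middleware
spec:
  source:
    git:
      uri: https://example.com/middleware/jvm.git
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: rhel7-corebuild:latest   # the base this build follows
  output:
    to:
      kind: ImageStreamTag
      name: jvm:latest                 # downstream builds can trigger off this
  triggers:
  - type: ImageChange                  # fire whenever the base image stream updates
    imageChange: {}
EOF
```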
Without all that, you're gonna be firing up a version of the container, manually patching it, saving it, exporting it, and trying it, and you'll end up in the same crufty state you had with the old Unix system that nobody wants to touch. I know your question, so I'll repeat it. He's saying: you build a new version of the core build and tag it with a new version, and the developer did what they should do, which is use a specific tag. Say it was 1.0.0, you roll the core build to 1.0.1, and the developer has specifically pinned 1.0.0, so the cascading build will never happen, right? Well, it'll happen, it'll just rebuild with the old version, because the BuildConfig will trigger a new build and all those cascading builds will run, but you'll never pick up the new base. That is true, that is a problem, and you have to think it through. The core build will definitely want to use a tag, to pull only the specific version from Red Hat that you want. But maybe the middleware team should use the latest tag, and maybe the developer should use the latest tag, to cause those cascading builds to happen. And if they don't want that, they can pin a tag, but then they're on their own. So now it becomes policy: okay, fine, if you don't want to use latest, if you don't trust the internal software supply chain, go figure it out yourself. Sorry, this guy was first. I don't quite understand the question. Yeah, I kind of understand what you're saying. What image are you pulling? Can you give the actual example? Okay, is it something you pulled from external? Yeah. So basically you're extending an existing image, and when the base image changes, all your images change, is that the case? You're using a base image, extending the container with your own data or whatever you want to apply, and when the base image changes, you need to rebuild. Okay, maybe. I think it clicked for me, I think I know what your question is, so I'll repeat it and you tell me if this is it: whatever configuration file they used in their Dockerfile, they didn't deliver with the Dockerfile. All you have is the Dockerfile, and you can see it pulling some external configuration file, but you don't have access to that stuff. Because that's one I've seen. We actually have that problem with Software Collections, where we don't deliver some of the pieces, so if you go to rebuild the Software Collections yourself, you'll be in a world of hurt, because you don't actually have everything you need to rebuild them. That's a common pattern problem I've seen. Yeah, no worries, we can chat offline too. So that's the end of that one. That is images, right? I tried to tackle some pretty deep problems, but the idea is that this stuff is easy to use, and yet you have to go back to all the Unix stuff you already understand. The Linux stuff doesn't change. You still have a dependency manager like YUM, you still have a package manager like RPM. You've added new tools, Docker, Buildah, which is one Red Hat has, Ansible Container, things like that, to build new images. But all these business problems, which boil down to how people collaborate to build a user space the way you want, still exist inside of container images. Sorry, go ahead.
Yeah, in fact I can demo some of it if you want. Is it working fine? Oh, all right, sweet. Mine never started; hopefully it'll start now. Yeah, it's way faster. So now you guys get to see what it's like. All right, see, that's how it happens normally, which is actually really nice. Actually, you know what, the question you asked is what happens when you pull an image. I have it in a lab, I just can't remember exactly where. It's in the lab two material, so I know it's there; that lab goes through exactly what I said. Okay, so the image isn't pulled yet, but let's pull an image and do a little experimentation on it, just to show you what's happening. See how much faster that is? This is better. Don't all pile in. Yeah, in fact, if you get into Katacoda, did you notice the videos at the beginning? They're embedded right inside. Oh, I don't think I've added three and four yet; one and two are there, and I'll add the others soon. All right, so, oops, what happened? This is some weird thing I've been seeing lately; you've got to go back, I don't know why it does this. Long story short, you can do a couple of things. You can look at the history of an image with Docker, which basically shows you what got created by each line of the Dockerfile. Now, with Red Hat images, you'll find that we do what we call squashing them, so you don't see all those layers. We have a tool, ImageFactory I think it's called, that uses the exact same process that creates our ISO images, our VMDK images, and our AWS images, and it just creates a Docker image too. So it's squashed and you won't see much; on a community image, you'd see a bunch of lines. You know what's sad, I do walk through exactly what you're asking, seeing the different image layers, in this exercise, and the problem is it would take a long time to build right now. You have to build from a Dockerfile with multiple lines, so it creates multiple layers. Then I show you the tree structure, and that if you pull one tag, you'll see it pull down all the image layers it needs to get to that tag. But if you pull a different tag, say you pull 7.4 and later you pull 7.5, you'll see it pull some more data down, because it's pulling the different image layers it needs to traverse the tree to that other version. Does that make sense? I can't really demo it here because I'd have to do a lot of setup to really show you. That was your question, right? Yeah. All right, I'll move through this real quick. I'll try to get through the third lab; that was as far as I was going to get anyway. And this is all interactive, so you can do it yourself.
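If you want to poke at the layer behavior yourself later, the experiment looks roughly like this; the tags are illustrative and may have moved since:

```
# History shows what each Dockerfile line contributed. Red Hat's squashed
# images show only a layer or two; community images show many.
docker pull registry.access.redhat.com/rhel7
docker history registry.access.redhat.com/rhel7
docker pull docker.io/library/httpd
docker history docker.io/library/httpd

# Pulling a second tag only fetches the layers that differ, as Docker
# walks the layer tree down to the new tag.
docker pull registry.access.redhat.com/rhel7:7.4
docker pull registry.access.redhat.com/rhel7:7.5
```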
Yeah. When you push, you push everything. Well, you push all the image layers that you've created. Think about what pushing really is: first you pulled a tag down, technically all the image layers necessary to create that tag. Then you add some stuff, and you only push those differences back. And actually, you don't even have to push those differences back to the same place; you can push them to a completely different repo. You can re-tag the image. Tagging in Docker actually changes the registry, the namespace, the repository name, and the tag name; that's something I only just thought through properly for the first time. The word tag, the directive, again, these are all arbitrary definitions for things, but the docker tag command lets you re-tag an image with a different namespace, a different registry server. You can literally re-label all that stuff I showed you: pull down an image, docker tag it, give it another name, change the registry, the namespace, the repository, and the tag, and then push it somewhere else. And then it has to push all of the layers. Yeah, you can do that, because I've done it. When you pull from one registry and push to another, you pull it down, re-tag it, and push, and it pushes all of the exact same layers to the other registry server. Exactly. Does that make sense? Yep, exactly. There are a lot of defaults like that that are ambiguous, and you don't realize it until you go to do it. Another really common one, which I do show in the lab, is name resolution. You can type docker pull centos. What happens? It uses the default registry, docker.io, and the default namespace, and it pulls down the image. That's easy, until you go to do something else. How do you set the default to your own registry? Worse, if you pull namespace/repository, there are cases where it won't find the image at all. Normally it would pull latest, but once you specify the namespace and the repository, it doesn't know to do that, so you have to actually specify latest yourself. There are all these weird, arbitrary resolution problems. So that's something I forgot to tell you: always use the full URL, because if you don't, you will end up in a world of hurt at some point. It will cause you pain, because you won't know exactly what you got. Short names are really nice for docker pull when you're playing around, but when you go to build something real, use the full URL, or you will end up in URL hell. It's in the lab; I show you the different orders in which things break. It's insanity. It's like DNS that doesn't resolve right. DNS is very specific, and you know exactly how it's going to resolve; this doesn't have that. That's a very good point. And the same is true with tagging: a tag argument can be the whole name, or just the tag, or the repository. In fact, I'd guess you'd end up in namespace-resolution problems too if you don't specify the whole URL. That's a stab off the top of my head; I wouldn't even try it, let's put it that way. It's dangerous.
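For example, something like this; the internal registry name here is made up:

```
# Pull with the full URL, then re-tag: registry, namespace, repository,
# and tag are all just parts of one name.
docker pull docker.io/library/centos:latest
docker tag docker.io/library/centos:latest registry.example.com/myteam/centos:7

# Pushing to the new registry pushes all of the same layers there.
docker push registry.example.com/myteam/centos:7
```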
So this is the next step. I showed you the first part: I dug into user programs and interpreters and what's inside the container image. Now I'm digging into the system calls and the kernel space; now we're going down into the container host. You can really think of the top two gears as the container images and the bottom two gears as the container host. And it doesn't really change because of containers; it's the same tools and tool chain you're already using. But like I always say, container images are fancy files, and containers are just fancy processes. If you do a file open from a process, say you write a bash script that cats a file, cat uses the open syscall, then a read, and you could strace it and watch exactly which syscalls it uses. If you do it in a container, it does the same thing. If you cat /etc/redhat-release or /etc/hosts, it does a file open, reads the data out, and writes it to the terminal; the exact same thing happens in a container. The only difference is that the process got created with a different SELinux context, in a cgroup, with different namespaces. And what people often don't understand, this came up very crisply in the birds-of-a-feather we did two days ago, is that the clone syscall lets you choose which namespaces you want. When you use Docker, by default it uses all the namespaces it's configured to use: the process ID namespace, the network namespace, the mount namespace, the one for the hostname, what's it called? UTS, that's right, the weird one, the UTS namespace. But you can turn each one of them off when you do a clone syscall. So you can create a process that is only in the network namespace of a container, but isn't limited by its cgroups, isn't limited by its SELinux context, isn't even in the same process ID namespace, so if you do a ps, it shows everything on the box. And then you realize: wait a minute, a container is just in my mind. It's not a real thing; it's a user-space construct, yet we use the word as if it were real. It's just a fancy process, and you decide how thick the walls around it are. You can isolate it in different ways: just the network or not, just the process ID table or not, and there are a bunch of different namespaces to use; I don't remember how many, something like seven or eight. And when you start multiple processes, this is what it looks like. This is the global PID data structure, the process ID table inside the kernel. If you think about what a process is, starting one just adds another number to the process ID table, and when you do a ps, it just shows you the information in that table. It's like doing stat on a file: it's dumping the contents of the process ID table. A process ID namespace just creates another index, a separate list of information alongside the global process ID table. It's no different from having stuff in this file versus that file; it's essentially a sandboxed version of the process ID table. And then in the Red Hat world, we use cgroups, sVirt, seccomp, and SELinux, all of these things, to add further isolation. But again, those are arbitrary constructs, not a standard; not every Linux distribution does that. Some distributions use AppArmor.
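Going back to the point about picking namespaces: you can play with this yourself using unshare(1), which drives the same kernel interfaces as those clone flags. A minimal sketch:

```
# New PID namespace only (plus a private /proc so ps works): inside,
# ps sees just this shell, but the network and hostname are still the host's.
sudo unshare --pid --fork --mount-proc /bin/bash
ps -ef      # PID 1 is this bash

# New UTS namespace only: hostname changes stay private to this shell.
sudo unshare --uts /bin/bash
hostname lab-test
hostname    # changed in here, unchanged on the host
```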
And some don't use any of them at all. LXC, LXD, Docker, rkt, they're all left to choose which of these technologies they want to use when they create a container; there's no definition for that. One last thing I should point out: execve. If you run ps from bash and strace it, execve is the syscall that runs that subcommand: bash execs into ps, and then it returns. With a clone syscall, in this first example I'm only showing it creating a process ID namespace; that would be like a little block of C code you wrote yourself that just creates a PID namespace, to show how this works. And then this one would be something like a Docker container, where it creates all of these namespaces; these are all the global data structures, and these are the namespaced ones. You can create a namespace around each of these things. Does that make sense to everyone? You get to pick and choose which data structures in the kernel you want to virtualize at the moment you start the process. And in fact, something people don't know: there's no easy way to list all the namespaces on a kernel. You can't just run some get-namespaces command that shows you every namespace that's running; that doesn't exist in the Linux kernel. I tried to press Eric Biederman on how or why. As a Unix admin for a long time, it seemed obvious to me that I should be able to see the namespaces, that it should be a data structure, right? But it's not. There's no concept of a container in the kernel, so there's no way to just list all of them. Namespaces do actually get created, though, and you can add another process to an existing namespace; you can actually add other processes to it. So people ask, how do I get inside of a container? Well, you don't actually get inside of a container; you just add another process to the same namespaces as the existing one. That's a really mind-blowing concept for a lot of people. When you do a docker exec, or a Kubernetes kubectl exec, and get into the container, you're really just starting another process and adding it to the same namespaces as the other one, and then you happen to be in those namespaces. And what most people don't realize is that you can use programs like nsenter, namespace enter, and only enter part of the namespaces. You can enter just the network part and do network traces and things like that, while not being limited by the container's cgroups, not limited by its SELinux rules, not limited by its PID namespace, so you can still see all the other process IDs. That's pretty mind-blowing for a lot of people. Yes, it essentially just creates another process in the same namespaces. And in the Docker world, by construction, when you do the exec it joins all the same namespaces, because that's the most logical thing for a human being, right?
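A sketch of that partial entry, assuming a running container named mycontainer (a name I made up):

```
# Find the container's PID on the host, then join ONLY its network
# namespace: you see its interfaces, but you keep the host's PID
# namespace, cgroups, and SELinux context.
PID=$(docker inspect --format '{{.State.Pid}}' mycontainer)
sudo nsenter --target "$PID" --net ip addr
sudo nsenter --target "$PID" --net tcpdump -i eth0   # trace its traffic from "outside"
```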
But you can enable and disable things; the clone syscall has the potential to turn each namespace on or off. nsenter, for example, is more granular, and you control which namespaces you enter; a docker exec just enters you into all of them. That's by construction, though; it's not necessary. And here is the full rundown of what it looks like, because again, I think people have forgotten this stuff. When you cat a file, you don't even think about it. You're like, I just cat a file. I've forgotten the hundreds of lines of code cat runs to do that, and it's probably a lot more than that once you get into the file system, the VFS layer, and everything else that's happening. But this is what it looks like: you do an open syscall, the open syscall talks to the virtual file system layer, the VFS layer has a driver for XFS, XFS has a driver for whatever the block device is, and it finally reaches the blocks. There's nothing different in a container. The only difference is that the mount namespace is virtualized, so the list of mounts looks different from the host's; it's a virtualized list of mount points, and it just happens that /var/lib/mysql in the container is mounted on some other volume. But it still goes through the VFS layer, the XFS driver, and the block device; it uses the exact same storage subsystem in the kernel. There's really no difference; it's just a fancy process. I'll admit that a couple of years ago I had a crisis, because people asked me these crazy architectural questions and this drawing didn't exist, and I didn't have it in my head. I was like, how does that work? We all end up in these crises around storage and network, and the same problem exists across everything containers use: network, storage, RAM, CPU. We're not quite sure how it works, and then you have to go back and think, wait a minute, I know how processes work in a Linux system, so I know how this works; I just need to think it through properly and explain it to the people around me. So let me go full bore and show you. These are two different RHEL7 systems, and this is another one of those things I think people forget. These are image layers, and, oh, this will hopefully answer your earlier question, I just realized I have this drawing. Say you pull down MySQL. You make an HTTP connection to a registry server, you pull all the image layers down, and they get cached on the host. The host then smashes all those layers together using what's called a graph driver. There are two main ways this happens in Red Hat Enterprise Linux, and honestly, I'd argue two main ways the whole universe does it. There are a bunch of graph drivers, but the two big ones are Overlay2 and DeviceMapper, and those are the two Red Hat uses. Our default has historically been DeviceMapper, but we're moving the default to Overlay.
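You can check what a given host is doing; the container name here is hypothetical, and the exact output varies by driver and Docker version:

```
# Which graph driver this host uses (e.g. overlay2 or devicemapper).
docker info | grep 'Storage Driver'

# Where the assembled, single-directory view of a container's layers lives.
docker inspect --format '{{json .GraphDriver}}' mycontainer

# From inside, the container sees its own virtualized mount table.
docker exec mycontainer cat /proc/self/mounts
```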
So basically, long story short, there are a couple of different ways in a Linux system to map a bunch of image layers into something that looks like a single directory. Because at the end of the day, that's what you need: something that looks like a single directory, so that runc can go fire up a container. That's basically what's happening. And the thing that takes all those image layers, maps them onto disk, and makes them look like a single directory, that's a graph driver. The term graph driver never made sense to me, and I had one of these crises and had to go study container storage to understand what was happening. Pulling the image layers down is one thing; that's a library that knows how to do that, containers/storage, which Buildah and Skopeo and all these Red Hat tools use. The graph drivers are what explode the layers onto disk. And what people don't understand is that the layers are exploded onto disk read-only. When you look at an overlay file system, it's read-only, but if you create a file in it, the file goes into a copy-on-write layer. What most people don't realize is that the image layers are always read-only, and to disable that copy-on-write layer, you have to pass Docker the --read-only flag, and most people don't, because things break in random ways when you do. I had a guy yesterday asking me this crazy question. He said, we're having this problem with metadata not being fast enough inside the container. I'm like, why? He goes, I don't know, we're writing a bunch of files, like 50,000 files, it uses Yocto or something. It even took me a second to click, and I had to snap back to what I tell everyone: think through the Unix basics of this stuff. I asked, you're bind-mounting that, right? You're writing through a bind mount? He's like, no, we're writing into the container. I'm like, well, then you're writing into the copy-on-write layer, so every metadata change goes through that copy-on-write layer, which is way slower than a bind mount. A bind mount is essentially the same code path you'd use if it were a regular process; inside a Docker container, without the --read-only option, writes land in the copy-on-write layer instead. And that layer is slow by design: we're trading slow writes for convenience and branching. If you run several copies of a container, I'm showing three here, they share one read-only set of layers, and each has its own copy-on-write layer. We're saving space and making this easier to use, but writes are much slower. Does that make sense?
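A crude way to see the difference; the image, path, and counts are arbitrary, and the :Z suffix relabels the directory for SELinux:

```
# Metadata-heavy writes into the container's root land in the copy-on-write layer:
docker run --rm docker.io/library/centos \
    bash -c 'time (for i in $(seq 1 5000); do touch /scratch-$i; done)'

# The same writes through a bind mount take the normal filesystem path:
mkdir -p /srv/scratch
docker run --rm -v /srv/scratch:/scratch:Z docker.io/library/centos \
    bash -c 'time (for i in $(seq 1 5000); do touch /scratch/f-$i; done)'
```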
Go on, I don't know who was first, but I'll go. Okay. That is correct, which leads me to the next thing people always hit. Your comment slash question is: if you use a bind mount, and then I run a container, create a new version of it, and push it to a registry server, that data is still stuck on that node, right? It doesn't go with the image, which is a good thing and a bad thing. The problem is that now you need to think through basic DR for that bind mount. It's the same problem you've always had, and I'm actually working on a talk on this: disaster recovery is transaction replication, file replication, or block replication. Those are the three options; that's it, that's how the universe works. It's basic Unix. If it's a MySQL server, it should know how to replicate its transactions to another server, and then you shouldn't care: if you're running a container here and a container over there and they're replicating, you don't have to worry about the container layer. If you're doing file replication, maybe you use something like Gluster with geo-replication for those bind mounts, and now you can ship that data off asynchronously, great. Or maybe you're on an EMC system using SRDF, and it already does the block replication. You still have to think that stuff through. Hold on, he had a question first. Okay, so your question is, and again, this is exactly why this stuff is so hard, because it's hard to even ask the questions: is this the last layer of an image? If you pull down a container image, is this the last layer? The read-write one, the white one in the drawing? The short answer is no, kind of. It's not a layer of the image; in fact, it's an ephemeral layer that gets deleted by default. It's gone if you do nothing. Now, with a docker commit you can actually capture it, and it does become a new layer, and if you tag that and push it, it becomes the next layer. So you can do that, that's correct. And I think we're getting kicked out, but that's it anyway, so we're good.
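For reference, capturing that ephemeral layer looks roughly like this; the container and registry names are invented:

```
# Freeze the container's copy-on-write layer into a real image layer,
# then tag and push it. Bind-mounted data does NOT come along.
docker commit mycontainer registry.example.com/myteam/myapp:patched
docker push registry.example.com/myteam/myapp:patched
```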