Hello everyone and welcome. It's my pleasure to introduce Scott McCarty, who's going to tell us everything we need to know about how a container engine works, as soon as he finishes his slides. All right, let's see if it updated. Nope, you know what we've got to do, we've got to make sure that... yep, it did. Nice. The cloud. All right, how are you guys? Before I start I want to gauge the audience. How many of you have used Docker, the command line? Good. How many of you are familiar with the OCI standards? A decent amount. How many of you are programmers? All right, that's good, so we have a pretty technical audience. I just want to understand where people are at so I know how deep and how quickly I can go, because I have a tendency to start speaking very quickly when I get excited. So I'm going to jump in, because we have... until what time do we have? I'm good at winging this stuff, right? 35 minutes. All right, we have 35 minutes, so we're going to rock and/or roll. I designed this talk around some premises that are typically wrong. I hear people say things like, "I just run Docker so I can just run my containers," and then you'll see all these architectural drawings on the internet that basically show containers just running on a blue line: a blue line smeared across the top of one or more container hosts, and then you just see containers. Wrong, so just start from scratch; it's not right. And this slide is an example of a Google search where there's a bunch of wrong ones like that.
The other thing you'll see is just a bunch of containers running on the host with nothing else, and that's what I call 1980s containers; that also doesn't tell the whole story. So I'm going to try to explain it from the ground up and show how it actually works, so that, one, you'll understand it better, and two, you'll make better architectural decisions. I've answered so many crazy questions that I've started to understand where people's black boxes are. And let me point out one other thing, since many of you are programmers: abstraction is good, it makes our lives more convenient, but understanding what an abstraction does is still important for making the right architectural choices. That's why I think this is important. So let's start with processes versus containers: what is the difference? How many of you feel you could write that on the back of a napkin right now? Raise your hand... not that many. All right, Dan could nail it, I know; Thomas could; a few people could, but not everyone. And the reason is that in a lot of ways there's really no difference.
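To make that concrete, here's a minimal sketch, assuming a Linux host with /proc mounted, that reads the kernel's process table from user space. Whether a process was started by a container engine or by your shell, it shows up here the same way:

```python
import os

# The kernel's process table, exposed as one numeric directory per PID
# under /proc.  A "containerized" process appears here exactly like any
# other process -- there is no container flag anywhere in this table.
pids = sorted(int(entry) for entry in os.listdir("/proc") if entry.isdigit())
print(f"{len(pids)} processes in the table; this script is PID {os.getpid()}")
```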
I mean, in this drawing I try to show that there's one structure in the kernel that tracks processes: the process ID table. Every time you add a process, the ID gets incremented and another entry gets added. That's it, there's nothing else in the kernel. There's no flag in that process ID table that says "this is a container" or "this is not a container"; that doesn't exist. It's just another process as far as the kernel is concerned. Now, there are other pieces of technology that get turned on in the kernel whenever we create what is colloquially called a container, but "container" is a human, user-space concept, which I'll dig into; it's defined at the user-space level, not in the kernel per se. What I'm going to do is walk through certain operations and show you which pieces of technology get turned on in the kernel, so hopefully you'll walk away from this with a very good understanding of what a container engine does. Here I've tried to boil it down: a container engine is really one technical implementation, and it provides both a methodology for how to create containers (think of the docker run command: that's a methodology for how you go and create a container) and, implicit in running that command, a definition of which technologies in the kernel will get turned on when that container is run. You're really accepting two things at the same time; I'll dig in deeper and hopefully it'll make more sense. This concept of a container engine is new, a 2013-ish concept; it didn't really exist before. There were libraries to create containers, but there wasn't really the concept of an engine. So I like to refer to Docker as the biggest proof of concept ever invented. It's a really good proof of concept.
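As a preview of the kernel side of that definition, these are the namespace flag values a runtime can hand to the clone() syscall (the values below match the Linux <linux/sched.h> headers; the particular combination shown is just a plausible example, since the exact set requested is up to each engine):

```python
# Namespace flags from <linux/sched.h>.  Passing one of these to clone()
# asks the kernel to give the new process its own virtualized copy of
# that resource; leaving it out means the process shares the host's copy.
CLONE_NEWNS   = 0x00020000  # mount table
CLONE_NEWUTS  = 0x04000000  # hostname / domain name
CLONE_NEWIPC  = 0x08000000  # SysV IPC, POSIX message queues
CLONE_NEWUSER = 0x10000000  # UID/GID mappings
CLONE_NEWPID  = 0x20000000  # process ID table
CLONE_NEWNET  = 0x40000000  # network stack

# One plausible set for an ordinary container (each flag is optional):
flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET
print(hex(flags))
```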
And honestly it was probably the only way to sell us on actually using this whole thing: you had to see it all working together. You had to see the container images sitting in a registry, see that you could pull them down and run them with one command, and understand how to interact with a container engine that then goes and pulls those images. You were buying into a lot of stuff all at once, in one big proof of concept. You were essentially in a walled garden where you go, "OK, now I see how this can all work," because it was all demoed to me in a nutshell: I was given an API, a command line, registry servers, pre-built images, all of these things, as a proof of concept that worked together. But today we need to break it down into separate pieces to do a lot of other stuff. A perfect example: a couple of days ago I was working with a customer on a Hadoop project where they were trying to use a container executor with YARN as the scheduler. That's a scenario where you don't really want all the complexity of what I call the Docker proof of concept; you don't want client-server interactions with a daemon that then talks to another daemon to fire off processes, and I'll get into that. In a nutshell, I'm going to break this all down so we can see all the moving pieces inside this giant proof of concept, and then I'm going to explain the drawing on the right, but not right now, because it would be too much. The other thing is that there are alternatives, and this is the slide I was working on because I realized I'd left it out of my story; don't ask me how. Everybody's used Docker, but there are alternatives: CRI-O is a container engine that's used inside of Kubernetes, and Podman is a great container engine that's used outside of Kubernetes for firing up single-node containers and pods. Think of pods as multiple containers living in the same namespaces, on the same network, accessing the same storage, that kind of thing. This talk is going to dig into how all container engines generally work, what they all do, what they have in common: essentially the nuts and bolts of what's going on underneath. Another assumption I want to tackle first is the container engine versus the container host. If you really think about it, the container host is the container engine, because there are all these things you need to think about, especially in the context of Kubernetes; I think about the container host as the unit, not the container engine. I don't really want to swap out the container engine, because it's literally like changing an engine after the car's been shipped. There's a lot of engineering that goes into putting that together; you don't swap a Ford engine into a Chevy after it's been shipped, that doesn't make sense. I'll dig into why, and the last slide in particular highlights where this goes beyond just the container engine into the kubelet; pieces of the kubelet actually talk to the kernel, so there's a wider ecosystem of software talking to the kernel, not just the container engine. That's the last assumption I want to tackle, and then a short commercial break: there's another talk tomorrow that goes deeper into Podman, by Urvashi and someone else... Dan, who else is it? Sally O'Malley. So check that out if you want to dig deeper into the specifics of Podman as another container engine. And then tomorrow I will be digging into
the container standards: essentially what makes all of this work, and why we can have three different container engines that interoperate. Nothing in technology is guaranteed to work, but there are reasons why this does. Long story short, I won't dig deeply into that here because it's too much to cover in one talk, but I wanted to at least do the commercial. Oh, and by the way, my Twitter handle, shameless plug, is at the bottom of every slide, so you're welcome to follow me. All right, that was a bad joke, sorry. So this is the main drawing we're going to tackle; we're going to walk through all these different components. Actually, let me ask: how many of you have used Podman? How many have used CRI-O? All right, good. I start here because I think most people have used Docker, and even though a lot of people have used Docker, they don't necessarily understand what's happening under the covers. As I show here, it's not just Docker: Docker talks to containerd, containerd talks to runc, and runc talks to the kernel to fire off containers, and at some point all of these technologies get turned on in the kernel. So if you really think about it, it's the kernel, runc, containerd, Docker, the kubelet, this whole stack. If there's some feature you turn on when you're scheduling something in Kubernetes, say a privileged container, if you're telling the security context to run privileged, think about it: that has to be supported from the Kubernetes master API to the kubelet, from the kubelet to the engine, from the engine to runc, and from runc to the kernel. Whenever you make changes to an API and add a new command-line flag, for example, it has to be supported all the way down that stack. That's really why we like to think about the whole host as the container engine: if any one of these doesn't support something, it pretty much doesn't work. And then here's a simplified drawing. So here's CRI-O; I asked if anybody had used CRI-O. Well, you wouldn't really notice it in the bigger context of things, probably, in a Kubernetes environment, because it just gets rid of a box: we get rid of containerd and Docker and merge them into a single container engine that understands a protocol called CRI, so the kubelet can talk to the CRI-O daemon, and the CRI-O daemon calls runc, which talks to the kernel. It simplifies the stack a bit. So if you haven't checked it out... how many of you are running Kubernetes? Not that many yet. All right, those of you who raised your hands, go check out CRI-O. Now I want to tackle something I think has been lost in the ages: how many of you understand the difference between user space and kernel space? A decent amount; that makes me feel good about life. A lot of the time I've done this in a crowd and one person raises a hand. One time there were about 300 people at a startup conference and one guy raised his hand, and I asked, "What do you do?" He said, "I teach operating systems." I said, "All right, we're done here," and I ended up whiteboarding stuff on the wall showing people. But in a nutshell: the container engine lives in user space, and anything it does, it needs to call into the kernel to make happen. Think of a system call as a special function that the kernel handles, as opposed to other user-space code that you wrote yourself. Sometimes you write your own functions, but a lot of the time, like if you do a file open, you
didn't implement the file open yourself; that goes to the VFS layer, and the VFS layer driver talks to XFS, and so on. You relied on somebody else to do all of that, and all the access controls in play there are handled by the kernel for you, so you don't have to do that stuff. That's all a system call is. So now let's talk about how Linux processes are created. The two most common system calls you would use are fork and exec. If you've ever run a system command in Python or Perl or Ruby or whatever, you're essentially doing a fork or an exec there: you're firing off a sub-process that goes and does some work. I've done nasty stuff like that in Python, where I call out to Bash and do all kinds of nasty stuff in Bash because I'm lazy, and then it comes back and I get the results. So in this example: you type a command into Bash, and Bash forks or execs, typically execs. If you run a ps command, it execs into ps and then returns; everybody kind of understands that's how you run a process in Bash. And in that scenario you turn on some pieces of technology: you have access to the TCP/IP stack, you have access to the VFS layer and XFS, you can write files and read files. In Bash you kind of understand all of that; that's a regular process. And here's what I would argue is the magical place where it becomes a container, or one of the foundational technologies that really allowed the Docker thing to happen: the clone syscall. There's another system call, clone, which is a special version of fork, and you pass it a bunch of extra flags, which honestly most people are not used to doing, and each of those flags asks for something everyone's probably heard of: namespaces. Namespaces are the hostname, the network namespace, the process ID namespace; all these sort of virtualized data structures that can get created when you fire off the sub-process. Essentially what you're doing is carving off, just like virtualization, a sort of little piece of the kernel: a copy of, or a reference to, the actual global namespaces. If you think about the process ID namespace, you can create a separate process ID table that's virtualized but still points back to the global process ID table, so process ID 1 in the container might be process ID 527 outside the container. Even when these things are virtualized, there's still a real representation of them in the kernel. Does that make sense to everyone? All right, so here's what it looks like. With a regular execve call, a process ID gets added to the global namespace; it uses whatever UIDs and GIDs are in /etc/passwd; the network will be the same as the host's. We're all used to running, say, Apache on a host, and it uses the global namespace for all of this stuff; that's just running a process the normal way. Then if you run it in namespaces, like if instead of catting /etc/hosts on the host you do it inside a Docker container, you're going to do it in a virtualized place: the process IDs are going to be virtualized, the UIDs and GIDs can be virtualized, the network can be virtualized, though it doesn't have to be; each of these is optional, because these are optional flags passed to the clone syscall. So this is the first step where you go, "OK, now I'm starting to understand what a container is." This is the first step in the definition of
what a container is. And then I think it's useful to take a look at this: two different containerized processes running in two different namespaces. It's important to notice that when you use something like a mount namespace, it's still relying on the drivers in the kernel: the virtual file system layer, the XFS driver, the block driver that reaches out to, say, an iSCSI volume or a Fibre Channel volume. Those are not virtualized; that's shared code. As soon as you do a file open inside the container, it's still relying on all those underlying subsystems in the kernel to do the standard work they do, and those are not namespaced. So it's important to understand this is not virtualization. With full virtualization you have a separate running kernel in each virtual machine, with its own copies of all of that stuff, so all of it would be different between VMs; but it's shared between containers. Hopefully that gives you an aha moment; I see people nodding, so that's good. All right, so that's step one, the clone syscall. I started at the kernel and built up; now I'm going to go up a layer and talk about the container runtime. Again, I won't dig deep, but the Open Container Initiative defines what a container runtime is, it's an open standard, and there's a reference implementation called runc, which is the most common container runtime. Docker uses it, CRI-O uses it, Podman uses it, Buildah uses it; most of the things on the planet use it, because it's a very extensible, well-supported, community-driven open source project that's managed by the Open Container Initiative, which is part of the Linux Foundation, not the CNCF. Long story short, it's a community-driven project that everybody uses. Now, other people implement their own container runtimes, which we'll dig into a little toward the end if I have enough time; there are other compatible OCI runtimes, like Kata Containers or gVisor. But in a nutshell, what a container runtime expects is a file system mounted in a directory, a root filesystem: if you were to cd into that directory, it should have /etc and /usr and all the things you'd see if you SSH'd into a server. And then it expects a config.json. I still have this wrong in this drawing; it's not manifest.json, I fixed it in another drawing and forgot this one; it's actually called config.json. Essentially it's a JSON file that, if you were to tease it apart, looks very similar to the command-line options you would pass to Docker: it's got a cmd and an entry point and a whole bunch of stuff, but also things you might not see, like seccomp rules and SELinux settings. There are a lot of things that come default in the container engine that get stuffed into that config.json and then passed on to runc, which we'll dig into in a little while. Now notice in this drawing: namespaces have been turned on, that was the clone syscall, but now we're also turning on things like SELinux, cgroups, seccomp, and capabilities. We're starting to see more of the kernel technologies get turned on by this runtime, and what we're doing is standardizing the way we talk to the kernel to turn on these technologies. Now we're really starting to see the formation of a definition of what a container is, in user space, and runc helps us do that based on the OCI standard. That entire set of config options that you can pass
into that config.json is pretty much the definition of what a container is; when we say the word "container" colloquially, we're essentially referring to that. So then it looks like this. It's more than just the clone syscall, which I show here in the gray box; we also turn on cgroups, SELinux via sVirt, seccomp; all these other technologies get turned on, and then we start to say, OK, that's really a container. What I think of when I think of a container is that bottom thing. With the normal execve that I showed you earlier, none of the things are contained; with this one, the process gets contained by the namespaces, and we also turn on cgroups to apply resource constraints, CPU and memory limits, things like that. Think of cgroups as resource constraints, to prevent noisy-neighbor problems. Then seccomp and sVirt are what I would call more discrete controls, mandatory in nature. Think of seccomp as a firewall for syscalls: you can block certain syscalls, or whitelist others. And sVirt is a way to dynamically generate SELinux labels; I have a whole talk where I go deep into that, but think of it as preventing data structures within the kernel from talking to each other. You can say: this process can access these files and this socket, and that's it. Each container gets its own dynamically generated label, and it can only talk to other data structures with that same label; it's a way of discretely, dynamically limiting access. So now you're starting to see much more powerful isolation form in this definition of a container, beyond just the clone syscall. And this wasn't the beginning of it, either. If you look at the Docker definition of a container, that wasn't the first: there was libvirt, there's LXC, systemd-nspawn, Clear Containers. All of these have their own definitions and their own user-space options, command-line or library-based, that you can pass to them to turn on containers, and those basically determine which pieces of technology get turned on. All of these have their unique definitions, and now they've all kind of standardized on similar things: you'll see SELinux and cgroups and seccomp as pretty common technologies in each of them, but the permutation of which ones get used is not standard. Does that make sense to everyone? All right, so now we build up to the container engine level; this is where the meat of it is. What does a container engine do? In a nutshell, it provides an API; it's able to go pull container images; and it prepares the configuration to pass to the runtime. I told you the runtime expects a directory with a full file system in it and a config.json; you basically call runc with those two things and it goes and fires off a container the way it should. But the container engine is responsible for pulling that container image down, decomposing it, pulling the pieces and parts out of it, which I'll dig into deeper, creating that rootfs, and then handing it off to runc. Every container engine has to do that, whether it's Podman, CRI-O, or Docker, it doesn't matter; even Buildah, because it has to fire up a container to then add stuff to it. Anything that builds or runs containers has to do this piece. OK, so what does this look like in action? First, providing an API. In the Docker world, that means dockerd; dockerd is really what's providing the API. If you connect to the
Docker socket and drive it through the API, not using the Docker CLI but actually connecting to it with, say, Python, and interacting with it programmatically, you're essentially using that API. Now, the command line also talks to that socket, to that same API. And there's something called a CRI shim that also talks to that API inside of Kubernetes. If you look at the most common way Kubernetes is set up today, the kubelet talks to this Docker shim using a protocol called CRI; the Docker shim talks to the dockerd API, so you're translating between CRI and the Docker API; then dockerd talks to containerd, which fires off copies of runc, passing it that config.json and those directories, to go fire off containers. That's what I'm trying to show here: what a system looks like in real life, fired up. This would be a container node inside your Kubernetes or OpenShift environment. The next thing the engine is supposed to do, after providing an API, is pull images and prepare them to hand off to runc. Pulling and caching images often uses root-level permissions. Think about the file system operations, like mapping image layers to overlay layers or device mapper layers: people don't realize that when a container image gets pulled down, it typically gets mapped right into the file system. Again, it's one of the world's best proofs of concept: all of this is hidden from you. You didn't realize it, but it was doing all kinds of funky stuff under the covers; it's actually doing root-level operations to map those container images into the file system, caching them locally, and preparing them so they're ready to run later. That's what was happening when you do a docker pull, and most people don't realize it. They think it's just a file you pull that lives in the file system, but it's not that simple; that actually hung me up a while back, because I didn't realize it was happening. So here I show the graph drivers getting turned on, overlayfs getting turned on: this file system caching operation is actually using things in the kernel to decompose and map those container image layers to file system layers. And then there's preparing the storage for runtime; this is the second phase. Once we've cached the container image locally and it's mapped to those file system layers, be they device mapper or overlay, through this thing called a graph driver, we also have to prepare it to run. In this example on the left I show MySQL, because it's something everybody should understand. If you run a container in read-only mode, the copy-on-write (COW) layer I show there will not be there; but if you don't run it in read-only mode, and most people don't, that COW layer will be there. The COW layer is what handles it when you echo "hello world" into /etc/hello inside a container: it doesn't fail, right? All of you have gotten into Docker, played around, interacted with it in a shell; you can write files, and it seems normal. But really what's happening is you're writing into that COW layer, and when you docker kill, that COW layer just sits there on disk. When you do a docker rm, it deletes the COW layer. But if you do a docker commit, it commits that COW layer as another layer on that image, maps it into the file system, and it becomes part of the container image if you want, and then you can ship it back off. A lot of people don't realize that's happening.
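Here's a toy model of what the engine sets up: a stack of read-only image layers plus one writable COW layer per container. The class and method names are mine, purely illustrative, and real graph drivers of course work on files, not dictionaries:

```python
class LayeredRootfs:
    """Toy copy-on-write model: reads fall through the COW layer down the
    image layers; writes only ever land in the COW layer."""

    def __init__(self, *image_layers):
        self.image_layers = list(image_layers)  # read-only, lowest first
        self.cow = {}                           # per-container scratch space

    def read(self, path):
        # Topmost layer that knows the path wins (COW first, then image).
        for layer in [self.cow] + self.image_layers[::-1]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self.cow[path] = data                   # the image is never touched

    def commit(self):
        """Like `docker commit`: freeze the COW layer as a new image layer."""
        self.image_layers.append(dict(self.cow))
        self.cow = {}

fs = LayeredRootfs({"/etc/hosts": "127.0.0.1 localhost"})
fs.write("/etc/hello", "hello world")           # lands in the COW layer
print(fs.read("/etc/hello"))                    # served from the COW layer
print(fs.read("/etc/hosts"))                    # still served from the image
```

Deleting the container throws `fs.cow` away; committing it makes the scratch writes a permanent layer. That is the whole difference between `docker rm` and `docker commit` in this model.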
And I've had questions about this; I have an anecdote I love to tell. A guy came up to me at a conference and said, "We're building Yocto Linux in a Docker container and it's super slow. Why?" I asked, "Are you bind-mounting the data through to a volume?" And they said, "No, we're just building in the container." Well, if you're compiling a Linux distribution... how many people here still compile kernels? Raise your hand if you've done it. All right, enough of you have done it to know there are a ton of file system operations when you compile a Linux kernel, all kinds of metadata changes and things like that; it's a slow operation when you're building a Linux distro. They were doing it in the COW layer, so every single one of those file system operations was writing new data through copy-on-write. Of course it was slow. I told them to do it through a bind-mounted volume; you could do it over NFS and it would be faster than doing it in that COW layer, even though the COW layer is on local disk. Long story short, they did, and it was faster. But if you don't understand that this is happening, because again it's a black box, you won't know what to do; you'll just do it wrong, because you assume it's magically handled for you. It isn't, until you pass a volume into the container engine to tell it, "Hey, bind-mount this thing externally": put /var/lib/mysql on an external mount, or put the build root for that kernel and that entire Linux distro, wherever the heck Yocto builds it, I don't even know, on one. Because it's beyond just building a kernel; it's building a whole distro, so it's laying out the file system, setting all the permissions, doing all the things a file system workload does. If you've ever done Gentoo you kind of get a feel for this: building a Linux distro does a lot of things, and you don't want to do that in a COW layer, because it's going to be super slow. All right, now that I've ranted about that, hopefully all of you will use bind mounts from now on. The other thing I say is: just run with --read-only, and then you can never have this problem; if it fails, you'll know you need a bind mount. All right, now let's dig deeper into the container engine, quickly. Think about what a container engine does: it takes CLI options from a user, often through that API; it combines those with defaults that are set in the container image and defaults that are in the container engine; and it creates that config.json. It creates a rootfs from all the layers in the container image, plus a COW layer, and then it fires up that container. So the directory that gets passed to runc has a COW layer on top, so you can write stuff into it and it seems writable even though the image isn't, and a config.json that's a conglomeration: defaults that come in the image, overridden by defaults in the container engine, and finally overridden by user options. For example, if you run a container in Docker, the default is not --privileged, but it's not --read-only either. If you as a user tell it --privileged and --read-only, which I think you can do, though I've never actually tried it, that would override a whole ton of things that are in the container image and in the container engine; it would disable SELinux, it would disable the net namespace, it would disable things like that
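That precedence can be sketched as a simple dictionary merge. The key names below are made up for illustration; they are not the engine's real field names:

```python
# Toy precedence model: image defaults are overridden by engine defaults,
# which are overridden by whatever the user passed on the command line.
image_defaults  = {"cmd": ["httpd", "-D", "FOREGROUND"], "read_only": False}
engine_defaults = {"privileged": False, "seccomp": "default", "selinux": True}
user_options    = {"read_only": True}            # e.g. a --read-only flag

# Later dicts win on key collisions, mirroring image < engine < user.
final_config = {**image_defaults, **engine_defaults, **user_options}
print(final_config["read_only"])                 # True: the user's flag wins
```

The real engine does this merge across many more fields and then serializes the result into the config.json it hands to runc.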
So the user has a lot of control over how that config.json gets built and then run. Does that make sense to everyone? Because this is the money slide.

All right, and then I make it more complex. As I was explaining this to people, I realized I had left out CNI, and CNI does a similar thing for the network. There's a CNI config blob that gets generated, and it's actually quite similar: for what network you connect to, what ports you map, things like that, there are some defaults that come in the container image, some defaults in the container engine, and some that get overridden by the user. So networking is really very similar, except that we pass it off to CNI to go do it. There are these binaries called CNI plugins, and those plugins expect environment variables and a config blob to be passed to them; then they know how to configure the network in that network namespace — the CNI plugins talk to the Linux kernel to do it. Does that make sense to everybody? It's a separate program that does that piece; runc doesn't do it. That's where the CNI plugins come in, so there are really a couple of different binaries working together to create a container. Does that make sense? Because this one's pretty much the full money slide.

All right, and then as a kind of final point — and in this last slide, see how I turn on iptables, the final frontier. I do a demo here where I show that in Kubernetes, if you scale from one pod to ten pods, you'll see it go from 13 iptables rules to about 40; if you scale up to 100 it'll be about 400; and if you scale back down to one, it goes back down to 13. And you're like, that is a ton of beating on the Linux kernel, adding and removing iptables rules. And if you've had real-world experience with this: run a production node for a while and you will see nasty stuff happen to the iptables config, because it just gets beat on really badly. So this is my final argument for treating the entire host as the node in an orchestrated environment, because even the kubelet beats on the kernel directly for the service layer. Is everyone familiar with the service layer in Kubernetes? A service is essentially a way to give a name to a set of pods — essentially it NATs to the pods — and the way it does that is by adding iptables rules locally. So it feels like magic when you just access the name; you're like, oh, this is cool, it just works, and you don't know whether it's DNS or what, but it's actually doing a ton of iptables redirects. Essentially magic in iptables, so that if you have ten pods you can access all ten by one name and they get round-robin load balanced. It looks like magic, but the way it works is by going and beating on iptables to add a bunch of rules quickly, and then, when you scale down, beating on iptables again to delete all those rules. People don't realize that, and that's why I emphasize that the kernel is a really important piece of thinking about the container host.

All right, I don't know that I have time to do justice to the bonus information, because we're at 31 minutes, so I think I may break for questions, hold off on this, and let people approach me afterward if they want. But I will say these were the three things I was going to go into as bonus material — and, to break out my joke, KubeVirt is not really one of these things, but I know that your brain wants to know it, so
that's why I added it. In a nutshell, I'd dig into each of these and show what they are too, but there's no way I can do that in three minutes, because they're even more complex. So with that, I will break for questions. Any questions on all this? I know it was a lot. I think it was a lot — was it a lot? Boom, we have a question. Good question; I will repeat it. His question is: does the spec that defines runc define the binary format of the containers themselves? The answer is no. There are — and I'll dig into this deeper in the talk tomorrow, which I encourage you to come to — three specs: a runtime spec, a distribution spec, and an image spec, and they're all part of OCI. The image spec defines the binary format, which is not magical: it's just a bunch of tarballs. There's a manifest.json — which I screwed up in that drawing; the manifest actually points to the tarballs, including which ones are for different architectures — and then there's essentially a list of tarballs that sit on disk, in a registry server for example.
Okay, I don't quite understand the question, so I guess I'll answer it this way. If you're running a Linux container on Linux, all of those binaries are ELF binaries, and that's not defined by the OCI runtime spec — that's defined by Linux. If you're running Red Hat Enterprise Linux, we compile our binaries a certain way, and those binaries are typically dynamically linked, for example. That's actually something I go into deeply in my Linux Container Internals talk, because most people don't get this: when you fire up a container, if ld.so is the interpreter the binary is linked against, it will go find all the other libraries on disk, load those dynamically into the memory space, and then fire up the program inside of that container. All normal process rules apply, and that's beyond the scope of runc. runc is just defining: here are the different options that can come in from a container engine, and here's the summation of what technologies get turned on in the kernel to limit that binary being run. Beyond that, you're on your own. So Windows containers have to run on Windows, and Linux containers, I generally just say, have to run on Linux — though there is a syscall layer written for Windows that can theoretically run some Linux containers. I would argue, though: run the matching distribution. If you have an Ubuntu server, run Ubuntu containers; if you have a RHEL server, run RHEL containers; if you have Windows, run Windows containers — and schedule them as such. If you start breaking the rules at those boundaries, I guarantee there will be pain. My old sysadmin gene twitches, and I get PTSD from being paged at two in the morning because somebody was running a CentOS container on Windows and some weird syscall wasn't implemented and it broke in some weird way. It's a Turing-complete
problem, so I argue: just don't do it. All right, we're at time, so...
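As a footnote to the ld.so answer above, what the dynamic linker does at process start can be modeled with a toy resolver. This is illustrative Python, not real glibc behavior; the library names and the rootfs contents are hypothetical — but it shows why a container must carry the libraries its binaries were linked against:

```python
# Toy model of the dynamic linker: a binary declares the libraries it NEEDs,
# and ld.so must find each one in the container's own root filesystem before
# the process can start. Contents here are invented for illustration.
CONTAINER_ROOTFS_LIBS = {"libc.so.6", "libpthread.so.0"}

def resolve(needed, available):
    """Resolve each needed library against what's on disk in the rootfs."""
    missing = [lib for lib in needed if lib not in available]
    if missing:
        # Mirrors the familiar failure mode when an image lacks a library.
        raise OSError(f"error while loading shared libraries: {missing[0]}")
    return "process starts"

print(resolve(["libc.so.6"], CONTAINER_ROOTFS_LIBS))

try:
    resolve(["libssl.so.3"], CONTAINER_ROOTFS_LIBS)
except OSError as err:
    print(err)
```

The point: none of this is in the runtime spec. It's ordinary Linux process startup happening inside the container's rootfs, which is why mixing distro families (or operating systems) across the container boundary invites exactly this class of failure.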