So there's one for block IO. I think you can imagine what that one does, right? Block IO. With the blkio controller, for example, you can say, hey, cgroup blue, you can only use 50 IOPS, whereas cgroup red, you can use 150. And you can also get detailed metrics out of it: how many IOPS has this particular cgroup, say red, done, or how many bytes has it written? And there are other ones. There's one for CPU. There's one for memory. There's one for huge pages, the translation lookaside buffer stuff, et cetera. We're not going to go through all of these. We could spend a day, maybe an hour on each one of these. But we're not going to.

So how do you work with these things? They use a pseudo filesystem, right? You guys may have used this. So imagine you've got a filesystem and you're looking at it in your Bash shell or whatever. It looks just like a regular file, but it's magic, right? You can write data to a particular pseudo file, and what that does is send it into the kernel module; it's a way for you to configure or push information into the kernel. You can also read from them. So for example, you can just do a cat on a pseudo file and it gives you some data back. And that data isn't just sitting in a file; it's coming directly out of the kernel module. And this is the way that you interface with cgroups. So there are particular files that let you set, for example, how many IOPS this particular cgroup can use. And likewise, you can cat a file and get the number of block IOs it's done, or the number of bytes it's written, et cetera.

So how does this look from an overall perspective? These cgroup hierarchies are mounted just like a regular filesystem. And the way they work is, when you mount them up, you get a directory for each of the subsystems, which is shown there, you can kind of see it in the blue. So you'll get one for each of the subsystems, and it'll have the pseudo filesystem populated underneath it. That is the quote-unquote global or root cgroup. Now if you want to create a new cgroup, let's say I want to create one called lxc, literally what you do is go create a directory under there, and boom, kernel magic populates all these pseudo files underneath that particular directory for that cgroup. And then, if you want to assign processes to that cgroup, it's pretty cool: you just take the process ID that you want in that group and echo it into the tasks file. That tasks file says which processes are in the cgroup. And once those processes are in the cgroup, you can get metrics about them by reading the pseudo files, and you can set limits and constraints on them by echoing data into the pseudo files. So this is the interface into cgroups.

All right, so first aha moment, or maybe not. We have our beautiful Linux container. We have some processes in it now. Now what we've done is we've added in a cgroup. And so the way this looks is, in a Linux container toolset that you would use, let's say you say, hey, Linux container tool, create me a new container. What it will typically do is go create a new directory under each of the cgroup root directories for your container and then put your process IDs in there. And then let's say you want to tell your Linux container tool, hey, I want two gigs of memory for this particular container. The toolset will go echo that data into the cgroup filesystem to cap that container at two gigs, for example.
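To make that concrete, here's a minimal sketch of driving the cgroup pseudo filesystem by hand. It assumes a cgroup v1 hierarchy with the memory controller mounted at /sys/fs/cgroup/memory (typical, but not guaranteed on every distro), the group name "demo" is made up, and it needs root to write those files; it's just an illustration of the echo-into-a-file interface, not what any particular container tool does.

```python
import os

# Assumes a cgroup v1 memory hierarchy mounted here (the path can differ per distro).
MEMORY_ROOT = "/sys/fs/cgroup/memory"
CGROUP = os.path.join(MEMORY_ROOT, "demo")   # "demo" is a hypothetical cgroup name

# Creating the directory is all it takes -- the kernel populates the pseudo files.
os.makedirs(CGROUP, exist_ok=True)

# Cap the group at 2 GiB by writing to a pseudo file
# (the same thing as: echo 2147483648 > memory.limit_in_bytes).
with open(os.path.join(CGROUP, "memory.limit_in_bytes"), "w") as f:
    f.write(str(2 * 1024**3))

# Put the current process into the group by writing its PID into the tasks file.
with open(os.path.join(CGROUP, "tasks"), "w") as f:
    f.write(str(os.getpid()))

# Read a metric straight back out of the kernel.
with open(os.path.join(CGROUP, "memory.usage_in_bytes")) as f:
    print("current usage:", f.read().strip())
```

The same pattern applies to the blkio, cpu, and other controllers; only the file names change.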
So there's our first step in building a Linux container. We now have the ability to control and get metrics on the actual processes in that container.

So the next step: namespaces. This is all about resource isolation. Imagine, on a standard traditional Linux system a couple of years ago, five years ago, you have all these global resources. You have network interfaces, you have mount tables, process IDs, et cetera. What we need to do here is provide an isolated view of these for each of our containers, and Linux namespaces let us do that. There are, what is it, six of them available. There's one for mounts, there's one for process IDs, and they all work the same way, but they each operate on a different resource set. So for example, the PID namespace: if your container has its own PID namespace, it will have its own process IDs inside of that container, and outside of the container, at the root level of the system, they're different. And that's actually a pretty cool construct, because imagine you wanted to take one of these things and migrate it somewhere else. It really doesn't matter what your process ID is outside of the container; all that matters is what your process ID is inside the container, because it's isolated. The other ones work very similarly. There's one for network: you get your own network interfaces in there, your own routing tables, et cetera, et cetera. I think you guys get the picture. There's a tiny sketch of playing with one of these namespaces right after this section.

Here's kind of a very colorful picture of how that might look. At the top we have the global or overall namespace; this is what you would get if you're not using namespaces. And then we have a purple and a blue namespace. Just notice here in the text that each of these, for example in the mount namespace, has its own set of mounts, and it doesn't see the mounts outside of its namespace, so it's isolated. And the same concept applies to the other namespaces there, for IPC objects, network, et cetera.

This chart is just a high-level mapping of the namespace and cgroup functionality to the approximate kernel version it went into. The main thing to take away here is that some of this stuff is still pretty new. In fact, the user namespace is still under development. So bottom line: the newer the kernel, typically the better, as long as it's stable. And then on the bottom right there's a mapping to some of the standard Linux distributions. I need to add Ubuntu 14.04 in there.

All right, so next aha moment. Now we have our Linux container again, but now we've added in isolation with namespaces. So your Linux container tooling spins up your cgroup with all your processes, and now your processes are also inside their own namespaces, so they can have their own hostname, their own Ethernet adapters, their own routing table, their own process IDs, whatever. So now we're starting to fill out this picture of how to build a container.

All right, so what about a filesystem? I mean, you have to have something to run, right? You need some binaries, some libraries, some configuration files. What's traditionally used for that is something like chroot. I'm sure you guys are all familiar with that; basically, it changes your apparent root directory. chroot is great, but what I've seen is that pivot_root, which is essentially a more secure version of it, is what's typically used.
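Here's that namespace sketch. It's a minimal illustration, not something the container tooling literally does: the process detaches into its own UTS (hostname) namespace via unshare(2), sets a private hostname with sethostname(2), and the rest of the host never sees the change. The libc calls go through ctypes, it needs root (or CAP_SYS_ADMIN), and the hostname "container-blue" is made up.

```python
import ctypes
import os

CLONE_NEWUTS = 0x04000000  # new UTS (hostname) namespace, from <sched.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)

# Detach from the parent's UTS namespace; requires root / CAP_SYS_ADMIN.
if libc.unshare(CLONE_NEWUTS) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))

# Change the hostname -- only processes inside this new namespace see it.
name = b"container-blue"
if libc.sethostname(name, len(name)) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))

print("hostname in the new namespace:", os.uname().nodename)
# Meanwhile, `hostname` in another shell on the host still shows the old name.
```

The PID, mount, network, and other namespaces are created the same way, just with different flags (CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWNET, and so on); the flag simply picks which resource set gets its own isolated copy.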
So to visualize why that root filesystem piece is important, imagine you have a raw filesystem image with MySQL and its dependencies on it. Now you want to run a Linux container from that image. The underlying toolset will take that image, explode it onto the filesystem, and then use something like pivot_root to switch you into the root directory of your image. So this gives us the notion of the filesystem inside the container. So we'll sprinkle that into our container; now we have the root filesystem there as well. This is how we can actually realize and run an image with some binaries and some libraries. So we're starting to fill this puppy out.

So, security. This is probably a big one for a lot of people, right? There are a lot of different mitigations for security in Linux containers. One of them is Linux Security Modules. This is a pluggable feature provided by the Linux kernel, and there are a number of implementations out there today, for example AppArmor on Ubuntu and SELinux on Red Hat. These guys allow you to do mandatory access control. And if you're not familiar with this, here's a very generic depiction of how it's different from discretionary access control, which is what we probably typically use. With discretionary access control, which is the more standard or more common case, you own the system: you, as a user, come in and grant privileges on your files or your resources that define which users or group IDs can access them. So you, as the owner, are granting that privilege to some other group or user. The way mandatory access control differs is that only a special admin user or admin process can grant those accesses or restrictions to a process. So instead of the user saying, hey, you can use all this stuff under root and this and that, we have kind of a magic user or a magic process that says, you can use this stuff, inside this circle. So it's a much more controlled and more secure way to do this kind of thing.

There are also Linux capabilities. Not sure if you guys are familiar with these, but imagine your process in Linux having a little flag or some kind of value that maps to a capability, and there's a whole bunch of capabilities. Now imagine your process making a system call. What's conceptually happening under the covers is that when that system call is made, there's a check, and that check says, oh, do you have the CAP_SYS_ADMIN capability? If not, you're out of here. If so, yes, we're going to let you do that. So that's the very bland version of capabilities. And you can assign capabilities to the processes in your Linux containers to constrain which system calls they can make. There's a little sketch of inspecting capabilities right after this section.

There are some other security measures too. There's some magic where you can bind-mount filesystems into your Linux containers, and when you do that, you can choose the access mode. So you can say, you can only read this; in that case you're sharing a filesystem, but you're constraining what can be done with it. You can also use things like seccomp to clamp down on the actual syscalls. And obviously, you keep your Linux kernel up to date; the kernel is really your hypervisor here, so it's just standard practice to keep that up to date. And then there's this work stream going on right now to hash out the user namespace, right?
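Here's the capability sketch. It's illustrative only: it just decodes the effective capability bitmask the kernel exposes in /proc/self/status, with the bit number for CAP_SYS_ADMIN (21) taken from the kernel's capability header. It shows what the per-syscall check is consulting; it is not how an LSM or a container runtime actually enforces anything.

```python
# Decode the effective capability set of the current process from /proc.
# Bit numbers come from <linux/capability.h>; 21 is CAP_SYS_ADMIN.
CAP_SYS_ADMIN = 21

def effective_caps() -> int:
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)   # hex bitmask of capabilities
    raise RuntimeError("CapEff not found")

caps = effective_caps()
print("effective capability mask: 0x%x" % caps)
print("has CAP_SYS_ADMIN:", bool(caps >> CAP_SYS_ADMIN & 1))
# A container runtime would typically start the container's init with most of
# these bits dropped, so privileged syscalls fail inside the container.
```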
So there are really two pieces to this. The first piece is allowing users that aren't root the ability to launch a container. Right now you need to be root to launch one, which is bad; no one likes to do stuff as root. So this feature, which is in the kernel but still being solidified, will mitigate a lot of that. It'll allow you to launch a container as a non-root user, and it'll also allow you to map your user IDs and group IDs right into the container. So, very exciting; hopefully it's going to solidify here soon. There's a small sketch of that ID mapping at the end of this section.

All right, so back to our container here. Now we've added in confinement around our processes. This is the Linux Security Modules piece confining what the process can do, putting boundaries around it when it's created. And now we have capabilities: we can assign a list of capabilities to our processes, which limits which system calls and things they can do on the system. Plus all the other security stuff I mentioned, which isn't in this beautiful picture. All right, so we built a Linux container. And, let me see what time it is, well, I guess we built it in like 15 minutes. So tonight go home, write a Linux container toolset, open source it, see you tomorrow, okay?

So here's just a broad view of the industry. We've got to go kind of fast, we're running short on time. There are a lot of tools out there, everything from commercial stuff like Virtuozzo from Parallels, which is like a whole solution, I don't want to compare it to VMware, guys, don't kill me, but it's a whole solution for Linux containers, the whole management stack, everything. There's Docker, right? Everyone knows about Docker. There are other tools; Google just open sourced lmctfy, Let Me Contain That For You. So there's a bunch of tools out there. They all have pros and cons, and more or less they can do a lot of the same things, but the main point is that there's a broad set of tools out there.

Then there's orchestration and management. How do you do these things across hosts in your data center and orchestrate things? So there's stuff in OpenStack; we're going to talk about that right now. There are a couple of different ways to get there. You can do libvirt-lxc, so use the libvirt driver to realize Linux containers. You can do Docker: Docker has a Heat plugin, and there's a Docker Nova virt driver. And I think there's something in the works for OpenVZ, I don't know. There's CoreOS, so there are these new operating systems emerging that are focused on being a very minimal operating system just to run Linux containers. And there are a couple of other ones, like Project Atomic, which Red Hat released just a couple of weeks ago; it's pretty sweet. And then there are other third-party apps; you can go to GitHub and find all kinds of things to orchestrate across containers and hosts and things like that.

There is some work being done to support migration, in the CRIU project, and those guys are, I believe, associated with OpenVZ. They're basically bringing migration to the Linux container side of things. So really exciting stuff there, but the story isn't quite as good as it is with traditional hypervisors. This is still emerging technology, so there are a few gaps.

All right, so that was the first section. Let's get into the second one. Guys, we've got to go pretty fast through this. The second set of charts is about some semi-active benchmarking that I did. What I did was take OpenStack.
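Quick aside before the benchmark setup: here's the user-namespace ID-mapping sketch promised above. It's a minimal illustration under some assumptions, not what any particular container tool does. It only works on kernels where unprivileged user namespaces are enabled, it must run from a single-threaded process, and the mapping written is the simplest possible one: uid 0 inside the namespace maps to the caller's ordinary uid outside.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # from <sched.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)

outside_uid = os.getuid()                 # our ordinary, unprivileged uid on the host
print("uid before:", outside_uid)

# Create and join a brand-new user namespace (allowed without root on kernels
# where unprivileged user namespaces are enabled).
if libc.unshare(CLONE_NEWUSER) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))

# Map uid 0 inside the namespace to our real uid outside it.
# Format of each uid_map line: "<inside-uid> <outside-uid> <count>"
with open("/proc/self/uid_map", "w") as f:
    f.write("0 %d 1" % outside_uid)

print("uid inside the new user namespace:", os.getuid())   # now reports 0
```

With that mapping in place, "root" inside the container is just a regular user on the host, which is the heart of the non-root container story.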
I basically stood up two nodes in SoftLayer. One was just a controller; it had all the standard Glance, Nova, and all that. And then another bare metal node with Ubuntu on it. And I did some benchmarking with the OpenStack Rally project, which is basically benchmarking as a service. And I compared the results of what I did with libvirt/KVM versus what I did with the Docker driver. That's what this is about. All right, so here's the environment; I kind of just described it. I want to be clear: KVM is awesome. So don't think I'm trying to put down KVM. I love KVM. I love Docker too.

So I did a couple of tests here from what I'm calling a cloudy perspective. The whole idea was to understand, from a cloud user perspective, what do Linux containers buy me? The first test I did was kind of a steady state test. I jammed out 15 VMs at the host all at once, gave it five minutes to bring the CPU and memory to a steady state, and measured everything on the compute host while that was happening. We're going to look at the results in a minute. The second test I did was more of a boot test. So, serially, I booted one VM or container, depending on which driver I was using, waited for it to become active, spawned another, and another in that fashion until I had 15, and then I tore them down. You can see the visualization there. The other one was about rebooting. I fired up a VM or a container, waited for it to become active, and then I soft rebooted it five times in a row. And I did that until I had a total of five VMs there. And then finally I did a good old snapshot test: throw a VM or a container out there and then snapshot it to an image.

So let's just quickly look at these results. Here's the average boot time with the Docker virt driver versus KVM: about 3.5 seconds is what I saw with Docker, about 5.78 seconds with KVM. So that's the boot time for VMs versus containers in this test. Here's the reboot time: it takes a little longer to bring down a whole guest and bring it back up in KVM; pretty fast in the Linux container world. Delete time: about the same, nothing special there. How about snapshot? There Linux containers are a little bit faster; you still have to capture the image on the compute host and push it up to Glance, so some of the same stuff is going on, although it's a little bit different in Docker versus KVM, but there's still a difference there.

All right, and here's where we get into the compute node metrics. While I was running all these tests, I ran dstat, which basically pulls a bunch of stats from your system, on the compute node, and I graphed the results. What you're seeing here is the steady state test. Remember, I started this test by throwing 15 VMs all at once at the system, giving them five minutes to stabilize, and then tearing them down. So you can see at the beginning how it spikes up, then it flattens out, and at the end there are a few bumps when it's tearing them down. The top one is Docker and the bottom one is KVM. Kind of an interesting metric in my opinion. So here's what I did, and this is not scientific, right guys? I took the graphs from where things had stabilized in the previous chart and segmented them across the same timeframe. The top one is what the CPU looked like with 15 Docker instances stabilized; the bottom is what it looked like with KVM.
So, about 0.2 user CPU on average for Docker, 1.91 for KVM; I'll let you do the math. Kind of an interesting data point, right? I mean, to me this is showing something interesting; I'll leave it at that. Here's the memory: red is KVM, blue is Docker. I did some magic math up there: about 49 megabytes per VM with Docker versus about 292 with KVM. And again, remember, I said these things are lightweight. You just need the application and its dependencies, so it's a smaller image, a smaller runtime, and this is proving it.

So here's that serial boot one. I booted one, waited for it to come active, booted another, so the compute node was almost always seeing a boot going on. And you can see that in KVM the user CPU starts to really balloon and grow as you go through the 15 VMs. Docker had some bumps, but it always came back down to a stable state pretty rapidly. And there are some averages there. Don't worry about writing this down; I have all of this on SlideShare, and it's actually like a 70-page deck. So just Google "Boden Russell SlideShare"; this whole benchmark is called out there if you want to check it out later.

So here's what I did: I took the window from 8 seconds to 58 seconds of the serial boot test, graphed it out, and drew some trend lines. The red stuff is KVM user CPU; the blue is Docker. KVM seems to ramp up in a pretty consistent fashion as you throw VMs at it, versus Docker, which stays pretty flat. This is, again, just the memory graph, one on top of the other, so you can see the way that memory grows as you throw VMs at the system. And I did the same thing: I took that memory graph and fitted some linear trend lines on it. So if you want to do some very simple math, it's about one third less memory growth from a booting perspective.

All right, so the second set of tests I did here, and there's a lot of existing material on these guest tests; if you go to the OpenVZ page, they have a whole section on performance tests they did. They did some really cool work there, and I checked it out because OpenVZ is Linux containers as well. So what we did here was actually go into the guest and do some testing inside the guest to see what the performance looks like from the guest's point of view. So, basically, KVM versus Docker: same network throughput. Here are some other numbers. The top right is some memory tests using MBW. The thing to realize is that bare metal is green and Docker is blue, and Docker and bare metal are equal. Again, back to my point: Linux containers run at bare metal speeds, period. That's it. And at the bottom, we ran LINPACK across them; we scaled up the CPUs all the way up to 30-some and compared against bare metal. You can see bare metal in red, Linux containers in blue-ish: about the same.

This is just random read/write IO using sysbench. This is a synthetic test, so take it with a grain of salt. I mean, benchmarking is an art; it takes time. This is a semi-active test, not a fully active test where I'm in there looking at every IOP and what's in the IOP. So take some of this with a grain of salt, but, you know, take it. Docker in blue, KVM in red. So then what I did was take MySQL and run a sysbench OLTP test against it.
And you can see they grow at pretty much the same rate as we scale threads from 1 to 64 there; that's the number of transactions. And we did the index insertion test. I don't know if you guys are familiar with this; it's a newer benchmark that's popular, and it basically just throws a lot of write data at MySQL. Along the bottom, you see the table size growing in increments of 100,000, and then you see how long it took to write those 100,000 rows to the table. They're pretty much on par between the two, Docker and KVM.

So I want to take a minute here to step back to things that weren't necessarily reflected in the benchmark. If you go to the Docker CLI, or to some Linux container tooling, and you say, boot a container: on Docker, when I tested that on the compute host, it was less than 200 milliseconds. And with the Nova virt driver, it was about five and a half seconds. Now, that's not a hit on Nova. What I'm saying is that with technology this fast, you're sometimes limited by the throughput or constraints of the actual orchestration layer. So just think about that. We're really going to need fast orchestration and management systems to keep up with these Linux containers and realize their benefits. I mean, three and a half seconds isn't long, but man, I don't like to wait, okay? Especially if you're doing a lot of short-lived workloads. And then finally, here's the difference in the image sizes that I tested: Docker was about 380 megs and the KVM image was about a gig.

So in summary, Linux containers are fast. They're damn fast. They're fast at runtime, they're fast in the cloudy ops space. They reduce the image size, they reduce the footprint, they reduce the resource consumption on your host. What does that equate to? Greater density, probably. Greater return on your hardware, probably. So I'll leave you with that from a benchmark summary perspective. There are gaps, right guys? These things aren't perfect. They're new; there's not a lot of knowledge in the industry about them yet, though that knowledge is growing and they're becoming more and more popular. There are concerns about security; I would argue that there are mitigations for security, but yes, there's work to be done. And there are some other gaps. I really want to tie this up because I think we're almost out of time and I want to catch some questions here.

And finally, here are some references. If you want to check out the benchmark results, they're on SlideShare; I have like a 60-page deck that goes deeper into the namespaces and everything. And there are some other great links there, like Docker and OpenVZ and all those guys. And then a word from my sponsor: there we go, the IBM track, and here are some more technical sessions from IBM. Rock on, I guess. So that's it guys, questions. I think we have, we've got five minutes? Yeah, so we've got five minutes. If you want to ask a question, please come to the mic.

Great presentation. So one question: I believe that the OpenVZ kernel has allowed users to run as full root in a container. But it kind of seems that with LXC we're sort of waiting for a few things, be that improved kernel support, or improved capabilities, or ways to better contain that. So I guess, how is that the case? How is it that OpenVZ handles this to the point of allowing full root?
Whereas we're sort of still mitigating this with LXC.

So, there are probably some OpenVZ guys here, but my understanding is that when you install OpenVZ, you get a set of kernel patches. So they're patching the kernel with some extra stuff, whereas traditional, standard LXC is using the upstream kernel. So the OpenVZ guys have the ability to get the stuff they need in there faster, versus standard LXC, which is relying on the development community and the upstream kernel. So I think OpenVZ is a little bit ahead because they don't have that overhead of the upstream development cycle. Did that answer your question?

It does, although it's kind of curious to me that OpenVZ has been around for a long time, so it seems odd that some of that hasn't been shared into the mainstream kernel. Maybe it's an implementation detail that's undoubtedly harder than we think. But if there's anything else you might have to say on it: your comment about moving Docker toward being able to run as an unprivileged user would help mitigate this problem, but it kind of seems like it's more a kernel-space issue rather than just "let's run Docker unprivileged."

Right, it is a kernel-space issue. And I know the OpenVZ guys do work with the upstream kernel; I don't work with them, so if I'm wrong, come up here. They are working with the community, but that development process is a little bit slower. But I think you're right: this is a kernel issue. We need to get development upstream in the kernel to make these things more solidified and realized for consumption by standard LXC. Thank you.

Thank you. Hi there. Yeah, you were booting your Linux containers with Nova and connecting them to their own isolated network. Did you find a way of connecting the network that the Linux containers were running in with a virtual network that VMs, like KVM guests, would be connected to, so you could talk from LXC to a VM?

So right now, from an OpenStack Docker driver perspective, there are some gaps. It's still under development. They've done a lot of great work, but there is still some work to get the networking support in there, along with some of the other features. So the answer is no. I would encourage people to put on their Python hats and help out with this. I think we'll see it, but today, right now, it's not a full-blown story, if you will. OK, thanks.

My question is about VPN connectivity to containers. Last time I looked at this, you couldn't really do IPsec or OpenVPN inside of containers. Have you run into this, or did you ever notice this?

I've never tried it, to be honest. I don't want to pretend to know, because I don't know. OK, because I had a unique workload that needed IPsec inside a container. Yeah, I think there are probably some really smart container people here and around; I know a couple of the Docker guys are here. I'm sorry, I don't personally know, but if you want to find me later, I can try to find one of the Docker guys or the OpenVZ guys. Anything else? Oh, guys, they're tearing me down. Hey, I've got some business cards if you want to grab one, stop by. Thank you.