Okay, I think we'll start. Thank you everybody for coming. My name's James Bottomley. I'm CTO of Server Virtualization at a company that was called Parallels, but has now been rebranded to Odin. The marketing department always gets very disappointed if I ask everybody in the audience whether they've heard of Odin and nobody sticks their hands up. So instead of subjecting myself to that embarrassment, I'm just going to tell you who we are. Parallels was originally the company that did Desktop for Mac and Linux containers. It actually began life as SWsoft, which was a Linux container company, so I do actually have a history here. Parallels is in fact the oldest container company in the world. As SWsoft, it released Virtuozzo containers in 1999. In 2005 we released an open source version of that, which we call OpenVZ. In 2006, after the publication of the OpenVZ source code, Linux process containers were actually put into the kernel. They're not related directly to the beancounters in OpenVZ, but a group of people headed by Paul Menage looked at what OpenVZ did and decided it was very useful functionality. So it went into the kernel, firstly as process containers, and then later as cgroups. On top of cgroups, in 2008, LXC version 0.1 was released. LXC is a container manipulation program. You know it today as a fully-fledged, functional container manipulation system that Canonical is now basing LXD on in Ubuntu. But back in 2008, and actually all the way up to about 2011 and 2012, it wasn't terribly functional. Every time you brought up an LXC container, there were easy ways of breaking out of it.

One of the things that we at Virtuozzo had done a long time ago was to make containers that were secure, because our bread-and-butter business when we released Virtuozzo in 1999 was allowing service providers — those are the people you go and buy infrastructure as a service from for, say, $10 a month and they give you root access — to do exactly that. For a lot of those service providers in the world today, the way they give you root access is by giving you root access to a container. And that wasn't possible with LXC for an enormously long time. 2011 is actually when we began working on the kernel container API, and I'll get onto that in a little while. So just for the marketing hype, I'll remind you that Parallels is now Odin. I've actually worked at Parallels as a container evangelist. I'm also an open source advocate, so I've been working for a long time on the business of converting businesses to open source, and I'm a kernel developer as well: I still maintain the SCSI subsystem, so the reason I wasn't present at the first three days of OpenStack is that someone very helpfully put the kernel summit directly over the OpenStack summit for Monday, Tuesday and Wednesday. So I had the immense pleasure of flying out of Korea very late last night and landing here at about one in the morning, when it was raining, by the way — so thank you, whoever was looking after the weather.

So what you're here to learn about is container basics. Now, it turned out that when I wrote the abstract, I didn't think I'd only have 35 minutes plus five minutes for questions to give you all of this.
So I'm actually going to skip some of the history bits that I assume a lot of people who do container stuff already know — and in fact, if you've been to a few of my other talks, I've already given them as well — and we're going to skip straight to: what can containers do today? What are they? What do they actually do? So, to throw you all into the deep end — oh, by the way, one of the things that I'm going to explain to you is the incredible painfulness of the Linux container API. The things you actually have to do to make containers work on Linux from the command line are awful, and I've had to deal with this pain for a long time working in a container company. So part of this talk is my pleasure to share that pain with you. If any of you are marketing people, now would probably be a good time to leave before your heads explode. We won't come onto the head-exploding bit until the demo, which will be much later in this talk.

So I'm going to give you an overview, but before I actually get into the gory details, I'm going to show you what it looks like. The main difference between containers and hypervisors is that hypervisors are based on emulating hardware. You take a physical machine, you bring up a machine monitor that emulates virtual hardware, and you bring up another operating system on top of that virtual hardware. For those of you who work in the enterprise, which I believe is most people at OpenStack, you think this was the only way of doing virtualization, and have done since VMware came on the scene many, many years ago. Those of you who are older than most of the room looks may possibly remember mainframes before that, and it turns out that a lot of mainframe technology, specifically IBM's, had containerization features, simply because hardware emulation was too difficult to do in those days — mainframes were phenomenally complicated things. So the picture I'm giving you is the one that I think appeals to most of the age demographic in the room, which is that you'd never heard of containers up until they suddenly got incredibly popular last year, or just about the year before.

So what containers are about is virtualizing the subsystems of the operating system itself. Instead of working out how to do a hardware description to bring up a completely new operating system, we work out how to take each of the services the operating system provides — networking, file systems, and so on — and provide them in a way that's fully virtualizable. So in effect, I can bring up different copies of exactly the same operating system, but based on the same kernel. The true difference between all of these is that containers only have a single kernel running underneath them. Hypervisors always have multiple kernels, because there's always a kernel running inside the hypervisor itself — even though VMware would tell you this is not true, it is. There's always some sort of operating system running in the host, and then you boot up an entire new operating system, including a new kernel, in the guest to do all of the things that you want this virtualized operating system to do. This difference between single and multiple kernels is actually one of the reasons why container technology was embraced in the service provider space but not in the enterprise space.
So if you think about the problems the enterprise was thinking of back in the very early days, it was things like dev/test, but it was also heterogeneous environments. If I'm sharing a single kernel, I cannot bring up two instances of an operating system that do not share the same kernel. Back in the early days, when VMware was around, this was Windows and a tiny bit of Linux, and it is impossible for Windows and Linux to share the same kernel, so with containers you could never bring up Windows and Linux on the same box. This is why the enterprise really didn't like it. Back in 1998 this was a huge problem; this is what killed container technology for the enterprise. Service providers embraced it just because their problem wasn't really bringing up Windows and Linux; it was: we have a large homogeneous group of machines and they all run the same operating system, so to us it doesn't really matter. We actually provided Windows containers as well, and they were perfectly happy with one set of machines installed with Windows to run Windows containers and one set of machines installed with Linux to run Linux containers. So this inability to bring up different operating systems was seen as an Achilles' heel of containers back in 1999.

And when I say different operating systems: if your operating system is good enough, like Linux, I can still bring up different things that you think of as operating systems — I can still bring up RHEL and CentOS and SLES all on the same kernel, because it's the same Linux kernel underneath. So all this operating system distinction cares about is that the kernel I'm sharing will actually support the operating system. And to be honest, in the early days this was an Achilles' heel for Windows as well. The reason it works well for Linux is that the kernel ABI is such a fixed and strong thing that any modern kernel can run almost any older operating system that was released before it. So on Linux we have almost no compatibility problems bringing something like, I don't know, RHEL 5 up on a 3.10 kernel. It can easily be done, just because the ABI still supports it. On Windows the situation is very different, because there's a lot of interplay between the user space and the kernel space of Windows — there's a lot of swapping of interfaces that goes on — and it means it's impossible even for the next generation of Windows, say going from Windows 2000 to 2003: if you've got a Windows 2000 system, you cannot bring up a Windows 2003 container, usually because something in the kernel doesn't match and it breaks.

So the main difference is: a single kernel for containers, with virtualization sitting in that kernel to support multiple operating system instances being brought up on it; for hypervisors, multiple kernels. And obviously one of the immediate advantages to you in the enterprise of running a single kernel is that it solves a lot of the patching problem for virtual machines, because the virtual machine image of a container system does not actually contain a kernel, which means that you don't have to patch any of the vulnerabilities in that kernel or do anything else to it. It also means that if I use something like kpatch or any of the other live patching technologies like Ksplice, I can apply a patch once to that one kernel and immediately all of the guests benefit, because they're sharing that kernel.
It's actually a fairly significant advantage to service providers, and it means that some of the enterprise problems of image drift and image patching don't exist in containerized systems. But that's not the main thing that containers give you. One of the really big things they give you is elasticity. This was also why they were more important to the service provider space than the enterprise space. In the early days of the enterprise, hardware budgets came out of your ears and went up about ten storeys beyond; they did not have a problem with buying more hardware to do more stuff. In fact, in the early days of the enterprise, the reason virtualization came along is that the CIO was actually struggling to find things to do with the hardware. So adding virtual machines was just something to do with the hardware, and then you could provide extra services on top of that.

With container technology, the advantage for service providers is that when you squeeze these containers down into very small, constrained systems, the performance under that squeezing is far better with containers than it is with hypervisors. It's partly to do with the size — I'll show you a diagram for that later on — but it's also to do with the fact that there is only one kernel. When you start to put Linux or any hypervisor system under extreme resource pressure, the host starts trying to steal pages of memory from the guest. This is a standard thing that happens. When you do that with a hypervisor, it turns out to be an unstable system, because both the kernel in the guest and the kernel in the host are trying to do reclaim to solve the resource starvation problem, and the way they do it tends to cause them to fight with each other over it, and the result is that they don't quite deadlock the system, but they make it bog down and go much more slowly than it should. With containers, because it's a single kernel, and one kernel is used to being put under resource pressure, when you actually put it under resource pressure it just does all of the things that a kernel naturally does, and it resolves that resource pressure and resource starvation in the single kernel to the best of its ability. And this is what makes containers lean and elastic: under this resource pressure they can still deliver the performance that service providers required.

So this manifests itself to the service providers as this wonderful thing called density. In the early days, effectively because of this resource constraint problem, we could pack onto a single physical system three times as many operating system containers as we could hypervisor guests. And for a service provider, because they're only selling you this box for $10 a month per login, the number of logins they can pack onto one box is the difference between profit and loss. This is why container technology was essential in the service provider space. So if we look at it — this is showing roughly, I can't really use the pointer, maybe I can use a mouse on here — this is showing roughly what I said: there's a hypervisor kernel here, if you can see it, and there's another kernel in the guest here. If you compare it to containers, there is only a single shared kernel in the system, so this is the actual operating system container coming up from init all the way on up. And then obviously the new use case is application containers, so there's the application container sitting on top of that.
One of the advantages for application containers is that not only can they share the same kernel, they can also share the same versions of all the operating system subsystems and libraries as well. So this is the Docker and rkt (Rocket) use case. And obviously, just in terms of which stack looks better, which stack is less fat, it's this one — the container stack; there are just fewer boxes in it. So purely in terms of how much I had to put in to get this to work, containers are leaner. And in fact a hypervisor image is typically gigabytes. A container image, on the other hand, especially if it's an application container image, can be on the order of megabytes. So this is why container technology was very attractive in the first instance: the lightness just made them denser and far more elastic. But it was far more than that. Having a single kernel manage all of the resources solved all of the problems that hypervisors have, because a hypervisor is effectively two kernels not even trying to cooperate, just communicating with each other over a hardware interface. It works if you have enough resources, but when you put a hypervisor under resource pressure, it falls down a lot faster than containers do.

The other thing that's really useful about containers is their scaling properties. You've probably all seen that with hypervisors, to take memory away from a guest you need to do memory ballooning. You inflate this balloon cooperatively in the guest and then pull memory pages out of that balloon into the host. The very act of inflating the balloon tends to annoy the guest, especially if it's under pressure itself. With containers, the mechanisms for controlling resources are the cgroups. The knobs already exist in the kernel. So I can take memory away from a container just by writing a couple of values to a single file in the /sys/fs/cgroup filesystem — there's a rough sketch of what that looks like in a moment. It's as easy as that. I don't need to inflate a balloon; I don't need to do anything else. That container will respond within milliseconds to microseconds, depending on how fast your machine is. And this can be done with any resource that's controlled by the cgroups. It can be done with CPUs; it can be done with memory. Because the operating system that's running in a container doesn't actually see physical CPUs or physical memory, I can do it entirely through the kernel subsystems themselves, and kernels are very used to doing this. So containers have much, much better scaling properties than hypervisors. If I want to build a very strong, very powerful, very scalable system, it's actually much easier to do it with containers than it is with hypervisors.

And obviously the shared kernel, as long as the kernel is a good one, makes container resource decisions much more efficient than a hypervisor's, because there are no two kernels fighting each other over the resource decision. There's a single kernel arbitrating all of the resources in the entire system, and it sees everything at the correct granularity. A hypervisor, when it wants to control memory resources, only really sees pages, and it doesn't even see the LRU list of the guest kernel — which page is going to be reclaimed next. So under memory pressure it just takes a random page out of the guest.
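Just to make the "write a couple of values to a file" point concrete before we carry on — a minimal sketch, assuming the cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory, as it typically was on kernels of this era; the cgroup name "mycontainer" is made up for illustration:

```sh
# create a cgroup for the container and give it 512 MB of memory
mkdir /sys/fs/cgroup/memory/mycontainer
echo $((512*1024*1024)) > /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes

# later, shrink it to 256 MB -- the kernel starts reclaiming pages from
# that container immediately, with no balloon driver involved
echo $((256*1024*1024)) > /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes
```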
If that page happened not to be the one the guest's own LRU ordering would have chosen next, then the guest is going to evict its own cold pages first, and then it's going to come to the one the hypervisor just took away, and the hypervisor is going to go through some horrible dance where it tries to swap that page back into the guest again. Whereas the kernel that's arbitrating containers not only sees all of the pages that everybody's using, it itself controls the LRU list — because the container operating system only starts at init, it has no kernel piece — and it also sees all of the objects and how they're being used inside every container. So it has much fuller information when it comes to making resource decisions, and this is why resource decisions tend to be made much more efficiently for containers.

So now we come onto the pain bit: the Linux container API. I really like this. Containers in Linux are controlled by things called cgroups and namespaces. Realistically, that's all you need to know to control the kernel container system. And for reasons that are best buried in history, the control plane for cgroups and the control plane for namespaces are radically different from each other. It just so happens — you wouldn't believe this — that open source development is a lot about personalities, and one personality developed the namespace isolation and a different personality developed cgroups, and they could not agree on the API. So we are stuck inside the kernel with two separate APIs. This is part of the pain, so don't worry, I'll be showing you all of the differences later on — well, if I have time, which hopefully I will.

The point about this kernel interface is that everything that claims to be a container system — like our OpenVZ, like Docker, or like LXC — is orchestrating this cgroup and namespace system. One of the key things about this is that the kernel API is the same for everything. So LXC, OpenVZ, Docker, any other container system — I could have put up a dozen different container systems on top of this — they all, at a base level, talk to exactly the same kernel API. There is method in this, because it came from an agreement at the kernel summit in 2011, which was actually driven by us. And it's one of the useful success stories in open source. So I went to the kernel summit; I had just recently joined what was then Parallels and is now Odin. My job was to get Virtuozzo and OpenVZ upstream into Linux. And instead of just trying to push it upstream by brute force and effectively forcing it in as a parallel subsystem in the kernel — which had been done before; this is what happened with KVM and Xen: Xen was forced into the kernel as a completely separate subsystem from KVM, so even today, when you choose a hypervisor, Xen talks to a different kernel API from KVM and we have two completely separate systems to support — instead of doing that for containers, we came to an agreement at the kernel summit with all of the in-kernel parties, which were LXC and cgroups, and the out-of-kernel parties, which were Virtuozzo on one hand and the Google container technology on the other. And we agreed that we would merge all of our implementations into a single ABI. What this effectively meant is we had to have a show and tell: what do we have; what do we do that you don't — so we'll just shove that straight into the kernel; what do you do that we don't; and then, for the things that we all do separately, who does it best, so that's the API we'll adopt.
So nobody came out of this with the complete agreement they wanted, because obviously what I wanted was for everybody to just agree to use the OpenVZ API, and that didn't happen. What we got is this hybrid of cgroups and namespaces. Namespaces exist in OpenVZ, so they were a fairly easy port. Cgroups have a parallel inside OpenVZ called beancounters. So effectively our agreement was that we would abandon the beancounters in OpenVZ and fully adopt cgroups, and as part of that adoption we would add all of the missing pieces that were causing resource and performance problems in LXC. Additionally, we had extra security things and namespaces and so on. So as part of this agreement, the entire kernel got an enhancement of the container system that gave us the ability to bring up fully secure, fully isolating, fully resource-controlled containers.

In 2015, that programme is almost entirely complete apart from one particular piece, which is kernel memory accounting for dentries and inodes. I can see people's heads already beginning to spin; this is a very esoteric area of the kernel. But if you're using containers for service providers, the reason you want this memory controlled is that the kernel will just keep allocating dentries and inodes as you open files — once you open a file, you usually get one inode and a bunch of dentries. There is a trick you can pull on a service provider where you make a directory, change into it, make another directory, change into it, and just keep on doing that recursively. What that does is run the entire kernel out of inodes and dentries. So if this is not controlled, anybody buying root from the service provider can immediately run the entire kernel out of these resources, which means that everybody else's containers on that system crash, which is why it's important to us to get this fixed. But that is the only piece that is missing. For the rest, security and isolation are already upstream in Linux. The kernel which has almost all of the safety features — you could say it's 3.12, but you'd certainly make a safe bet that it's 3.16. And the enterprise kernels, which as you know are based on 3.10, have backported most of these. So a RHEL enterprise kernel, by the way, does not really look like 3.10; it looks mostly like 4.0 in today's world.

But we organized all of these interests to converge on a single unified upstream ABI, partly because it was really the only way of getting everybody to agree to do this — because open source is about cooperation — but also because, in my opinion, strategically, the balkanization of the virtualization subsystems set Linux back by several years and allowed VMware effectively to conquer the enterprise. We spent so much time fighting over what the hypervisor ABI would be before finally putting both in that we lost an awful lot of time in actually improving the hypervisor subsystems in Linux. And we were determined the same thing wouldn't happen for containers. So in that sense, it was very successful. Slightly less successful: I had hoped that by leading this effort, our name would be up in lights as the people who did this. Unfortunately, if you ask most of the people in the room who the container people in Linux are, they'd all say Docker and no one would say Parallels or Odin. So that was a slight failure. And of course, it led directly to the ability of Docker to run on the upstream kernel's container interfaces. In fact, Docker for a long time had no kernel team.
I believe they're just spinning one up now — primarily because they relied on the kernel enhancements that came from not just us; it was us, plus Google, plus a lot of guys in all of the distributions: Canonical/Ubuntu, SUSE, and Red Hat. So if we look at these cgroups: what a cgroup does is control resources within the kernel. There's a cgroup for controlling I/O — this is how you do I/O partitioning between containers. There's one controlling the amount of CPU you use — this is how you partition the work amongst containers. There's one for devices, which is used mostly by containers that want to bring up hotplug things like USB. There's a really important one for controlling memory, which is how we make sure that when one container exceeds its memory allocation and asks for more, it doesn't take it from another container unless we've actually authorized it — instead it gets tipped over into a swap situation and starts to bog down. So this memory cgroup is really, really important. And there's another one for networking, to make sure the network packet bandwidth and everything else is also per container. And then there's this weird thing called the freezer, which was mostly used for suspend and resume. Realistically, all it is is a cgroup where you put a bunch of processes and then put them all to sleep simultaneously, without having to worry about the resource dependencies between them. Think of a producer and a consumer running in separate processes: the problem is how you put them to sleep, because if they're really tightly coupled, the producer would notice the consumer has gone away and the consumer would notice the producer has gone away. The freezer was the way we did that: we put both processes into the freezer and then suspend both of them at the same time, instantly, and neither one has a chance to see that it lost the other.

And then on the other side, there are these things called namespaces. Namespaces are a pure isolation layer inside the kernel. Most of the time, a resource only belongs to a single namespace within the kernel. So we have a namespace for networking, which means that we can take network devices and place them into separate namespaces within the kernel; one network device can only belong to one namespace. We have the IPC namespace, because System V IPC has to be virtualized as well: you can't let container Y see the message queues of container Z without an information leak, so there has to be a separate set of IPC message queues, for instance, for each container, which is why this subsystem had to be virtualized. There is a mount namespace, because the file system tree of each container should be different. It doesn't have to be, but it should be. This allows us to put a separate root file system into each and every container if we wish to; or, by using the wonderful Linux thing called a bind mount, we can move portions of the host's root file system into containers using exactly the same piece of technology. There's a PID namespace. This is primarily to satisfy init systems: if you're bringing up an operating system container — if you go back to the old diagram I showed — init has to run inside that container, and it's amazing how annoyed the init process gets if it's not running as PID 1.
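As a quick aside before carrying on with the PID namespace: if you want to check which of these controllers and namespaces your own kernel actually offers, a minimal sketch is below; the exact output varies by kernel version and distribution.

```sh
# cgroup controllers compiled into and enabled in this kernel
cat /proc/cgroups        # columns: subsys_name, hierarchy, num_cgroups, enabled

# the namespaces the current shell belongs to, one symlink per namespace
ls -l /proc/$$/ns        # ipc, mnt, net, pid, user, uts (plus cgroup on newer kernels)
```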
So we had to virtualize the process subsystem in Linux so that PID 1 was available for each of these separate little inits running inside operating system containers. The PID namespace doesn't just do that — it was designed because that was the problem — but now it actually isolates the process tree of each container from the others. So if you're running in separate PID namespaces, you can't do ps in one container and see all of the processes in another container, which is also important for separation between containers, because to do otherwise could cause an information leak. There's a UTS namespace, which exists for nothing more complicated than the fact that each container needs a separate host name, and, for reasons best known to UNIX people, the host name is set by a system call — so it was a system call that had to be virtualized. And then there's the user namespace, which is really, really important, because this is the way that we actually pretend to be root inside a container without being the real root of the system. Early on with LXC, one of the real problems was that if you ran root inside an LXC container, that root could actually break out of the container and become root in the host, which obviously has devastating consequences if you can achieve it. And the fact that early LXC containers were leaky meant that it was fairly easy to do this breakout. So the user namespace was invented primarily to allow you to bring up fully unprivileged containers. The user namespace is effectively the way we do security in Virtuozzo today — we had a slightly different system, but what we used was merged into the user namespace, and so it's all available to us today.

So for those of you who are falling asleep: if you think that was the pain, we're just coming to it, because this API is completely toxic. It is very difficult to use. But because I'm who I am, I thought I'd give you an example of exactly how bad it is. So I said there are namespaces. On my own system — oh, I should add, this is an openSUSE system running a 3.16 kernel, so I've got most of the namespaces and cgroups available here — let's see. I think this is about as big as, sorry, this thing will go. However, let's see if we can bring up a GNOME terminal. Sorry, this is probably a systemd problem. Oh, no, it's because I'm running root there, isn't it? Which is the font one in here? It's what? OK, is this more visible? So here are all the namespaces that this process is in. Namespaces are represented basically as inodes, so all of these numbers here are the inodes of the namespaces this process belongs to. These are the six namespaces that I actually have on the system. And as a simple demonstration, I can actually enter one. So if you look at my ID here, I'm myself on this computer — I'm running not as the root user, but as my own user ID. One of the interesting demonstrations you can do is to enter a new user namespace using unshare. The -r option means do the UID and GID mappings that make me root. Sorry. And there's me as root. So using a fully unprivileged, non-setuid executable, I can become root just by taking advantage of the properties of user namespaces on the system. Now, if I do an id, you'd think I'm root entirely in this user namespace. But if I do an ls -l on /proc/self/ns, you can see the numbers — and this is where it becomes really painful.
If you look at the user namespace, it's just one up from the bottom. That number will change from one horrible eight-digit number to another horrible eight-digit number, but if you compare these two listings, you can see that all of the other namespaces are the same, and only the user namespace inode is different. So all I've done is enter a user namespace; I'm sharing everything else — I'm not namespace-separated on anything else. So if I do ps — oops, sorry, this has obviously fallen off the bottom of the terminal, let's see if I can bring it up — if I do a ps, I can see all of the processes running in the system, because I haven't entered a PID namespace. However, one of the interesting things: if I do an ls -l on my home directory, it is no longer owned by me; it's all owned by the root group. And if I do an ls on something that root would ordinarily have owned in the system — let's have a look at the shadow password file, for instance — it's now owned by nobody. Now, in theory, root can open a file owned by nobody. But my fake root inside the container cannot, because all the user namespace has done is remap my UID. This is controlled by the mapping files, which are called uid_map and gid_map. Let's not just ls them, let's cat them, so I'll show you what they contain. So what's there? Yeah, sure — everything else is mapped to the ID for nobody on my system. So we've actually done a numerical UID mapping, and that numerical UID mapping is inside that /proc/self/uid_map file. If you look at the way /proc/self/uid_map is constructed — I just can't do this kind of, OK — the first number is the ID I'm mapping to, which is zero, the root one. The second number is the ID I'm mapping from, which is my user ID, 1,000, on my laptop. And the third one is the range — how many IDs go up from there. So I'm only mapping one ID: I'm mapping myself to root. If I had 10 there, I'd map IDs 0 through 9 to IDs 1,000 through 1,009. So it allows for range mapping. And these files can be written to separately as well, so I can echo into them. So if I want someone to be bin, let's say it's going to be 1,001, so I'm going to map IDs 2 and 3 — thank you, musical accompaniment is always good. Sorry, I need to be root, and I'm not real root. Let me get real root back. Actually, just pretend I've done this. I've got seven minutes left, and I'm not going to leave you time for questions unless I move on.

So, have your heads exploded yet? Because this can get an awful lot worse. I'm afraid you're probably going to have to put up with something rather small for this, because for my next trick, what I'd actually like to do is show you how cgroups work. I can show you the interface from here, which is the big terminal, but I'm actually going to need the root terminal to do stuff. So let me look at what my PID is here: 24196. So if I go to /etc/sys — sorry, /sys/fs/cgroup — this is where all of the cgroups in Linux live. If I do an ls -l, you see there look to be about 12 of them, but in reality several of them are links: the ABI for cgroups has been changing over the years, and all of the linking in this directory is actually the ABI changing. Each of these cgroups is separately mountable in Linux, so if I look at the mount table, you'll see each of these cgroups separately listed. I'm going to be a wimp.
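Before we get to the freezer, here is the user-namespace part of that demo pulled together as a hedged sketch. On a 3.16-era kernel, unshare -r sets up the single root mapping for you; some later kernels and distributions restrict unprivileged user namespaces, writing gid_map by hand may additionally require denying setgroups, and the range-mapping example at the end uses purely hypothetical values.

```sh
id                       # an ordinary unprivileged user, e.g. uid=1000
unshare -r /bin/bash     # new user namespace with my uid/gid mapped to 0
id                       # now reports uid=0(root) -- but only inside the namespace

ls -l /proc/self/ns      # only the user-namespace inode differs from before
cat /proc/self/uid_map   # "0 1000 1": inside-id, outside-id, length of range
ls -l /etc/shadow        # owner shows as "nobody": unmapped IDs don't resolve

# The maps can also be written by hand when setting up a namespace, e.g. a
# hypothetical range mapping of ten IDs (0-9 inside -> 100000-100009 outside):
#   echo "0 100000 10" > /proc/<child-pid>/uid_map
```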
Because if you look inside any of these cgroups, you see all of the control files for that cgroup, and the simplest control interface is actually the freezer cgroup — that's about all it really has. So even I'm going to wimp out here and show you a really simple cgroup rather than a really complex one. Now, the way cgroups work: let me change directory and — actually, let me. So I've just exited the namespace, and I am going to become root, because I can't do this without being root on this machine. The way you control cgroups is via file system calls, mainly mkdir. Right at the moment, if I cat tasks, I'm standing in the root cgroup of the freezer. Every process in my system is a member of the root cgroup of every cgroup hierarchy. So in order to move processes into different cgroups, I first have to create them. So if I do a mkdir for a test cgroup, this is now a child cgroup of the root cgroup. There are no tasks in it at all — it's completely empty — and it also has, if you look, additional files, including freezer.state. The freezer state is what's actually controllable in this cgroup, and this cgroup is currently thawed. I'm going to bring back this shell — it's actually root, but that doesn't matter. So this is a shell with process ID 21154, and I can move that process into the cgroup simply by echoing its PID into tasks. So this process is now inside the cgroup. There's nothing happening in that terminal at the moment — if I cat tasks back, you'll see the PID inside it. And now what I can do is freeze the cgroup by echoing FROZEN into freezer.state. This, by the way, is why these things are so horrible. If I go here, you can see I'm pressing enter on the keyboard and this process is now frozen. Any set of tasks I put into this freezer would now be frozen. And I can get all my key presses back simply by thawing it. If you do a ps on this, you can see that the process looks like it's still there: it shows up stuck in disk wait. And the reason for that is that a frozen process doesn't necessarily look like a stopped process — if I'd done a kill -STOP, you would have seen the stopped state there. So it's very difficult to tell from ps whether a process has actually been put inside a freezer or not; most of the time the tools account it as disk wait, so it depends on what tools you're running. But remember, the way I'd usually be using this, especially in a container, is that I'd also have created a separate PID namespace, so I can render this process invisible to all the system tools doing that accounting — because accounting is also broken up by namespaces.

I was also going to do a fairly clever demonstration with the network namespace, but I think I'm running out of time. For everybody, the ip netns command is actually the most useful thing here, because a network namespace allows you to build multiple networking stacks on the same box you're running on. So it gives you a way of playing with networking protocols without needing multiple separate systems. I was going to do a veth demonstration, but I'm afraid I've run out of time, so we'll just go back to the slides. Other necessary tools: to match hypervisors we also need migration, so we have a migration project. In the interest of time I'll just skip over that — it's basically a way of migrating groups of processes, and it matches what vMotion does for hypervisors.
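Since I rushed through that, here is the freezer sequence from the demo written out as a sketch, assuming the cgroup-v1 layout shown on screen; "test" is just the throwaway cgroup name and 21154 the example shell's PID from the demo.

```sh
cd /sys/fs/cgroup/freezer
mkdir test                          # a new child cgroup appears with its own control files
cat test/tasks                      # empty: nothing in it yet
cat test/freezer.state              # THAWED

echo 21154 > test/tasks             # move the demo shell into the cgroup by PID
echo FROZEN > test/freezer.state    # every task in the cgroup stops instantly
# ...keypresses in that shell now go nowhere...
echo THAWED > test/freezer.state    # and they all resume together
```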
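And since I skipped the network-namespace demo, here is a hedged sketch of roughly what it would have shown, using iproute2's ip netns plumbing and a veth pair; the namespace name, device names and addresses are all made up for illustration.

```sh
ip netns add demo                          # a brand new, empty network stack
ip link add veth0 type veth peer name veth1
ip link set veth1 netns demo               # one end of the pair moves into the namespace

ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up
ip netns exec demo ip addr add 10.0.0.2/24 dev veth1
ip netns exec demo ip link set veth1 up
ip netns exec demo ping -c1 10.0.0.1       # two "machines" talking on one box
```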
And I'll get on with the conclusions, which are that, thanks to a lot of upstream work, containers are here to stay. The native container control plane, as I hope I showed you, is excruciatingly complex — and remember, in that demo I walked you through two of the simplest namespaces and cgroups there are; there are much more complex namespaces and cgroups you could be playing with. But nobody's head exploded in this room, so the chances are that this isn't actually as bad as most people think it is. It's just slightly excruciatingly painful instead of incredibly excruciatingly painful. But that's also not an excuse for not using them. So what I'd like you to do, now you understand the basics: the only things you really need to know to manipulate all of this are where to find the cgroup interface under /sys, which is /sys/fs/cgroup, and how to use unshare — and there's actually another tool called nsenter that I didn't show you, which allows you to enter namespaces. Effectively, with those two or three things, you can manipulate any namespace you like; you're actually going to find it's much less complicated than you think. And if you want to manipulate only the network namespace, then iproute2's ip netns command is a very, very convenient way of doing it — it has its own really convenient control plane, if that's what you want to play with. So it's not as bad as I've been making out, but it is still pretty horrible. With that, I'd just like to say this presentation was done using impress.js by Bartek Szopka — it's all written in HTML5 and CSS3, which makes me a web developer rather than a container developer. And I'd like to say thank you and entertain questions.

So, does anyone have questions about the horribleness of... okay, at the back. So the question is: what is the benefit of running a container in a VM? And the answer is, there is no benefit. The reason people commonly claim you have to run a container inside a virtual machine is to benefit from the security properties of a virtual machine. But as I've just demonstrated, we can run containers natively in Linux, when properly set up, so that the security is all present. So it's perfectly possible in Linux, with a recent kernel and a good orchestration system, to run containers that are fully secure and fully isolating. But remember that the poster child for containers is Docker. Docker began on the 3.8 kernel, and that kernel did not have a lot of the container security features. So in order to get some of the earlier systems to work, there was no choice but to run them in hypervisors. But nowadays there is a choice: you do not have to run containers in hypervisors. For service providers, for instance, we run a system where we actually use nested containers to run Docker. The only reason we use nested containers is that the current vogue among service providers is to give the end customer access to the full Docker control plane, rather than keeping it with the service provider. If the service provider controls Docker, they can deploy the user applications to all of the containers themselves; but if the user wants to control Docker, you have to virtualize that Docker, so you run it nested inside a container of its own. Does that answer the question reasonably well?
Okay, how much longer do I have, timekeepers? Okay, next question then. So the question, if I understand it right, is effectively about same-page merging inside containers — can they do that? The answer is yes, it is perfectly possible. In our commercial Virtuozzo product we actually have a same-page merging algorithm that goes and finds pages that look alike but belong to different devices in the page cache, and merges them into one, using knowledge obtained from the execution pattern of these containers. This is one of the things that has actually been plaguing Docker for a while, because as you bring up the cascaded namespaces in Docker, each one comes up with a different instance and a different device, and it sprays copies of the same page throughout the page cache. We have technology that can bring them back together again, but for us that technology is an add-on you'd pay for in Virtuozzo. Other people are working on the same technology in open source. We actually have a project with Clear Containers where we're looking to unify the reclaim techniques that KSM uses with DAX, which is used to control file-backed memory, and to communicate the LRU list for anonymous memory between the guest kernel and the host kernel, which would give us a para-virtual memory interface that would solve a lot of the hypervisor performance problems I alluded to earlier. So there's an awful lot of work going on in this field.

Okay, so the question is: do I think virtual machines will be rendered extinct by containers? Working for a container company, you might think I have a slight bias in giving this answer, so I will try to be neutral. If that question had been asked of me a year ago, I would have said yes, it's perfectly possible: operating system containers can do almost everything a hypervisor can do, and the only thing they can't do — the use case of completely separate operating systems brought up on the same machine — is going away, because, as we know, everything in the cloud is becoming homogeneous. But if I look at what's happening today, there are certain companies that are very well funded, that don't have container technology, and that are feeling burned and left behind by the container bandwagon. So they have a vested interest in pushing hypervisors to match the container use cases. Obviously I'm thinking of VMware with their secret containers project. Intel is doing the same thing with Clear Containers — that's effectively hypervisor technology being pushed towards container elasticity — and Microsoft is doing the same with Hyper-V and Nano. So there are a lot of vested interests trying to make sure that hypervisors actually gain the ability to do all of this. Sorry, I think we're almost completely out of time. So I think there will be a use case for hypervisors that can match operating system containers. If I had to make a bet nowadays, I'd tell you that the operating system container business will probably be subsumed by hypervisors, just because they're a lot easier to play with, they have a lot of advantages, and if they match containers on density, why wouldn't you do it? But because containers are a virtualization of the operating system, there are a lot more use cases we can put those individual virtualizations to that hypervisors cannot match. So for instance, if I look at Apache, it has a thread pool.
That thread pool is the source of most of Apache's exploits, because you can break out of that thread pool using a SQL injection attack or a CGI attack. So one of the things we can do is make sure that every one of those threads runs inside a container, so that Apache is actually managing containers for its thread pool instead of threads. And that means that if you do a breakout in that Apache thread, you cannot actually get out of the container. So it allows us to do much more security isolation in PaaS systems. That's a use case for container technology that still cannot be matched by hypervisors. So I think both will exist; there will always be use cases. And you've got two minutes to get to your next talk, so I think I should probably say thank you very much there. Thank you.