again for our last talk before lunch. Jay Coles is going to talk to us about the security options for containers.

Oh cool. Hi, I'm here to talk about security options for containers. This is going to be a bit of a whirlwind tour of what you can do to patch a bit of security into containers, and what you should be looking for in existing container implementations to see if they're secure.

So, a bit about me first. I run doger.io, where I'm trying to document all the low-level features of containers in case you ever want to go and write your own containers or container stack. It's a lot simpler than most people would think. If you can fork, that's about all you need to know to be able to pull something like that off. On Twitter I am container doge, so if you like what I talk about here, please think about following me. And a shout-out to Anchor for sending me out to speak; I wouldn't be here without them. So let's get into it.

These days there's a much wider variety of people who are going to attack you. In the old days it used to just be botnets and script kiddies, and I tend to find that their requirements match those of my customers: they want access to the customer's data, and so does the customer; they want a bit of storage and CPU. To protect against those sorts of threats you really need application-level security; OS-level security isn't really going to cut it. But the higher-level attackers (organized crime, hacktivists, and NSA-level government threats) have a lot more skill, and a lot more time and effort they can put into attacking you. They tend to want to move vertically, so they're more than willing to attack one customer on your hardware and then use that to attack other customers on your kit. They're the type of attackers that the security frameworks I'll be showing off here today are helpful for preventing.

So what is security?
I've probably got a more concise form there, but I think it's important to define it. Basically, it's about retaining administrative access to the box: you want to be able to both detect that your box has been attacked and continue to administer the box so that you can try and remove that threat. In this case you also want to restrict knowledge of other containers; you don't want customers having a full list of the other customers on the box. I think that's fairly important.

Now, UNIX has always had a whole lot of security features built into it that are useful. You're probably going to be familiar with a lot of the subsystems on the left, and might have used some of the stuff on the right. A lot of this is user-focused, as the UNIX security model was user-focused. However, containers throw a bit of a wrench in the works here, in that a container can have multiple users, so there are some subsystems up there that don't actually work very well in a security context.

Quotas are probably the classic example here. Quotas allow you to limit the disk allocated to a single user, so several users in a container can collude to exceed the limit of a single one and thereby fill up the disk. Blacklisting via ACLs is very similar: if the people in the container have the ability to swap between different UIDs, then blacklisting a single UID is definitely not going to be effective. And I've found there's nothing really that useful in rlimits for containers, as there are much better subsystems, which I'll go into later on, for providing similar features.

So, capabilities. This is what capabilities look like. This is actually a list of capabilities that are not very useful in a container context. One you might be familiar with is CAP_MKNOD, which is the ability to create device nodes.
Capabilities were basically an attempt to cut root down into smaller bundles of authority that you could parcel out to both processes and files. Unfortunately that backfired a bit: there's a capability known as CAP_SYS_ADMIN that's heavily overpowered and is almost a god capability. But these are capabilities you most likely don't want in a container. CAP_MAC_OVERRIDE, for example, would allow you to override mandatory access control such as SELinux or AppArmor, and you really don't want the ability to turn that off inside a container if you're using it to protect the container itself. CAP_MKNOD can be pretty dangerous: if someone creates a node for the device where the root file system lies, mounts that file system and then starts making changes to it, that could allow them to escape the container. So you definitely don't want that. You tend to find that most of the capabilities you don't want in a container have to do with low-level management of the box, such as dealing with devices, so it's fairly easy to track down which ones you don't want, and at the very least it's a very cheap and easy thing to patch in.

Most containers have an entry point, which is an init system or the application itself. Instead of calling that directly, you can just call capsh, which allows you to drop these capabilities and prevent them from being gained again. So once again, it's very easy to patch in, and a very easy way to add a bit more security to your traditional container system. Just be careful: CAP_SETPCAP allows you to fiddle with capabilities, so you definitely want to be dropping that one as well. And if you're trying to drop capabilities by hand, make sure you drop that one last, as if you drop it first you can no longer drop any further capabilities, which caused me to tear my hair out for a very long time.
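The capsh wrapper described above can be sketched as a small helper that builds the command line for a container entry point. This is a minimal sketch, not the speaker's own tooling: the function names and the capability list are assumptions chosen for illustration, and the only capsh behaviour relied on is `--drop=` (remove capabilities from the bounding set) and `--` (hand the remaining arguments to /bin/bash). The ordering helper simply mirrors the advice to drop CAP_SETPCAP last.

```python
import shlex

# Capabilities named in the talk as unwanted inside a container.
# Illustrative only: audit the list against your own workload.
UNWANTED_CAPS = ["cap_sys_admin", "cap_mac_override", "cap_mknod", "cap_setpcap"]


def order_for_manual_drop(caps):
    """Return caps lower-cased and de-duplicated, with cap_setpcap moved last."""
    ordered = []
    for cap in caps:
        cap = cap.lower()
        if cap not in ordered:
            ordered.append(cap)
    if "cap_setpcap" in ordered:
        # Dropping cap_setpcap removes the right to drop anything else,
        # so it has to be the final capability to go.
        ordered.remove("cap_setpcap")
        ordered.append("cap_setpcap")
    return ordered


def capsh_argv(caps, entry_point):
    """Build an argv that drops caps via capsh and then execs entry_point.

    capsh passes everything after "--" to /bin/bash, so the entry point
    is wrapped in `-c "exec ..."`.
    """
    drop = ",".join(order_for_manual_drop(caps))
    return ["capsh", "--drop=" + drop, "--", "-c", "exec " + shlex.quote(entry_point)]


if __name__ == "__main__":
    # Hypothetical entry point; substitute whatever your container starts.
    print(capsh_argv(UNWANTED_CAPS, "/sbin/init"))
```

The resulting list can be handed to whatever currently launches the container's init, in place of the init path itself.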
Cgroups. This is not traditionally a subsystem you'd think of as security, but if you're trying to retain administrative control of a system it is of vital importance. The ability to account for resource usage and track down what's used in your system is vitally important, but probably the most important one here is tracking of processes. I don't know if anyone's ever dealt with an Apache where a PHP process decides to fork off a worker, and that worker has double-forked so that its parent is now init rather than Apache. If Apache is inside a cgroup, it's very easy to track down that errant process: the process itself can never escape the cgroup unless you've explicitly reassigned it. So using cgroups to bundle together a group of processes and track them collectively is of vital importance.

One of the lesser-known features is that it can actually block device access, or give much more fine-grained access to devices. If you need to grant access to a real physical hardware device dynamically, this is a great way to do it. In the case of CAP_MKNOD, you might think dropping the capability is enough, but if you've got customers bringing their own file system images and the device nodes are already on the file system, then it's very easy for them to just put in a device node for the root file system and access it that way. So you're really going to want to layer up in that case and also use cgroups to restrict access to devices.

It is actually implemented as a file system. It looks a bit like this. I don't have a cursor here, but yes, I do. So the tasks file here just contains a list of PIDs, and these are the PIDs that are tracked inside the container. If you want to add a process into the cgroup, you just echo its PID into the tasks file and suddenly it'll be tracked. Here I've shown an example of devices.list, which allows you to see the access policies for all the devices on the system.
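The slide is not reproduced in this transcript. As a rough sketch of the two operations just described, the block below writes a PID into a cgroup's tasks file and parses one devices.list entry. It assumes the cgroup-v1 file layout, and it assumes the entry on the slide was the usual root-cgroup default `a *:* rwm`; the function and type names are made up for illustration.

```python
import os
from collections import namedtuple

# One access rule from devices.list: device type, major, minor, access flags.
DeviceRule = namedtuple("DeviceRule", "dev_type major minor access")

_TYPES = {"a": "all", "b": "block", "c": "char"}


def parse_device_rule(line):
    """Parse one devices.list line such as 'a *:* rwm'."""
    dev_type, numbers, access = line.split()
    if dev_type not in _TYPES:
        raise ValueError("unknown device type: %r" % dev_type)
    if not set(access) <= set("rwm"):
        raise ValueError("unknown access flags: %r" % access)
    major, minor = numbers.split(":")
    # major and minor stay as strings because either may be the wildcard "*".
    return DeviceRule(_TYPES[dev_type], major, minor, frozenset(access))


def add_task(cgroup_dir, pid):
    """Move a process into a cgroup: the equivalent of `echo PID > tasks`."""
    with open(os.path.join(cgroup_dir, "tasks"), "w") as handle:
        handle.write("%d\n" % pid)


if __name__ == "__main__":
    print(parse_device_rule("a *:* rwm"))
```

On a real host `cgroup_dir` would be a directory under the mounted cgroup file system, and writing to it needs appropriate privileges.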
The "a" there stands for all devices, but it can be "b" for block or "c" for character, and this entry just allows access to any major and any minor device. So as you can see here, I've granted read, write and mknod for everything, for everyone, which, as this is the root cgroup, is probably a good idea, because I want my udev to be able to create my device nodes. On other systems you might find there are a lot more files in here, such as CPU accounting ones to see how much CPU time has been used, or CPU shares to limit how many CPU slices per second a process can get, as well as some memory accounting ones, which are very handy for tracking memory usage for the container as a whole.

Patching this into an existing container solution is a bit harder, but nearly every container solution I've seen has cgroup support out of the box. If you do have to patch something like this in, then where you've got your /sbin/init you want to replace it with a wrapper script that just echoes its PID into the tasks file. The only caveat there is that the cgroup file system might not actually be exposed to the container, so you might have to pause and somehow communicate your PID out of band, which complicates things. But as I said, most container implementations rely on this functionality anyway, so it shouldn't be something you have to patch in. Instead you can just focus on the devices side: limiting access to raw devices, as listed in devices.list.

The next two subsystems are the Linux security modules, AppArmor and SELinux. I don't really have much to say about one versus the other. They're both great. You should be using one or the other if you've got containers on your system. Which one you use is probably dictated more by your distribution choice and what actually works. If you can use SELinux, I highly recommend it; as I'll get into in the next couple of slides, it brings a couple more things to the table.
However, AppArmor is just as good; it's just more focused on protecting the host system from containers rather than protecting containers from each other. If you have a moment and you're interested in the sorts of attacks you can actually launch from a container against the host system, I highly recommend going and grabbing the LXC source code and having a look at the AppArmor policy. It's quite easy to read, but it makes you realize just how much damage you can do with access to the sys file system. Echo stuff into the right file and you can cause lots of damage to the host system.

So let's talk about SELinux a bit. SELinux is also used to protect the system from a malicious container, but it can also be used to help prevent data leakage from one container into another. When most people think about security they start to think about unclassified, secret and confidential. That corresponds to multi-level security. I haven't actually found a use for this in day-to-day usage of SELinux for protecting systems, and I generally try to ignore it. However, multi-category security is very useful in the context of containers.

So SELinux in itself is really about three separate security frameworks. You've got your multi-level security. Another is known as type enforcement, which is what is used to prevent the host system from being attacked by a container, or one process from attacking another. This is where you tag some files on the file system as belonging to Samba and others to Apache, and unless there's a rule that specifically allows Apache to talk to Samba, that access is blocked. But the extra feature that SELinux adds to containers compared to AppArmor is multi-category security. You can actually tag a container as belonging to, say, the category NSA or the category ASIO, and even if a file is copied from the NSA container into the ASIO container, because it's tagged with NSA, access is going to be blocked.
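The category check can be modelled in a few lines. This is a deliberately simplified sketch of the rule, not SELinux itself: it assumes levels written as a sensitivity followed by a comma-separated category list (for example `s0:c1`), ignores category ranges such as `c0.c1023`, and ignores type enforcement entirely. The mapping of `c1` to the NSA container and `c4` to the ASIO container is hypothetical.

```python
def parse_level(level):
    """Split a level such as 's0:c1,c4' into (sensitivity, set of categories)."""
    sensitivity, _, cats = level.partition(":")
    categories = frozenset(c for c in cats.split(",") if c)
    return sensitivity, categories


def mcs_allows(process_level, file_level):
    """True when the process holds every category the file is tagged with."""
    _, proc_cats = parse_level(process_level)
    _, file_cats = parse_level(file_level)
    return file_cats <= proc_cats


if __name__ == "__main__":
    nsa_file = "s0:c1"        # hypothetical: file tagged in the NSA container
    asio_process = "s0:c4"    # hypothetical: process in the ASIO container
    # The tag travels with the file, so copying it elsewhere does not help.
    print(mcs_allows(asio_process, nsa_file))
```

In this model a file with no categories at all is readable by every container, which is why the tagging step matters.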
Multi-category security is sort of like polyinstantiation for security frameworks, and it's just something that unfortunately AppArmor doesn't have. It's a very powerful feature. This is actually something that QEMU, or sorry, libvirt uses as part of its framework, so every virtual machine gets its own security context. If you have a working libvirt setup, then everything you need should be in place in terms of the policy. This does need a working SELinux setup, and after trying to get something like that set up as a live demo for this conference, I realized it's not something you can do overnight and it requires a bit of knowledge. So you'd have to put a bit of time into getting that working, or get libvirt installed with sVirt.

A security context looks a bit like that. s0 corresponds to unclassified and can generally be ignored, and then you've got the tags c1 and c4, which might correspond to NSA or ASIO in this case. You tag the files with just a chcon command, and then runcon is what you use as your entry point. So you have runcon switch to the appropriate security level and then invoke your standard init system or entry point. So once again, relatively easy to patch in, but there's quite a bit of work to initially get it up and running.

Finally, seccomp. This is a great framework that's sort of like a firewall for syscalls. Unfortunately it's not something you can patch in after the fact; it really requires support from your container framework. Syscalls can generally be divided into two classes: device and systems management, and what I'd call user space. User space would be open, read, write, create socket. And for management, there are a couple of calls there that I consider management: adjusting the time for the clock, and setns, which allows you to jump between namespaces, which is effectively jumping between containers. You generally don't want that inside a container if you're trying to prevent jumping around between containers.
Swapon and swapoff: that's really not the responsibility of the container but of the outer system that's running everything, so you generally want to disable those. But seccomp has the nice feature that it's ultra fine-grained. You can actually filter on the arguments to the syscall itself. So in the case of socket, you can make it so that it only allows sockets of type IPv4, IPv6 or UNIX to be opened. So you can grant much finer access to kernel resources than with, say, the security frameworks presented earlier. You do have to compile and load a policy, and that's why you need support from the container framework in that regard. So, as I said earlier, the only one there that you really can't patch in after the fact is seccomp, but it does give you quite a bit of extra security. There was a vulnerability, for example, in Econet back in 2010, where if you created a socket of type Econet and then did the right things with it, you could gain root. Seccomp would have allowed you to only permit the creation of IPv4 or UNIX sockets and protect against that.

But the rest of these security features can be patched in with a tiny bit of work. If you do want to secure an existing container solution, I highly recommend SELinux, as I said before. The more I've played with it the more I like it. But if AppArmor is available, definitely go with it. If you're trying to assess the security of a container solution, they might bring in their own custom security stuff. That is all good and nice, however you generally want them to support at least any two from that list, just so that you know they're using the extra features of the Linux kernel to help keep everything secure, and these are all battle-tested technologies. If they're not using something like this, there are avenues of attack that can't necessarily be protected against from the container framework. So that's generally it.
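As a footnote to the seccomp section, the decision such a filter makes can be modelled in plain Python. This is only a toy model: a real policy is compiled to a BPF program and loaded into the kernel by the container framework, and none of that happens here. The syscall names chosen for "adjusting the clock" (`settimeofday`, `clock_settime`) and the `decide` function are assumptions for illustration; `AF_ECONET = 19` is the Linux address-family number, defined by hand because Python's socket module does not expose it.

```python
import socket

# Management-style syscalls from the talk, denied outright in this model.
DENIED_SYSCALLS = {"setns", "swapon", "swapoff", "settimeofday", "clock_settime"}

# Argument-level rule: socket() is only allowed for these address families.
ALLOWED_SOCKET_FAMILIES = {socket.AF_INET, socket.AF_INET6, socket.AF_UNIX}

AF_ECONET = 19  # Linux address family for Econet


def decide(syscall, args=()):
    """Return 'allow' or 'deny' for a syscall name and its integer arguments."""
    if syscall in DENIED_SYSCALLS:
        return "deny"
    if syscall == "socket":
        # The first argument to socket() is the address family.
        family = args[0] if args else None
        return "allow" if family in ALLOWED_SOCKET_FAMILIES else "deny"
    return "allow"


if __name__ == "__main__":
    print(decide("socket", (socket.AF_INET, socket.SOCK_STREAM, 0)))
    print(decide("socket", (AF_ECONET, socket.SOCK_DGRAM, 0)))
```

With a rule like this in place, the Econet socket from the 2010 example is refused before the vulnerable code is ever reached.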
I did promise a whirlwind tour, and I hope it gives you a rough idea of what you can do to secure a container. Were there any questions?

We've got a couple of minutes for questions, if anyone's got questions.

One of the obvious ones being that Docker sort of gives you root, and root in the container is pretty much root on the host. How big a difference to security do you think getting the user-mode version of that finally out to the public will make?

Can you run that second part by me again?

In LXC natively you can run it completely as a non-privileged user.

Yeah.

Docker currently doesn't give you that. Do you feel like there's additional security in what Docker provides over LXC itself to make using Docker with user mode, which is not possible yet, a worthwhile thing to pursue?

As far as I'm aware, Docker's position has always been that you don't ever run anything as root inside the Docker container. I think that's rather valid for their use case. However, in LXC for example, if you're trying to emulate more of a full system rather than just protecting an application, you are going to have to hand out root. There's no good answer here. It's always going to be a bit of whack-a-mole. If you are going to do full containers and grant root, you are probably going to have to use at least three of those subsystems, just because they all overlap to some degree, and I consider that to be the minimum necessary to protect the system. But yes, you're always going to be vulnerable to root inside a container taking out the system because of something you haven't thought of, and I can't see a good way structurally to prevent that.

That's a new question. So you mentioned seccomp. Do you see any container implementations like Docker or Rocket or any of them having support for seccomp or Capsicum sandboxing in place at this point or in future?

Possibly, and I hope so.
LXC I would like to give a really big shout-out to, because they've supported all of this from day one. They've basically done the security right. Up until recently Docker was, as far as I'm aware, nothing but a very thin wrapper around LXC with most of the security features turned off, and they've now gone off and re-implemented it as libcontainer, which is great, except they've thrown all the security out and are going to have to re-implement it. I would like to see them implement seccomp. I think there is a lot of benefit to it. I mean, there are always going to be privilege escalation bugs, and if those privilege escalation bugs are not namespace-aware, they're going to get root inside the container, and you still want to try and protect against that. That's where seccomp really comes into its own. It can help limit the attack surface for getting root and pivoting to other containers and that sort of thing, so I hope they do bring it in. It's not particularly difficult; I think they're just going to have to find the time and get it done, and it's kind of a shame that they abandoned LXC in that regard.

Okay, please join me in thanking Jay for his talk.