Okay. Hello, everyone. My name is Salman Baset and together with my colleague Phil Estes and some of the colleagues who could not make it here, Stefan Berger and Dimitrios Pendarakis, we'll be giving an overview of Docker security. And Docker, as you know, is becoming very popular. It's going into Magnum, people are building services around it, and security of Docker is a major concern. Hopefully, with this talk, you'll walk away with an understanding of what Docker security is about and how we can build a cloud service using Docker. All right, so good afternoon. First of all, I think I won't belabor the agenda because obviously we'll walk through it as we go, but we're going to briefly talk about what Docker is, and I hope to approach it from a slightly different angle that will help us walk through the rest of the talk, so that it makes more sense when we look at threat models, how you protect against them, and how we configure Docker and related components to best handle the kinds of attacks that you might see, and then Salman will wrap it up with some lessons learned that we've had. Before that, we'd definitely like to acknowledge the IBM Container Service team, who have done a lot of work to run a production deployment in a public cloud. Obviously, we've worked hand-in-hand with the Docker community, with the OpenStack community, and with the Linux community. There are a lot of moving parts as we get into all these layers of software pieces that are part of having a secure way to run containers, so we will walk through that with you. But first of all, in most talks on containers at the conference, there's an opening question about who's heard of containers, and everyone raises their hand.
We all kind of generally understand what they are, but a minor pet peeve of mine is that the same kind of image has been used as long as I've been around Docker. I guess as far as an introduction, I work in the upstream Docker community on behalf of IBM as a core maintainer of the Docker engine, and so I work closely with Docker, Inc. through our partnership with them and also with the broader community. But anyway, a lot of folks approach this initial "what is Docker" question with this image of: you remove the hypervisor and we share these layers, and there's your binaries and libraries and your app, and things can be smaller, they're not these gigabyte images. And those things are all very true, and definitely some of the space and cost savings and increased performance, all those things are true. But I think a good foundation for what we want to talk about today is to see Docker as the actual isolation components that make up a container. That's what makes it hard to really describe a container: it's not one unique thing, it's a set of isolation technologies, mostly residing in the Linux kernel. And if you want to know more, James Bottomley has a great talk later this afternoon deep diving into those components, those isolation technologies that you find not just in Docker but in LXC and other container technologies.
So really, you know, a container is composed by a system that can assemble those things in a model where it feels like you're in your own system. And so in the Docker world, someone sitting down and typing docker run redis or nginx feels like they're starting, in some ways, a VM: you have a root file system and you can install tools in there. But in Docker's case, there's an engine that's actually at that time assembling that root file system, whether it's pulling an image from Docker Hub or from a private registry, and assembling the cgroups and namespaces and capabilities, et cetera, to create what you think of as a container. Now, we know Docker has grown beyond just that core engine that most people know about and the Docker Hub, to include things like Compose and Swarm and Machine and all these other advanced capabilities that are coming around, even beyond Docker when we look at Kubernetes and Mesos. But in this talk, we're going to focus on containers and container security specifically. Obviously, if you run Docker, there are various deployment models you might have, the simplest being that I've got a host or a set of hosts where I control all aspects of them.
The code that I run in those containers on those hosts is known, it's mine, I'm a single tenant, and maybe it runs on bare metal or a VM, but anyway, I'm in control of that universe. Whereas there's the multi-tenant model, where I'm no longer in control of all the containers running: there may be multiple tenants, the code running there may be unknown, they may be running on the same machine with virtual networks, and maybe the Docker API is exposed to various tenants. So this model is actually much more like a VM-based multi-tenant cloud, like a public cloud offering, and that's going to be our focus, because that's what we're running in the IBM Container Service; that's where we have our experience of these challenges. And obviously the security challenge increases as we move toward this model of multi-tenancy, where we don't know and trust all the things that we're running. So Salman is going to take it from there and talk about the threat models. So when you're running a public cloud using Docker containers, there are various types of threats that you have to guard against, the first one being: if an innocent user is running containers on a machine in your public cloud, can they be attacked by other malicious users running apps in containers on the same machine as this user? So what are some of the possible attacks? Let's say I'm running a container on a machine in a cloud. Can other containers see which containers I have started, or what processes are running inside my container? Then, which files are used: can other containers see which files are used by my container? I believe they should not be able to do that. A container is typically configured with a network stack, so if I'm using some specific networking IPs, virtual networking, can other containers see that network stack? What is the host name of my container? Can other containers see the host name? Can they set the host name?
If my containers are doing IPC, nobody does IPC these days, but if containers were doing IPC (IPC stands for inter-process communication), can other containers see how my containers are communicating with other containers I'm running on the same host, or across hosts if the IPC is implemented across hosts? So these are examples of ways that containers running on the same machine can attack another container. Rogue containers can also attack the physical host on which they are running. And such containers can be rogue containers, but they can also be misconfigured containers. So what are some of the examples? With containers, is root running inside a container also a root user on the host? That's very important to understand, because when you create a container, unlike virtual machines, there is no separate kernel running. It's essentially a Linux process that's running. And so if the Linux user inside that container is root, and that's the same user on the host, there is potential for exploitation there. Then, the CPU, memory, and network limits that are configured for this container: are they being obeyed? Is the container going to violate those limits? Can a container gain privileged capabilities? A Docker container can be started with so-called privileged capabilities, by which it has access to the capabilities of a root user on the host. Are other limits obeyed, such as limits on the number of child processes created or the number of file descriptors? We have seen this in the wild, and it's an issue across many different container platforms, such as Cloud Foundry, Docker containers, and LXC, that misconfigured containers or malicious containers can create lots and lots of processes. And finally, can a container mount a denial-of-service attack on the host, or can it try to mount the file systems of other containers?
And finally, the third type of threat we want to talk about is attacks from the public internet. So I'm an innocent user. I deployed my container on the public cloud and I expect it to run, but it can potentially be attacked from the public internet. The examples include scanning of ports, guessing the passwords of the containers, and denial of service. This threat model is similar to what we have for a VM-based cloud, so we're not going to talk about that; we'll focus mostly on the first two threat models, namely rogue containers attacking other containers and rogue containers attacking the host. So, isolating from other containers: how can we make sure that we isolate one container from another? Docker relies on kernel namespaces for isolating containers from other containers. With the kernel namespaces for process IDs, for mount, for network, for UTS, and for IPC, other containers are unable to see which processes I'm running inside my containers. Other containers are also unable to see which files I'm using. They cannot see the network stack, they cannot see any IPC communication, and they are unable to set or get the host name of my container. And another question that gets asked is: okay, these are nice capabilities, but what about devices? If my host has various devices, such as RDMA or others, can containers access them? Devices have to be explicitly passed into the Docker container using the --device option. But all of these things described here are moot if a user can create privileged containers on the host. A privileged container is something that has full access to the capabilities of the host, similar to what a root user has. So that's isolating containers from other containers. And then there is this big issue of how we isolate the host from rogue containers.
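As a quick aside on the kernel namespaces just described: you can see them directly on any Linux host by listing the namespace handles of a process. Inside a container, the pid, net, ipc, uts, and mnt entries point at different namespaces than the host's. A minimal sketch:

```shell
# Each entry under /proc/<pid>/ns is a handle to one namespace the
# process belongs to. Two processes are isolated for a given resource
# exactly when their handles for that namespace differ.
ls /proc/$$/ns
```

Comparing this output for a shell on the host and a shell started inside a container shows exactly which namespaces the engine created for the container.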
And here, as you can see, there are a bunch of capabilities, mostly in the Linux kernel, that are used to isolate rogue containers from the host, namely user namespaces, cgroups, Linux capabilities, Linux security modules, seccomp, the Docker API, and Docker engine and storage configuration. Now Phil is going to talk about user namespaces, which he actually implemented in Docker 1.9. All right. So yeah, as Salman said, user namespaces are, I guess I'll say, a new feature in Docker. They actually aren't in the Docker main release yet; they're currently in the experimental release. Docker 1.9 is in release candidate form this week and will probably come out generally next week, but the experimental build in the 1.9 release cycle has user namespace support. And probably, at least for our talk, the key benefit that we'd like to discuss here is that I'm now able to deprivilege root inside the container from being root outside the container. In this example, I run a container and I mount a directory that has files owned by the root of my host. Now obviously, hopefully you wouldn't do that, but we need to show some kind of example where you'd have access to do a malicious attack on the host. In today's model, without user namespaces, when I docker run my container, if I have access to a directory, I could actually replace the shell with my own malicious shell, where I could copy in a file from my container onto the host. But in this example, basically that's prevented, because as you can see over here, if I'm running this container and I look at its PID and I look at the UID of the user running my shell in my container (which is what I've run by running busybox), I can see that it's not actually root. And I was going to demonstrate, and maybe we'll save it for the end in case we have time, but since we have quite a few things to cover, maybe we'll skip it for now and see if we have time.
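For reference, since we're skipping the live demo: enabling this in the 1.9-era experimental build looks roughly like the following. The flag name and the example subuid range are illustrative of that release and may differ in later versions.

```shell
# Start the daemon with user namespace remapping; "default" makes
# Docker create and use a "dockremap" user for the mapping.
docker daemon --userns-remap=default

# The mapping itself lives in /etc/subuid and /etc/subgid, e.g.:
#   dockremap:100000:65536
# meaning UID 0 inside a container is UID 100000 on the host, so a
# process that escapes the container is an unprivileged user there.
```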
But the important thing to take away from user namespaces is that now when I run a container, root inside the container still has privileges as root, but it's not able to access files or any of the processes as root if it were able to break out of the container. [Audience question.] No, you have to start the Docker daemon with root remapping turned on. Now, the Docker community hopes that this will become a default once we sort out a lot of issues with volumes and linking, and there are other issues that come out of having unique root UIDs. But yes, that's the plan. Okay, so now we have these containers which, due to kernel namespaces, cannot see what I'm running inside my container. With user namespaces, my root inside a container is not root on the host. Now let's talk about resource isolation. If I'm running containers, those containers use resources. They use CPU, they use memory, they may use disk, they may use network. And if the appropriate resource limits are not defined, then malicious containers have the potential to cripple the host on which they are running. So this example shows a machine with eight CPUs, and a container is started such that only the first two CPUs of the physical machine are allocated to it, with the --cpuset-cpus option, as you can see. Then, at the time of Docker container creation, one can also specify the CPU share a container will get. By default, each process that is started in Linux gets a share of 1024, and one can define these shares relative to that default value. So in other words, if I'm running one container with the default share and another container with a share of 512, and both are trying to consume the same amount of CPU, the container started with a share of 512 will only be able to use half the CPU relative to the other container. So that's the CPU isolation.
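The two CPU controls just described look like this on the command line (the image name is just an example):

```shell
# Allow the container to run only on the first two physical CPUs:
docker run -d --cpuset-cpus="0,1" nginx

# Give the container half the default weight of 1024; under
# contention it gets half the CPU time of a default container:
docker run -d --cpu-shares=512 nginx
```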
Then there is the memory and the swap. How does Docker make sure that a container is going to obey the memory limits? First, here is the way memory and swap are configured for a Docker container. Note the syntax of the command here: memory is configured as 2G, and memory-swap is also configured as 2G. The way the Docker command is defined, memory-swap is the total of memory plus swap, not additional swap on top of the memory. So in this case, the container is essentially getting zero swap, because the main memory defined for the container is 2G. If memory-swap were configured as 3 gigabytes, then a gigabyte of swap would be available to the container. And one of the problems with memory isolation is that by default in the Linux kernel, memory and swap accounting is not enabled. If that's not enabled, containers have the potential to consume more memory. They can consume more memory until the out-of-memory killer kicks in and kills those containers, but there is still the issue of the containers not obeying the memory limit. Then there is the disk. Containers write data to disk, and if you're running hundreds of containers on the same machine all trying to write data to their file systems, they can also potentially cripple the disk. So it's a good practice to use different disks for container data versus the root file system, and moreover to define limits on how much data containers can write to the disk. The block IO controller, which is part of cgroups, can be configured, but there are some challenges and problems in configuring it and ensuring that the limits for disk IO are appropriately defined. Finally, there is the network piece. With containers, one can use different types of networking: using libnetwork, using Kuryr, or using Neutron. With Neutron, for example, the traffic shaping capabilities that define limits on how much traffic a virtual machine can send on a port can be configured. But by default, the network cgroups are not configured inside Docker.
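Going back to the memory flags for a moment, the arithmetic is worth spelling out, because --memory-swap is a total, not an amount of swap. A small sketch:

```shell
# --memory is the RAM limit; --memory-swap is RAM *plus* swap.
# So --memory=2g --memory-swap=2g leaves zero swap:
mem=2 memswap=2
echo "swap: $((memswap - mem))G"   # swap: 0G

# while --memory=2g --memory-swap=3g leaves one gigabyte of swap:
mem=2 memswap=3
echo "swap: $((memswap - mem))G"   # swap: 1G
```

And remember the point about accounting: on Ubuntu-family kernels, swap accounting also has to be enabled on the kernel command line (cgroup_enable=memory swapaccount=1) before the swap part of the limit is enforced.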
So if you're going to use default Docker networking, you have to make sure that those limits are appropriately defined. This slide summarizes what I was mentioning earlier. The key point is that besides CPU, memory, and disk, there's also the issue of containers trying to do fork bombs, and there are many ways of controlling that; one possible way is through cgroups. Cgroup support for process IDs, or PIDs, does not exist in the kernel yet, but it's coming soon in kernel 4.3. There are also some challenges in enforcing the data traffic limits for block devices, and some of these issues are being addressed in cgroups v2, which has seen a lot of contributions from the Facebook engineering teams. With a redesigned interface and a new hierarchical organization, hopefully some of the problems that we've seen with cgroups are going to go away, and those improvements can then be leveraged directly by Docker for better resource isolation. So that's the resource side. Now let's talk about capabilities. Again, going back to the story: containers cannot see what others are running, they're isolated from root, their resources are isolated, but they may still have all the capabilities for executing various system calls. Containers by default share the kernel with the host, and if they can execute various types of system calls, that can be exploited by a hacker to potentially gain root access. To protect against that, the first line of defense is Linux capabilities. What Linux capabilities do: in the old days, there used to be just two types of users, root and not root, and root had all the privileges. But starting, I think, in Linux kernel 2.2 or somewhere nearby, Linux capabilities were defined such that all the privileges of the root user were broken down into a smaller set of capabilities. So what are those capabilities?
For example, the ability to load a kernel module. Can a Docker container load a kernel module? Can it mount file systems? Can it perform network administration operations? There are about 37 capabilities, and Docker by default drops a majority of them. The numbers are not important here; what matters is how powerful each capability is. But in this example, because Docker drops the mount capability, the figure shows that a Docker container is unable to mount a file system, though it can perform regular I/O operations using open and close. So Linux capabilities: that's great, we can restrict which capabilities are available to a process, but that's not enough. We would like to be able to use the Linux security modules, such as AppArmor or SELinux, to restrict what a Docker container can do. A popular platform for running Docker containers is Ubuntu, and on Ubuntu, AppArmor is the popular choice for a Linux security module. So in a nutshell, what does AppArmor do? Previously, as we saw with capabilities, we could restrict which system calls or sets of system calls a container can execute. Now with AppArmor, one can define, within a given system call such as opening a file, which files can actually be opened by a container. So we can say that a container can open /etc/hosts in this example, but it is denied access to /dev/kmem. And Docker has a default AppArmor profile for containers, and it makes sure that a user that breaks out of a container cannot access sensitive data on the host, such as the files associated with Linux security modules or kernel memory, etc. But one can also choose to define a custom AppArmor profile for the different types of containers that run on the host. Okay, so: isolated containers, limited capabilities, cgroups for resource isolation, restrictions using AppArmor.
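A concrete way to see the dropped capabilities from inside a container is to decode the effective capability bitmask that shows up as CapEff in /proc/self/status. The mask below is the commonly cited default for Docker containers of this era (0x00000000a80425fb); the bit numbers are the standard values from linux/capability.h. A sketch:

```shell
# Docker's default container capability set, as a bitmask.
mask=$((0x00000000a80425fb))

CAP_CHOWN=0        # kept
CAP_NET_RAW=13     # kept (ping etc. still works)
CAP_SYS_MODULE=16  # dropped: cannot load kernel modules
CAP_SYS_ADMIN=21   # dropped: cannot mount(2), among much else

for cap in $CAP_CHOWN $CAP_NET_RAW $CAP_SYS_MODULE $CAP_SYS_ADMIN; do
  echo "cap $cap present: $(( (mask >> cap) & 1 ))"
done
# cap 0 present: 1
# cap 13 present: 1
# cap 16 present: 0
# cap 21 present: 0
```

Running the same check against `grep CapEff /proc/self/status` on the host versus inside a container shows the difference directly.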
One thing I did not mention when I spoke about capabilities is that sometimes capabilities are implemented using a series of system calls. There's an example here: the setuid capability has four system calls that can be used by a container. So there's a technology called seccomp that can be used to limit which system calls can be invoked by a Docker container, and we can restrict and isolate containers in that way. And lastly, I should mention that all of this work we have done so far in protecting our host from rogue containers is useless if a user can create privileged containers. The current Docker API allows a user to create privileged containers, and there is work in progress in the Docker community to create appropriate authentication and authorization for that. But until then, if you want to build a multi-tenant container cloud, you have to make sure that a user does not have the ability to create privileged containers, or to add capabilities, or to change the Linux security module profiles associated with a container. So we'll talk about that. Sure. So just to step back for one second to seccomp: everything else we talked about in that section exists in Docker today, but seccomp support has been put into runC. I don't know how many of you this week have heard about the Open Container Initiative; libcontainer is sort of the underlying library that does a lot of the low-level work to start a Docker container. Seccomp support has been added there, but it's not in Docker yet, so I think Docker 1.10 will probably have the seccomp capabilities. So we've talked about all those limitations we can place. Now we're going to shift gears a little bit and talk about the Docker engine itself and ways to isolate and configure it properly. So obviously, use TLS for communication with the Docker engine: if you're opening up the API over a TCP socket, there is documentation on how to set up TLS certificates.
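The daemon-side TLS setup referred to here looks roughly like this; the certificate file names are whatever you generated following the documentation, and the flags are the documented ones of this era:

```shell
# Expose the API over TCP, but require clients to present a
# certificate signed by our CA (--tlsverify implies TLS is on):
docker daemon --tlsverify \
  --tlscacert=ca.pem \
  --tlscert=server-cert.pem \
  --tlskey=server-key.pem \
  -H tcp://0.0.0.0:2376

# And on the client side:
# docker --tlsverify --tlscacert=ca.pem \
#   --tlscert=cert.pem --tlskey=key.pem -H tcp://host:2376 info
```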
There are various modes, whether you're just validating with that or actually doing client authorization with that. And that's not full-featured yet, but as Salman said, there are actually two proposals right now to add full authorization and authentication to the Docker daemon and client. You can check out the current TLS support in the documentation. Obviously you want to set appropriate limits, some of the things we've talked about: limiting the opening of huge numbers of file resources, the number of processes. And there's a good tool that the Docker community and the Docker, Inc. security team have been working on: the CIS benchmark, we have the link here, and the Docker Bench tool, which was announced at DockerCon this summer. There are a lot of links in our slides, and obviously some of you have been taking pictures, but we'll also post these on SlideShare so you can easily get to a lot of these resources later. But anyway, the Docker Bench tool is very useful for running through a set of recommended security checks on your configuration. And then, as many of you know, the Docker storage setup is configurable; there are many backend storage drivers available. From our experience, consider using devicemapper, which is block rather than file oriented. If it's possible in your environment, set the default file system of containers to read-only; if you need write access to certain areas, you can use volumes for that. And then, something Salman has worked on upstream: bind-mounted files in Docker have no quota, so that's another loophole for a malicious container to try to basically DoS your host. If you can make those read-only, that limits that capability. And then, also in that ecosystem we showed at the beginning, the Docker registry is a key component, whether you're running a private registry; obviously Docker Hub is running the V2 implementation.
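Before moving on to the registry, the engine-side limits mentioned above can be sketched as follows; the values are illustrative, and the nginx image is just an example:

```shell
# Daemon-wide defaults for per-container resource limits
# (soft:hard), applied to every container unless overridden:
docker daemon \
  --default-ulimit nofile=1024:2048 \
  --default-ulimit nproc=512:1024

# Read-only root filesystem, with an explicit volume for the one
# path the application legitimately needs to write:
docker run -d --read-only -v /var/cache/nginx nginx
```

One caveat worth knowing: nproc limits are enforced per user, not per container, so containers whose processes share a UID also share that budget.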
But if you are using your own registry, the images that you're going to run are obviously going to be pulled from your registry. Currently the V1 registry is deprecated, and in a couple of Docker releases the Docker engine will probably disallow access to V1 registries entirely, mainly because of its original weaknesses. There are many blog posts on these topics that you can find yourself. I think we're running a little short on time, so I won't delve into what you can read here. But the V2 API and its implementation in Go were worked on earlier this year in the community. Docker 1.6 was the first engine release which officially supported the V2 API, the most important piece being that now all content is addressable with a strong cryptographic hash, rather than the random layer IDs in V1. So now you can have safe distribution, and signing and verification are now available. I don't know how many people have heard of Notary; Docker Content Trust is kind of the branding Docker has put around that, but the Notary tool now allows for full signing and verification, and there are some great demos, if you look back at the DockerCon July content, on what that can prevent as far as malicious attacks on your registry data. And again, there are also performance improvements, because these digests and the manifests that describe them allow for an easier path to pulling all the content for an image. So that's a quick overview of the engine configuration and registry. Then Salman's going to go on to potential attacks. Right, so when you're running containers, as I mentioned, one common problem seen across different types of containers is fork bombs. Containers, misconfigured or malicious, can just try to create a lot of processes, so there has to be appropriate protection against that. And as I mentioned, with Linux kernel 4.3, support for limiting the number of processes that can be created using cgroups will be coming.
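As a quick aside to make the content-addressability point concrete: a V2 layer or manifest is named by the SHA-256 of its bytes, so any tampering changes the name and the pull fails to verify. The hashing step itself is just this (the image reference in the comment is hypothetical):

```shell
# Content addressing: the name of the blob is the hash of the blob.
printf 'layer bytes go here' | sha256sum

# A known-answer example:
printf 'hello' | sha256sum
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -

# This is what enables pull-by-digest references such as:
# docker pull registry.example.com/app@sha256:<digest>
```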
But until then, you have to use your own custom solution to protect against that. There are also resource exhaustion attacks: there are three files that are bind-mounted inside every container, /etc/hosts, /etc/hostname, and /etc/resolv.conf, and there are no quotas defined on these files. Again, there are multiple solutions: one can make the whole file system read-only, or have some watchdog daemons to protect against that. And lastly, and this is perhaps very important: all of the good things we have done cannot protect a container from user misconfiguration. For example, if a user deploys a package like Nginx or MySQL exposed on the internet and does not use strong passwords, or does not follow the appropriate guidelines for configuring and deploying the application securely, those applications running inside containers can get hacked over the internet. This is similar to what can also happen to applications deployed inside virtual machines. So, to put it all together, starting from the top: it's important to restrict the Docker API calls that are out there and differentiate between operations that can be used to create privileged containers versus non-privileged containers. It's best, as Phil was describing, to use the Docker V2 registry, as it has support for a cryptographic hash of image layers. Docker uses kernel namespaces for isolating containers from other containers; you have to make sure that all containers actually do end up using kernel namespaces and cannot arbitrarily drop the use of them. Cgroups are used for resource isolation, and they will be improved with the changes in the Linux kernel as well as in the new version of cgroups. There's already a limited set of capabilities that Docker uses, and one can further restrict the capabilities available to Docker containers depending on the deployment model for a container offering. One aspect of containers is that containers do share the kernel with the host.
Unlike applications running inside VMs, which have their own kernel and then run on a host which has its own kernel, containers share the kernel with the host. But the important thing to keep in mind here is that because of the restrictions, because we limit the capabilities available to containers, because we can define Linux security module profiles for restricting access to files or capabilities, and because we can further restrict which system calls can be invoked, one can greatly reduce the attack surface for containers. And obviously, with user namespaces, with containers running as non-root on the host, that provides additional protection. So I mentioned user namespaces. One has to make sure that the Docker engine is appropriately configured, some things we talked about earlier, that appropriate Linux security modules are defined, and that one follows the best practices for securing the host. One can also consider using hardware-assisted verification and isolation; there has been some great work happening in the community on using TPM for verifying the host, verifying the images on the host, and verifying the containers, and there are several talks from our colleagues at Intel about that. But the important point is that whatever you do, make sure that you define appropriate security tests and you run them within your cloud environment before putting stuff into production, so that all those limits and all those configurations we defined are indeed being followed. So here we have put together a series of links on these topics. Hopefully you found the talk useful, and I guess at this point we will open up for questions. Yeah, I think we have two minutes. [Audience question.] I guess the question is about the block IO limits and how the block IO limits are currently configured in the cgroups for Docker. First of all, only the limits for direct IO are enforced; the buffered IO limits are not enforced.
And for direct IO, some of the options that are available don't quite work well, and they depend on the Linux kernel version. So there are some issues there. Yeah, that's why there's some cgroup improvement still coming that will hopefully address some of those weaknesses that we mentioned. Yes. [Audience question:] There are a lot of OpenStack pieces here. Some of this can be managed by Kubernetes or other management solutions, and some of the security could be contributed to Keystone or elsewhere. Is there any integration work, or even a tool, that can do that? Yeah, I assume some of that will have to come through Magnum, for example, where there's some integration around how a pod is deployed and what's going to configure those settings. I think we have folks involved; I'm not directly involved in Magnum, but we have some from our broader IBM team, and some of the lessons we're learning hopefully get implemented there as well, and that information is shared. That's not a perfect answer, but yes, I think there has to be that sharing of how we configure these, so that everyone's not doing things a different way and missing some key issues that may leak through if someone doesn't understand the best practice. I think we're over our limit. I've got this headset on that makes me feel like I should sell something, but this is free. If you do want a postcard: this is not related to security or Docker, but open source and open governance have been kind of a keystone of IBM's involvement in OpenStack, Cloud Foundry, and Docker, and this is a book that I and a colleague wrote. It's in e-book form. If you want the download link, I've got postcards; if you don't, that's fine too. Yeah, two for one for the next five minutes. I will tweet the SlideShare link of the talk; estesp is my Twitter handle, and I'll throw in some hashtags to make it findable, maybe. Thank you.