Hey everyone. Welcome to the Intro to Docker talk. My name is Adrian. Last night I was socializing out on Rainey Street, and when the night was over I got out my phone and called an Uber. The driver rolls up, I jump in, and he says, "So what do you do?" I'm like, "Oh, I work on this software thing." He's like, "Oh, OpenStack?" "Oh, you know it?" He says, "Yeah, I'm a software developer. So what do you do in OpenStack?" Well, I work on this project called Magnum. It's the OpenStack container service, a way of making containers available to your developers. He's like, "Oh, that's fascinating. So is it Docker?" I say, "Yeah, it's Docker." And he says, "Can you explain to me the difference between a Docker container and a virtual machine? Because I really don't get it." And I'm riding in this Uber thinking to myself, you really should come to the conference tomorrow, because we could probably answer that question. So I had maybe 10 or 15 minutes talking to the Uber driver, and I think I got the point across. Hopefully, if I'm as successful today as I was last night, you'll all like this talk. So, I work at Rackspace. Back in 2014 I assembled the OpenStack containers team, the group that started OpenStack Magnum. So where did I get my experience with containers? How many of you are at your first summit? Wow, okay. Not many of you know that Rackspace is in the container business. We have a service called Carina, a Docker-on-metal service that we launched in October in Tokyo. If you want to try it, it's at getcarina.com, and it's free to use. So everything I show you today, you can use as a hosted service; you don't have to set up your own Docker environment if you don't want to. So let's get into it. The word Docker can refer to the open source project. It can refer to Docker, Inc., the sponsoring company that put that project together.
The Docker engine refers to the software itself, which makes it possible to use containers easily on Linux systems, and soon on Windows platforms; if there are any Microsoft fans, I'll talk about that a little more in a bit. And then there's Docker Hub, a service hosted by Docker, Inc. that provides a centralized resource for storing container images, plus a few other features as well. We'll see this a little more later. So today, when I say Docker, I mean the Docker engine, the open source software. What is a container? If you ask that question today, chances are you'll get a different answer from everyone you ask. So I'd like you to align on this idea of what a container is. It is a combination of features in the operating system and a container image. Those features and that image combined can produce a running instance of an application, and that running instance of the application is what we call the container. So it's Linux cgroups, which I'll explain, plus Linux kernel namespaces, which I'll explain, plus the Docker image; all combined, that equals the Docker container. First, let's talk about cgroups. This is a kernel feature that's been around for seven or eight years, originally contributed by Google to the Linux kernel. It allows you to create a construct holding a group of running processes that are all limited in how much of the system's capacity they can consume: how much CPU they can use, how much I/O they can do to both disk devices and network devices, how much memory they can consume. And it allows these things to be arranged in a nested hierarchy. So Linux cgroups are how a container avoids being a noisy neighbor to other containers on the same host; they are the resource consumption control. Now let's contrast that with another feature that all containers, not just Docker containers, have in common.
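As an aside, those cgroup limits surface directly as resource flags on docker run. A sketch, hedging that the exact flag names vary a bit across Docker versions:

```console
# Run a shell that can use at most 256 MB of RAM, a reduced
# share of CPU time, and a reduced share of block I/O bandwidth.
$ docker run -it --memory=256m --cpu-shares=512 --blkio-weight=300 centos /bin/bash
```

Under the hood, each of those flags just sets a value in the cgroup that Docker creates for the container.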
They all use the cgroups feature in the kernel, and they all use the namespaces feature in the kernel. Namespaces are about limiting your view of what's available on the system. The most basic of these has been around for probably as long as I've been alive: chroot, which corresponds to the mount namespace. You call the clone system call with CLONE_NEWNS, and what you get is a new view of your file system. Say your file system starts up here; when you chroot, you get another view of the file system rooted somewhere lower, and that becomes your new root. Conceptually, that is how all namespaces work. A namespace is just a reduced view of some resource in the kernel, and it applies to all sorts of things. So there's the mount namespace, which you'd access with chroot. There's the UTS namespace, which controls things like the answer you get when you ask for the host name of the box. There are interprocess communication facilities, like semaphores and shared memory segments. There are process ID lists: when you're in a container and you ask the kernel for your process list, you're probably going to get a very, very short list. You'll get whatever the first process you started was, maybe a shell, and the second process you ran might be ps, and those are the only two things that show up in your process list. You can also namespace networks. When you log into a regular Linux system and run a listing like ifconfig or ip addr, you'll see a whole list of different network interfaces. When you're in a container, that might be reduced to one or two, versus every single one on the entire box. There's also a feature called user namespaces, where the user IDs of the tasks running inside the container, instead of mapping to the same IDs outside the container, can be mapped to alternate IDs. So user ID zero inside the container can map to user ID 1,000 on the host.
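You can experiment with namespaces outside of Docker using the unshare tool from util-linux. A sketch, assuming root and a reasonably recent util-linux:

```console
# Create new PID and mount namespaces and run ps inside them.
# --fork forks before exec, so the child becomes PID 1 in the
# new namespace; --mount-proc remounts /proc so that ps sees
# only the processes in that namespace.
$ sudo unshare --pid --fork --mount-proc ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 12:00 pts/0    00:00:00 ps -ef
```

One command, and the process list shrinks to a single entry, exactly the effect described above.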
So a process that is root inside the container would have only ordinary privileges outside that context. And again, just like cgroups, namespaces can be nested, so you can have these in a hierarchy. Now, the third part of what a Docker container is. When I first heard of Docker, my colleagues were excited about it: this is so awesome, Docker is the best thing ever. And I'm like, what is this Docker thing? I go look at it, and I'm like, oh, it's kind of like chroot with namespaces and cgroups. That's been in the kernel forever; why should I get excited about this? My attitude was, what's the big deal? I use all this stuff anyway. But the big deal is this: it's the Docker image. This is the special thing that Docker contributed to containers that makes it so compelling. First of all, when I say container image, don't think virtual machine image. Don't think virtual hard drive. It is not represented as a block device with a file system on it. It's basically a tar file with some additional metadata stapled onto it. And these things can be, again, related in a hierarchy. So you have this concept of a base image, which would be your Ubuntu environment or your CentOS environment or your Debian environment. And then you create a new image based on that, and I'll show you exactly how this is done, that is just the delta, just the changes. So if I start with that base operating system environment, the base image, and I create a new image of my own that just runs one command and changes, say, one file on the system, and I produce an image based on that change, what's going to be in that Docker image is that one file, not the whole system. So now I've got the ability to reproduce this entire system while only moving that one file around.
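You can see this layering with docker history. An invented, abbreviated example, where the image name and all sizes are made up:

```console
$ docker history myimage
IMAGE          CREATED BY                             SIZE
f3a1b2c4d5e6   /bin/sh -c #(nop) ADD file:9a7f...     120 B      <- just the one changed file
a9b8c7d6e5f4   /bin/sh -c #(nop) CMD ["/bin/bash"]    0 B
d0e1f2a3b4c5   /bin/sh -c #(nop) ADD file:6fe8...     196.7 MB   <- the base OS layer
```

Shipping the image to another host that already has the base layer means moving only that tiny top layer.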
So it's super small, super fast, and my container can start really, really quickly, because it's not moving the entire ocean around, just that little bit. Now, these hierarchies can be arbitrarily deep. I'd argue that if you're going more than maybe three deep, you're probably doing it wrong, but you can make them very deep. There is a limit; it used to be a limit of 42 layers, and I'm sure it's higher than that now. Just remember that the benefit of container images really comes from a smart arrangement of what's in common among all of your different applications. Say you have a whole bunch of Node.js apps. You might have your Node.js stack set up as a Docker image, and then your different apps arranged in child images under that. So when you launch your apps, you're not launching the entire Node.js stack; you're just launching the bits that are part of your app. That's where the speed comes from, and that's why Docker containers start to be really exciting. Now, Docker images are stored in a network resource called a Docker registry. How many of you use Git every day in your work? Okay, 90% of the room. If you already understand the semantics of Git, then you already understand how the Docker registry works. It's got the concept of pull, which works like clone: you get your own local copy of something. It's got the concept of push, which, after you've made a local commit to an image, saves it back up to the registry. And again, the Docker registry has this metadata concept of a parent. So there are the base images, the ones that don't have any parents; those are the ones that are in the registry to start with. And then there are the ones you create, which are descendants of those. So you can either start from a base image called scratch, which has exactly nothing inside of it, and build it up entirely on your own.
Or you can start with one of the community-provided base images, like Ubuntu, or Debian, or CentOS, or BusyBox, or Alpine; you pick, and you can customize those environments. So, in review, a container is a combination of three things: the cgroups, plus the namespaces, plus the Docker image, which all together give you a unit to run software much like you would in a virtual machine, except that all containers on the same host share the same kernel. That's different from a virtual machine, where every single guest has its own kernel. Now, people think, okay, if every container running on the same host is sharing the same kernel, that must mean it has to be the same Linux distro that's running on the host. So if I have a host running CoreOS, and I run a container on it, does the container have to be CoreOS? No. It can be anything I want. I can have a CentOS guest next to an Ubuntu guest running on a CoreOS host, all sharing the same kernel, but with these different environments. Why does that work? It works because all applications use the syscall interface to interact with the kernel, and the syscall interface is a stable API that is compatible across the different Linux userlands. So if I have CentOS, with its userland and its libraries, when the software running in that container interacts with the kernel, the calls go through the same syscall interface regardless of whether it's Ubuntu or Red Hat or anything else. So let's talk a little bit about the life cycle. Well, we can talk about where babies come from first. Any parents out there? Kids ask you where babies come from? I wish mine wouldn't ask me where babies come from. I wish they would ask me where containers come from, because if they did, I could tell them that containers come from Dockerfiles. So a Dockerfile should not be confused with a Docker image.
The Docker image is where your bits go; it's the binary representation of your app. The Dockerfile is a set of instructions for creating a Docker image. Just as, if you were building a C binary with make, you'd create a makefile describing how it's compiled and linked, and the output of running make would be an a.out binary or whatever you name it, think of the Dockerfile as exactly the same thing as a makefile, except that instead of C source code going in and a binary coming out, it's a set of instructions that run, and a container image is what comes out the end. All Dockerfiles extend from a base image, either scratch or one of the environments I mentioned before. When the Dockerfile is built, at the end you get a container image. And this is an imperative design, not a declarative one, so you provide the exact instructions. They look like this. This is a Dockerfile that creates a web server container. It says: I'm going to start with the CentOS 6 environment, and I'm going to label it with some metadata saying I'm the owner of it. The MAINTAINER instruction doesn't actually do anything; it's just a label, so that when my container is running somewhere, if somebody introspects it, it's got my name on it. It's like a tag, and it belongs both to the image and to the containers created from that image. Then there's a RUN instruction that says: during the build, execute this command. So I install Apache. An EXPOSE, which says that unless another port number is specified, port 80 of what's running inside the container is going to be mapped to some port on the outside. And ADD says: take a file from the current directory called start.sh and install it in the container image at /start.sh. So you take your Dockerfile and put it in your source repository at the point where you want to run the Docker build. You check out your Git code. You run docker build.
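Reconstructed from that description, the Dockerfile on the slide would look roughly like this; the exact label value and package name are my guesses:

```dockerfile
# Hypothetical reconstruction of the web server Dockerfile described above
FROM centos:6

# Just a metadata label; it doesn't change the build
MAINTAINER Adrian <adrian@example.com>

# Runs once, at build time: install the Apache httpd package
RUN yum -y install httpd

# Port 80 inside the container gets mapped to some port outside
EXPOSE 80

# Copy start.sh from the build context into the image at /start.sh
ADD start.sh /start.sh

# Runs every time a container is started from this image
CMD ["/start.sh"]
```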
And it runs all of this during the build process. The CMD instruction says what's going to execute when the container actually runs. So it's different from RUN: RUN only happens once, during the build process, and CMD is what happens when the container starts. Two different things. CMD can be overridden; you can start a container and provide an alternate command, and if you don't specify one, it runs whatever is in the container image metadata, which is specified here. Now, I mentioned before that these Docker images can be related to each other; each one can have a parent. So let's say I go ahead and run this build, save the resulting Docker image, and tag it with the name adrian-apache-server. I could then come along later and make another image based on that one. So I say FROM adrian-apache-server: instead of starting from bare CentOS, I'm starting from the image I just built. I'm again going to label it with a maintainer, install MySQL, expose port 3306, add another start.sh that has the commands for starting up MySQL, and specify the CMD. So now I've got a grandchild image. So now we come to the life cycle. Kind of like people, containers have a life cycle. There's conception, which is building from a Dockerfile; that's where the image comes from. There's birth, which is a combination of the create and start commands. So when you use docker run, which you'll see in the demo in a minute, it's actually creating the cgroups and the namespaces, expanding the Docker image in place over the file system, and then starting that process up inside that new context. There's a concept of reproduction: just like in Git, you can do a commit on a container, which saves the current state of its file system into a new container image, which you can then do another run on to produce an exact duplicate of something you already have.
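Going back a step, the child Dockerfile just described would look something like this; the image name, package name, and script contents are assumptions on my part:

```dockerfile
# Hypothetical child image built on top of the Apache image above
FROM adrian-apache-server
MAINTAINER Adrian <adrian@example.com>

# Install the MySQL server package at build time
RUN yum -y install mysql-server

# The standard MySQL port
EXPOSE 3306

# Overwrite /start.sh with a script that also starts MySQL
ADD start.sh /start.sh

CMD ["/start.sh"]
```

Because the parent layers already exist, building and shipping this image only moves the MySQL delta.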
There's a concept of sleeping. You can use docker stop, which stops the processes in a container, or docker kill, which eliminates the processes running in a container. It might be nice to keep that context around while not actually running anything; if you do that, you're not consuming any memory, but you can later start it back up. That would be waking: you can run docker start on a container that was previously running, and it starts just that quickly, because it's not going to move any data around; it's just ready to go. There's death: you can do a docker rm to remove the namespaces and cgroups from the kernel. And there's extinction, which takes the container image away so that we're never going to create another container like this one; for that you'd use docker rmi, remove the image. So the Docker command line client gives you the ability to run a bunch of different commands. If you run it with no arguments, it gives you a help output that looks something like this, except instead of being in two columns, it's in one. Just to highlight a few of the interesting ones: build, which I talked about. There's commit, which saves an image. Exec, this one's kind of interesting. When you run virtual machines, your interface to a virtual machine is usually SSH: you start your virtual machine, you SSH into it, now you've got a shell and can take any action in there that you want. With containers, you don't have to run SSH in them, because you can ask Docker to start any process in any container you want. If you already have access to the host, there's no point in SSHing into the container from the host; you can just say, start a shell in the context of that container for me. I'll demonstrate this as well. You can get a list of the images on the host. You can kill processes in a running container.
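Mapped onto commands, the whole life cycle sketched above looks roughly like this; the image and container names are invented:

```console
$ docker build -t myimage .         # conception: Dockerfile -> image
$ docker run -d --name app myimage  # birth: create + start
$ docker commit app myimage:v2      # reproduction: container state -> new image
$ docker stop app                   # sleep: processes stopped, context kept
$ docker start app                  # wake: starts again almost instantly
$ docker rm -f app                  # death: namespaces and cgroups removed
$ docker rmi myimage:v2             # extinction: the image itself removed
```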
You can list them, remove them, run them. The bold ones are the ones you're going to use all the time. There are also commands for things like importing the contents of a tar file and making that into a container image, exporting things, all sorts of stuff. So let's look at it. The first thing I always do when I get onto a box running Docker is docker ps. That gives you the list of containers that are currently running; in this case, there are none. So let's run one. I said docker run; run is going to create the contexts in the kernel for you, and it's also going to start the container. I need to give it an image name to run; it can be a local image. I like CentOS, so I'm going to pick that one. Oops. And then you give it a command to run. Now, the -it flag I gave says run this interactively, with a TTY. Sometimes when you run a process you don't want it to have a TTY, because you don't want it to act like an interactive process; that's why it's a flag. So you notice here that the host name changed from docker to this thing. That means I'm in a UTS namespace now. And you notice the kernel here; anybody recognize that as a CentOS kernel? Now if I go back out, I'm going to exit the shell, and you'll notice the kernel is the same both inside and outside the container. No containers running now, because I exited the process. But if I ask with -a, I see all of the containers that have ever been run and haven't been removed yet. So I started that container and exited the process, but I haven't deleted it yet, which means I can start it back up if I want. Now, do I have to type the whole container ID? No, I only need to give enough of it to be recognized as unique, so I don't have to be copying and pasting the whole thing. Attach just gives my terminal control back, and now you see I've got my container just like I left it.
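Reconstructed, the demo so far looks roughly like this; the container IDs and prompts are invented:

```console
$ docker ps                        # nothing running yet
$ docker run -it centos /bin/bash  # new container, interactive, with a TTY
[root@3f2a9c1b7d42 /]# uname -r    # same kernel as the host
[root@3f2a9c1b7d42 /]# exit
$ docker ps                        # empty again: the process exited
$ docker ps -a                     # but the exited container still exists
$ docker start 3f2a                # a unique ID prefix is enough
$ docker attach 3f2a               # back in the shell, just as I left it
```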
As I also mentioned, when you look at the process list inside a container, it's limited. If I exit the container again, you see the whole process list on the host; everybody's familiar with this, right? And then you get inside the container, and all of a sudden it's a very short list. So let's say I want to get rid of this one: docker ps -a, docker rm, now it's gone. docker ps. So you get the idea; that's how you start and stop containers. Now, I also said I'd show you the difference between a container running the same operating environment as the host and a container running a different one. So if I say docker run, again interactively with a TTY, let's this time run Ubuntu. Let's wait for this to catch up. So this gives you an idea of how Docker works. If you have questions, please come to the microphones, so that the live stream and the recording will have the audio of you asking the question and people know what you're asking about. And if this wakes back up, I can continue to do more demoing, but it seems a bit slow. Yes, sir? I'll take you first. "I know there's some discussion about whether or not you should run some sort of init-style process manager inside containers, versus just the command you want. I was wondering if you have an opinion on that." Yeah, great question. So there's this concept of application containers, where you run just a component of your application, maybe a single process or a couple of processes, and then you break your application up and run every different part in a different container. That's one style of use. And there's another concept called a system container, where you do run something similar to an init, or it can be as simple as a shell script that just starts a bunch of different things. That's another style of running.
People who are new to containers tend to gravitate toward the system container style, and those who have used containers for a while and figured out how things work tend to gravitate toward the application container style. I like the application container style because it allows me to use a microservices application design, which makes it easier to upgrade and scale a large distributed application. But if what I'm trying to do is just run something that already exists, and I don't want to refactor it, I might actually set it up as a system container where I use a shell script, like the start.sh I showed you in my example. That's just a shell script that runs a couple of commands: it starts things up and then it just goes to sleep. So that's what I would do for a legacy app. "Hi. Docker is a nice idea for limiting resources between the containers or VMs we are running inside. But how do we address the security issues there? Because you use the same code." Security between two containers running on the same host, right? Okay. So virtual machines, like I was explaining to my Uber driver, have this hardware virtualization interface between them. The kernels running in a virtual machine believe they're running on a machine, but they're not; they're running on software that's pretending to be a machine. And that interface is a very simple, narrow interface; the hardware interface is actually not very complicated. That's why it's possible for hypervisors to provide a high degree of confidence that neighboring workloads will remain isolated from each other: it's relatively easy to secure an interface that is simple. Now, containers don't have a hardware interface between them. They're sharing the same kernel, and so there's a risk, just like there is with neighboring virtual machines.
If you find a bug in the hypervisor and can exploit it, it is possible to jump across the context into a neighboring virtual machine. It's hard to do, but it's possible. That same risk applies to containers, with one key difference: containers running on the same host have only the Linux syscall interface between them, which by comparison is a much more complicated interface. It's 480-something different system calls, a level of complexity arguably greater than what you'll find in the hardware interfaces. So you need a much more sophisticated access control policy to isolate containers from each other. The good news is there are a bunch of features in the Linux kernel designed to do exactly this. The most basic are the ones activated by things like AppArmor and SELinux, which let you produce a mandatory access control policy that says your applications are not allowed to interact with any kernel resources unless those interactions are in the policy. If you're using that judiciously, that's your first line of defense. And there are also, beyond the scope of this talk, something like 19 other Linux kernel features you can employ for security isolation. The challenge is applying those features in a way that doesn't limit your application so much that it breaks. How do you make the system generally useful and still secure the way a hypervisor would be? I'd argue that if you're really good at matching your workloads to these features, you can get security that approaches what you enjoy in a hypervisor environment with neighboring containers. But from a purity perspective, if somebody asks you which is more secure, neighboring virtual machines or neighboring containers, neighboring virtual machines are by nature more secure, because of this attack surface issue. Now, that doesn't mean you can't run containers in one VM over here and containers in another VM over there. You can do that.
But containers on the same kernel together are not as isolated. "So does that mean that if I'm root, I have access to all the resources? Say I have three or four containers; if I'm root, can I access any information from any of the containers?" Not necessarily. Being root in a container is actually an illusion, because of what happens when you start the container. The thing that allows you to be root on a Linux system is a couple of things. One, which most people understand, is that your user ID is zero. Another is that you have what are called Linux capabilities, and there's an important one called CAP_SYS_ADMIN that allows you to do things like mounting file systems, the true root stuff. Now, when you start a Docker container, unless you run it with the privileged flag, it drops those capabilities. So the root that you are inside your container is actually not the same as root outside the container. And, like I said, you can have SELinux policies tied to the container as well, and other features you can turn on, that make the appearance of root within the container actually a lower level of access than you would enjoy in the root namespace outside. "Thank you." You're welcome. "Two questions. Which file system does the Docker container have access to, and does each container have access to a different file system? And then, also, you talked about inheritance between, I guess you call them Docker images." Hold on, let me answer the first question first; I can only remember so much at a time. When you start the container, it's going to produce a new root, like running chroot. And it creates that new root, by default, under /var/lib/docker. So whatever file system /var/lib/docker is placed on is the type of file system you're going to end up with.
The container just gets a limited view of that. Now, you might be using a feature to create volumes using LVM, or you might be using a variety of different file systems; you can choose which one you want. There's also a pluggable storage driver in Docker, so you can choose: do you want to use AUFS, do you want to use device mapper, which is the same thing LVM is built on top of, and there are others as well. Each file system has pros and cons, different features it works with, different quirks. No one solution works for all cases; which one is best really depends on the unique needs of your app. "So you said it's under /var/lib/docker. Does that mean that every Docker container has access to the same file system as the others? Or does each one get a specific file system that it's rooted into?" No, it's a multi-layered approach, and only the top layer is writable. You can think of it like a merged file system, where you get whatever the base image is as a read-only view, and then you've got a writable layer on top of that read-only view. So you can make changes, but those changes are copy-on-write; they only happen within the context of your own container's file system. Neighboring containers will not see those changes unless you've done something to circumvent that. There is a feature in Docker called bind mounting, where you can use -v with a mapping of a host file system path to a container file system path. So if you want multiple containers to share the same file system, it's possible to do that. "And the other question was about the inheritance between the Docker images, I guess it is; I still don't have a good grasp on the terminology. You said they can inherit from each other." Yes. "What happens to the commands?"
"The RUN command, does the one from the parent get executed, and does the CMD from the parent get executed?" Well, the RUN commands run during build time, and once your build is done, you have the container image, and then it's fixed. Whatever the state of the file system was at the time the image was built is what's saved in the container image; it's the output of the RUN command. It's not going to re-execute each time you run the container. The only thing that executes each time you run the container is what's in the CMD line of the Dockerfile. "And is it only the CMD of the actual image you're running that gets executed, not the one of the parent?" Right. Only one. "Okay. And the last question: the RUN command affects the environment, or the files. In your case, you were installing something; you had the yum command. Is that being installed in the container image, or is that being installed on the host?" In the container image. The host is left untouched. Over here? "A quick question. This is where it starts to blur a little bit for me. You were trying to use an Ubuntu container on a CentOS base, right? So what is the advantage? Why would you do that? Is it the CentOS kernel that's going to run? Could you talk about that?" Great, yeah. This is a very confusing issue, and it's something almost all Docker newcomers get wrong. They assume that the environment you're running is tightly coupled to the kernel, but it's not. And that's what I was trying to demo here; my latency is a little too bad to actually show it. But when I ask what kind of kernel this is, when I run uname, it's the same in every container: regardless of whether I started Ubuntu or Debian or CentOS or Alpine, they're all going to look exactly the same. They all have the same syscall interface, but they all have different files installed.
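What that demo would have shown, sketched with invented kernel and release strings:

```console
$ uname -r                          # on the host
4.2.0-35-generic
$ docker run --rm centos uname -r   # inside a CentOS container: same kernel
4.2.0-35-generic
$ docker run --rm ubuntu uname -r   # inside an Ubuntu container: still the same kernel
4.2.0-35-generic
$ docker run --rm centos cat /etc/centos-release   # but the userland files differ
CentOS Linux release 7.2.1511 (Core)
```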
They're going to have a different lib directory; lib64 is going to be different. A base container image is like an entire system, except that the kernel files have been removed. That's what it is. So yeah, it works, and it's surprisingly simple once you understand how it works. Your first reaction is that it shouldn't work, but it does. Yeah, here? "You contrasted the system container versus having multiple components distributed across containers. So can you compare a virtual machine running a single application with a system container that has the init and the whole package? How dissimilar are they?" Okay, so the key difference, notwithstanding the security isolation difference: from a resource perspective, when you run a virtual machine, all of the resources you need to run that machine are allocated at start time and can't be shared with neighboring virtual machines, unless you're doing something exotic. So it's provision-time capacity. Whereas when you start things in a container, both storage consumption and memory consumption are only what you're actually using. Instead of preallocating memory, the container just has a limit on the most it can use. So if I've got a bunch of things that each use, say, 50% of the memory on the box and I run them all in virtual machines, I'm going to have 50% waste in terms of resources. Whereas if I run that same workload in containers, I won't have that waste; the free memory will still be free. Now, arguably, I might over-commit; I might be able to create more containers than I otherwise would be able to create virtual machines, and you have to be good at managing that to make sure you don't end up in a situation where you get memory allocation errors because you don't actually have enough free memory. That's an issue with containers.
But the truth is, containers are much more efficient, because you're only paying for what you use, so to speak. That's especially true from a memory perspective, and it's also true from a storage perspective. Again, unless you're doing something exotic, there are exceptions to this: with certain storage features on your VM system it can be as compelling as this, and if you're using memory ballooning on your hypervisors, or kernel samepage merging (KSM) in the kernel, those could be exceptions too. But in general, the way people usually use virtual machines and the way people usually use containers, containers are going to give you much better utilization of the equipment. And then there's also the utilization of the system itself. With that hardware virtualization layer, there is some tax you're paying in performance to translate software calls into hardware calls and back again. Versus when you run a container, in some use cases you might get 60% better performance simply because you're not doing that virtualization indirection. "Thank you." You're welcome. "So if I have a couple of Docker containers, or three or four of them, running on a machine, what type of inter-process communication do you normally use? Do you just open sockets, or what do you usually use?" It depends on what kind of isolation you want between those running containers. It is possible to share the IPC namespace: you can use --ipc=host, which says these containers are not going to have their own unique IPC namespace; they're all going to be in the same IPC namespace together. So you can create a semaphore, or you can create a shared memory segment, and they're able to share it intentionally. That's one way. A more common way is that they communicate over TCP/IP, using message queues or that sort of thing. That's how most people do it. "Thank you."
You're welcome. All right, I know there are more questions. Very quick, very quick. "First of all, great presentation. Thank you very much. What would be the equivalent of a vMotion if I were to go with Docker containers and I have multiple hosts?" It would be a docker stop on one host and a docker start on another, and it would be manual. "Thank you." All right. Thank you, everyone, for attending.