Hi everyone, my name is Ranjith. I work for Red Hat, where I do product support. My session title is actually a question: what should be PID 1 in a container? I hope that after 15 minutes I leave you not with more questions, but with some answers you can take forward and build on. Please raise your hands if you have used Docker containers, whether in production, in testing, or just started a few. Most of you, good. This is going to be short: I'll use the first few slides to set up my main point, and then we'll have a demo. I'll use the demo to show you the problem you could face and why it is important for you to take control of PID 1, and we'll switch between the demo and the slides fairly often. For this session I'll use Docker containers, since that is the most popular runtime right now.

So how do you control which process becomes PID 1 in a container when you do a docker run or a docker start? In the Dockerfile there are two directives for this: one is ENTRYPOINT, the other is CMD. To keep things simple, I'll use only CMD. You decide which process becomes PID 1 by naming it in the CMD directive, and there are three different forms of this directive. The most common form is to name the application directly in CMD. Here is a ps output from two different containers running on the system. In my first container I'm starting the httpd process, and you can see that PID 1 is httpd.
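As a hypothetical illustration of the CMD forms mentioned above (this exact Dockerfile is not from the demo, and the file names are made up), a Dockerfile can specify the process in exec form, in shell form, or as default arguments to an ENTRYPOINT:

```dockerfile
FROM fedora:25

# Exec form: python3 is executed directly and becomes PID 1.
CMD ["python3", "app.py"]

# Shell form: the command runs under "/bin/sh -c", so the shell
# becomes PID 1 and python3 runs as its child.
# CMD python3 app.py

# Third form: CMD supplies default arguments to ENTRYPOINT.
# ENTRYPOINT ["python3"]
# CMD ["app.py"]
```

Only one CMD takes effect per image; the commented-out variants show the alternative forms.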
The second container is a sample where bash is PID 1 and top is a child of PID 1. Thanks to the PID namespace, processes running in different containers can have the same PID numbers; but looked at from outside the container, from the host's perspective, each process has a unique PID.

Now let's take a look at the demo. For the demo I'm going to use a simple Python application. This application has some known issues; I picked this example precisely to highlight the problem. I'm going to use two containers. The only difference between them is that in container one I call the Python application directly, while in container two I call a shell script, and within the shell script I call the same Python application. All I'm changing is which process becomes PID 1.

On the extreme right side you can see some lines that have been printed out; I'll tell you in a moment why I did that, and in the demo I'll show you a different container with all these lines written out. Can you see the font at the back? Yes? Good. This is my first container, where I call the Python application directly, so I'll just start it. On my second tab, same system, I start the second container, the one that runs the bash script which in turn calls the Python application. Now let's try to access the application with a simple curl command. You can see that I'm able to connect to the container and I get the "hello world" output.
I'll run the same curl command against my second container as well; it just has a different IP, since the application running inside is the same. You see the same output, right? Now let's look at the ps output from inside the containers. The second container looks fine: as I said before, I call the bash script first, and within the bash script I call the Python application.

Now let's go to the first container and look at its ps output. Do you see a problem here? It's the same application; the only difference is that in the first container I call the application directly, while in the second I call it through a bash script. But every time I access the application in the first container and then look at the ps output again, do you see the number of processes increasing? Each time I access the application, it leaves behind a defunct, orphaned process. Meanwhile in the second container, where bash is PID 1, accessing the application doesn't show this problem at all.

So why are these leftover processes a problem? Say you have multiple instances of the same container running on your system, and at some point the load goes high: you would accumulate defunct processes within each container. Now look at the ps output on the whole host. From the host you can see the actual PID consumed by each process running inside the containers. You can see it here.
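What the demo shows can be reproduced outside Docker with a few lines of Python. This is a minimal sketch (Linux-only, and not the talk's actual demo app): the parent forks a child, the child exits, and because the parent never calls wait(), ps would show the child as defunct:

```python
import os
import time

def make_zombie():
    """Fork a child that exits immediately; until the parent calls
    waitpid(), the child stays in the process table as a zombie."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child exits right away
    time.sleep(0.2)          # give the child time to exit
    # Read the process state straight from /proc (Linux-specific).
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state field comes right after the parenthesised command name.
    state = stat.rsplit(")", 1)[1].split()[0]
    return pid, state        # "Z" means zombie, shown as <defunct> by ps
```

A forking server with this bug leaves one such entry behind per request, which is exactly the growth seen in the first container's ps output.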
This is the actual PID number consumed on the host by each process running inside the container. So if you have multiple containers on the host and they all have this sort of problem, you end up consuming all the PID numbers available on the system. By default on a Linux system the PID limit, pid_max, is 32,768, so at any point in time you can only have about 32,768 processes or threads. If you have containers that are leaking defunct processes, and those processes are never reaped, you can end up in a situation where you cannot even log into the system: logging in requires forking a new process, and the entire PID space is used up. The only way out is to reboot the system, or find some other way to reduce the number of processes running. So that is problem number one: process leaking. In other words, take care of your orphan processes.

Now, if you run the same application on a bare-metal system or in a virtual machine, you will not see this problem, because there PID 1 is systemd or init, and they have a special feature to deal with process leaking: they reap the orphan processes.

That was problem number one. What is problem number two? Let's take a look. Is bash the answer to the problem? When bash is PID 1, you don't see any orphan processes, so is bash the answer? The answer is a yes and a no. Yes, because it has the feature to reap orphan processes. Why is it a no? Let's take a look. What I'm going to do is stop this bash container.
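The PID limit mentioned above can be checked directly; as a small sketch (Linux-only), the value lives in /proc:

```python
def read_pid_max():
    """Return the kernel's PID limit. 32768 is the historical default;
    it is tunable via /proc/sys/kernel/pid_max, so modern systems may
    report a much larger value."""
    with open("/proc/sys/kernel/pid_max") as f:
        return int(f.read().strip())
```

Once every PID up to this limit is taken by a live or zombie process, any fork() on the host fails, which is why even logging in stops working.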
What happens is that this docker stop command pauses for 10 seconds. It is not hung, it is just waiting. Now it has returned: it was able to stop the container, but it took 10 seconds. So what happened in the background? docker stop first sends a SIGTERM, asking the container to do a graceful exit. It then waits, by default for 10 seconds, and only then sends a SIGKILL. When the container gets the SIGKILL it exits immediately, without cleaning anything up. If you have something like a database application running inside the container and the signals are not handled properly, you end up with a problem where your data is not consistent, where the data from the cache was never flushed to the filesystem.

So how do you find out what is going wrong here? Let me start the container again and find the process. The process ID is 3960. In the proc filesystem, under each PID, there is a file called status that gives you a lot of information about the process. Here we are interested in SigCgt, the set of signals the process has registered handlers for. SigCgt is a hex value, so we have to convert it into something human-readable, and for that I have a small script. So what is happening here? Bash has registered handlers for only two signals. One is SIGINT, which is what Control-C sends. And the other one shows why bash is able to reap child processes: it also listens for SIGCHLD. Since bash has this reaping feature, you can actually see it via SIGCHLD. So, again: is bash the answer to the problem?
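The hex-to-names conversion done by the script in the demo can be sketched in Python (my own version, not the speaker's script): in the SigCgt bitmask from /proc/<pid>/status, bit n corresponds to signal number n + 1.

```python
import signal

def decode_sigmask(hex_mask):
    """Decode a signal bitmask (e.g. the SigCgt line from
    /proc/<pid>/status) into a list of signal names."""
    mask = int(hex_mask, 16)
    names = []
    for n in range(64):
        if mask & (1 << n):
            try:
                names.append(signal.Signals(n + 1).name)
            except ValueError:
                names.append(f"SIG{n + 1}")   # real-time / unnamed signals
    return names

def caught_signals(pid="self"):
    """Read and decode SigCgt for a process (Linux-only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("SigCgt:"):
                return decode_sigmask(line.split()[1])
    return []
```

For an interactive bash, the decoded list would include SIGINT and SIGCHLD, matching what the talk observes, but typically not SIGTERM.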
We now know that bash is able to reap child processes, but it does not handle signals properly. So do we have a hack? Can we still use bash as the answer to the question, what should be PID 1 in a container? The answer is yes, you could, but then you have to rely on a wrapper script. Let's see how we could make bash handle the signals properly. This is what I said before: you put the actual application in the background and register for signals using trap. If you trap INT and TERM and then wait for a signal, the moment the script receives a SIGTERM it passes the TERM signal on to its child process and then exits. It does a graceful exit. So bash can be an answer to the problem, but do we have a better one? What should be PID 1 in a container?

As I said before, if you run the same application directly on the host, this is all taken care of, because your PID 1 is systemd: it passes signals on properly and it can reap the child processes. So what about having systemd, or a minimal init, inside the container? To solve both of these problems, process reaping as well as signal handling, there are a couple of open-source implementations known as tini and dumb-init. They are so small that they deal with only two things: process reaping and signal handling. The binary comes in at around 128 KB. You can ship it in your container image and invoke it in your CMD directive: either tini or dumb-init first, followed by your actual application.
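The trap-and-forward trick, and the two jobs tini and dumb-init perform, can be sketched together in Python. This is an illustrative toy, not how tini or dumb-init are actually implemented: spawn the real application, forward SIGTERM and SIGINT to it, and reap every child (which, when this process really is PID 1, includes reparented orphans):

```python
import os
import signal

def run_as_init(argv):
    """Toy PID-1 wrapper: exec argv in a child, forward SIGTERM/SIGINT
    to it, reap all children, and return the child's exit code."""
    child = os.fork()
    if child == 0:
        os.execvp(argv[0], argv)       # the real workload

    def forward(signum, frame):
        os.kill(child, signum)         # pass the signal on, don't die

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    exit_code = 0
    while True:
        try:
            pid, status = os.wait()    # reaps the app and any orphans
        except ChildProcessError:
            return exit_code           # no children left: we are done
        if pid == child:
            if os.WIFSIGNALED(status):
                exit_code = 128 + os.WTERMSIG(status)
            else:
                exit_code = os.WEXITSTATUS(status)
```

This is the same shape as the bash trap workaround: the workload runs as a child, signals are relayed instead of ignored, and nothing is left unreaped.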
When you start that container, this is how the process tree looks: dumb-init becomes PID 1, and your Python application becomes its child. These minimal inits handle both things properly: orphan-process reaping and signal handling.

There is also something coming with Docker 1.13 known as the --init flag. It can be set as a daemon option, or you can pass it to the docker run command. Let's go to the demo: this system has Docker 1.13.1, and I'm starting the same container; the only difference is that I'm adding the new --init flag. What happens is that the Docker daemon starts a minimal init for you, which is the /dev/init you see here, and then starts the Python application as its child. This can be a very good answer if you're not the author of the application you want to migrate into a container and you don't know how it behaves. With Docker 1.13 you can pass --init as a daemon flag so that every container that starts gets a minimal init as PID 1, or you can control it per container by using the flag with docker run.

So we have seen the minimal inits and the Docker --init flag. Now what about having systemd inside the container? For some of us systemd may be heavy, but there are some additional features you get when you have systemd inside the container.
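The flag described above looks roughly like this in practice (sketch; the image name is hypothetical):

```console
# Per container (Docker >= 1.13): the daemon injects a minimal init,
# visible as /dev/init, as PID 1, and the app runs as its child.
$ docker run -d --init myapp

# Or enable it daemon-wide, so every container gets a minimal init:
$ dockerd --init
```

The per-container form is handy when only some workloads need it; the daemon option changes the default for everything that starts afterwards.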
With the latest version of Docker, you can have systemd running inside a container without using privileged mode. This is done with the help of OCI hooks. So what are the additional benefits? We have seen that you can deal with signal handling and child reaping using the minimal inits, so why would you want systemd? Because systemd comes with some extra features that I think you should consider. For example, say you are migrating an application that writes its logs to /dev/log, or whose logs are captured by daemons running on the host. If you plan to move that application into a container, you normally end up rewriting it to log to stdout or stderr. There are other ways too; with log4j, for example, you can send the logs to a remote system. It varies. But when you move an application from bare metal into a container, things should be easy for you, right? This is where systemd helps you out.

Let's take a look. I'm starting a container that boots systemd, and in the output you can see that the init process, PID 1, is systemd. In the image I'm telling Docker to run /sbin/init, and on Fedora 25 or CentOS 7, /sbin/init is systemd. The moment you start the container, the Docker daemon, with the help of the OCI hooks, detects that your PID 1 is systemd, and then it does some additional things.
For example, it registers your container with machinectl. It takes about 10 seconds for each container. Here you can see that I have only one container running, and its PID 1 is systemd. Now I can check the logs of that container by running journalctl on the host system itself. Whatever output your application produces can be captured here using journalctl. For example, take a very simple tool like logger, which writes to /dev/log: run it inside the container, and you can capture that message on your host. If I run the journalctl command again, there it is. This is a very good benefit if you are migrating an application that ran perfectly well on a bare-metal system into a container.

I'll open it up for questions. Yes, please, here. [Audience question.] Yes, you could do that, but then you are modifying an application that already works very well on your bare-metal system just to move it into a container. If you want to handle these things yourself, you end up making more changes to your application, whereas when the same application runs on bare metal you don't have to do anything. And I'm talking about the case where you are not the author of the application. Yes, that's exactly what I'm saying.
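The host-side commands in this part of the demo are roughly these (a sketch, not verbatim from the demo; the machine name is whatever the hook registered):

```console
# Containers whose PID 1 is systemd get registered as machines:
$ machinectl list

# Read a registered container's journal from the host:
$ journalctl -M <machine-name>

# Inside the container, anything written to /dev/log lands in that
# journal, for example:
$ logger "hello from inside the container"
```

This is what lets an unmodified syslog-style application keep logging as it did on bare metal, while the logs remain visible from the host.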
If you have an application bug, and you let that buggy application run inside a container without knowing what is happening, you will only discover the problem at the last minute, when you have many containers running and all of them are leaking zombies, and you end up in the situation I showed. So how do we handle this in a better way? Have a minimal init as PID 1, so that you have a reliable way of containing such application bugs.

How I came across this myself: one of my customers was using Tomcat inside a container, and just days before the actual production rollout we noticed this problem. The same application had been running perfectly fine outside the container, without creating any zombies, so we never knew whether it was an application bug or not. This could have been caught at testing time, but unfortunately we found it at the last minute and had to work around it. That is how I arrived at this topic: what should be PID 1 in a container?

Your questions are very valid. Any other questions? We have one minute. [Audience question.] If it is just one container, that's fine. But say you are running hundreds of instances of the same container: at some point it becomes a problem, and then you have to choose the right orchestration tool, and container-native storage. It's not that easy. With one container you can use the host-mount options; with thousands of containers that can run on any node, you have to make sure that host mount exists on all the nodes.
So instead of using host mounts, you start looking at NFS, Gluster, or container-native storage, the popular one being Gluster. It becomes a bit more work, but you can do it; you can use a volume. [Audience question.] Yes, from the host system you send the logs to your central server, your ELK stack, or if you like, Splunk; you can ship the logs from the systemd journal to a Splunk server, to a remote server. When it comes to Docker or container logging, you don't want to keep the logs on one particular host, because you never know whether that host will go down. You want the logs off the host, so you send them to a remote server.

I think we are done. Thank you for your time. Hope it was helpful.