Before I start, a quick show of hands: who has used Docker before? Not many people, OK, that's fine. Most of the time, you will have been running applications on what's called bare metal, which is basically: you install your dependencies and your application, and you run them directly on your OS. The good point about this is that you get the best performance your machine can give you. The only problem is dependency hell. If you want to move the application to, say, another machine, you have to install tons of things, and if you have very complicated dependencies, it gets worse. So people came up with the idea of the virtual machine, where you emulate a hardware layer and run another OS on top of your OS. That's nicer, because you can move your whole virtual machine to another machine, but you suffer in terms of performance. Now we have a new (well, not really new) thing called containers, where your application and its dependencies run sandboxed together, separate from the rest of your OS, and you don't lose any performance; it's as good as your bare-metal performance. How do we do this? Basically by using features of the Linux kernel. There are two container technologies I'll talk about, namely LXC and Docker, the latter being the main point of this talk. Before I get to LXC, let me talk about chroot, which was the original idea for virtualizing your runtime environment. You can chroot a process, and the process will then think its root directory is somewhere else instead of your actual root. Let me show you an example. Before you can even run chroot, you need to prepare a bunch of stuff; say I want to run bash and the ls command inside a chroot jail.
Before I can do that, I need to copy the bash binary and all the libraries it depends on into a jail directory, which I've created at /home/jail. As you can see, it's not very user-friendly. After that, I can finally chroot: I run sudo chroot on my jail directory, and I run bash. If you do a pwd, it thinks it's at the root, and if you do an ls, there are only those few things and nothing else. But it's very easy to tell you're in a chroot, because if you echo your current PID, it's some rather big number. And you can actually escape, jailbreak out of chroot, relatively easily with a small C program. How it works: first, it needs your UID to be 0, the root user. Then it creates a temp directory, opens a handle to the original root (because when you chroot, open file descriptors are not closed), and chroots into the temp directory. Since it kept that handle, it can jump back out to the previous root, and at that point it has already escaped from the jail. If you scroll down the code further, it just goes up and up, all the way to the real root, and then you could even rm -rf everything. And that's the end of it. So let me play this here so you can see it in action. I hope you can read the text. Anyway, this is the jail directory that I've created, and now I chroot. If you do ls -al, it's just the jail contents, and pwd says you're at the root. Now I run the C program that I showed you just now. It's still at the root, but it's now the actual root of my system, no longer the jail the process was in. See, all the kernel stuff, everything's there. So this is the first step into virtualization. Now I'm ready to talk about LXC.
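The jail preparation described above can be sketched roughly like this. The demo used /home/jail; I use /tmp/jail here so the setup itself doesn't need root, and the library list comes from parsing ldd output, so it varies by distro:

```shell
# Sketch of building a chroot jail containing bash and ls.
JAIL=/tmp/jail
mkdir -p "$JAIL/bin"

# Copy the binaries we want available inside the jail
cp /bin/bash /bin/ls "$JAIL/bin/"

# Copy every shared library the binaries depend on, preserving paths,
# by pulling the absolute paths out of ldd's output
for bin in /bin/bash /bin/ls; do
    for lib in $(ldd "$bin" | grep -o '/[^ )]*'); do
        mkdir -p "$JAIL$(dirname "$lib")"
        cp -n "$lib" "$JAIL$lib"
    done
done

# Entering the jail does need root; inside, / is $JAIL:
# sudo chroot "$JAIL" /bin/bash
```

As the talk says, this is exactly why chroot alone is so unfriendly: every dependency has to be copied in by hand.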
So LXC is basically, you can think of it as a virtual machine, but without emulating another hardware layer. You install your dependencies, whatever binaries you need, and you run your applications, and it behaves as though it's another virtual machine, but it doesn't need to emulate a hardware layer. How does it do that? It makes use of these Linux kernel features; today I'll be talking about namespaces and control groups, and the rest include chroot, which I've said something about already. The only problem with LXC is that it's quite difficult to use: the documentation on the LXC website is rather terrible. And because it's developed by Canonical, it works with Ubuntu first; if you're using Red Hat, good luck, you have to figure it out for yourself. Which is why it hasn't been as popular as Docker, which more people have heard of. First I'll talk about namespaces. There's a bunch of links you can read up on. Namespaces are a kernel feature that lets you place your processes in another namespace, where they get their own copy of the things the kernel provides, like UTS and so on. I won't go into too much detail about that because it's quite dry, but I'll demonstrate primarily the PID namespace and the mount namespace. PID is processes, and mount is mount points and file systems. Also, to find out what namespaces your process is in, you can just do an ls -l of /proc/self/ns (you can replace self with any PID), and then you can see all the namespaces that the process belongs to. A namespace is just a number, an ID; it doesn't have a name, so it's a nameless namespace of sorts. So here's a demonstration for the PID namespace. At the start, I just show my own PID, which is some rather big number, and then if I do a tree of /proc/self/ns, you can see all the various namespaces that this process belongs to.
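The inspection step just described works unprivileged for your own processes; each entry under /proc/PID/ns is a symlink whose target names the namespace type and its numeric ID:

```shell
# List every namespace this process belongs to
ls -l /proc/self/ns

# Read one entry directly; the output looks like pid:[4026531836],
# and that number is the nameless namespace's ID
readlink /proc/self/ns/pid
```

Two processes are in the same namespace exactly when these IDs match, which is what the diff in the demo below shows.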
And then there's a small C program that just spawns a new process in a new PID namespace and runs a bash shell in it. You have to do it with sudo, because you need the CAP_SYS_ADMIN capability, which the root user has. As you can see, this is the PID of the parent, and this is the PID of the child from the perspective of the parent. But from the perspective of the child, it's now PID 1, because it has its own set of PIDs now; it's in another namespace. If you do a tree of /proc/self/ns, you can see all this, but you can't really tell what the difference is, so I'm going to diff the tree between the parent and the child. You can see the diff there: they now belong to different PID namespaces, while the rest of the namespaces are the same. But something curious happens. I'm going to run top, which displays the processes running on my machine. How come the child can still see all the processes on the machine, even though it's in another PID namespace? The answer is simple: top reads from the /proc mount point to find all the processes running on your machine, and because we haven't created another mount namespace for this process, it can still see everything. So the next demonstration starts a bash shell in a new PID and a new mount namespace. This is top from the parent; you can see everything running. Then I run another C program to start bash in both new namespaces. As you can see, the PID is now 1, and when I run top, it can only see the bash shell and top itself and nothing else, because /proc has been remounted in the new mount namespace. So that's it for namespaces from me. If I had time, I'd talk about the other namespace types, there are quite a few, but I don't think we have enough time now.
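The two C demos above can also be reproduced with unshare(1) from util-linux; this is a sketch, and like the demos it needs root (CAP_SYS_ADMIN):

```shell
# New PID namespace + new mount namespace, with /proc remounted so
# tools like top only see processes inside the namespace
sudo unshare --pid --fork --mount-proc bash

# Inside the new shell:
#   echo $$   prints 1: bash is PID 1 in its own PID namespace
#   top       sees only bash and top, because --mount-proc gave the
#             new mount namespace its own /proc
```

Without --mount-proc you get exactly the curious behaviour from the first demo: a new PID namespace that still reads the old /proc.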
Yeah, so the next thing I'm going to talk about is control groups, or cgroups. Throughout the history of UNIX, there's been a desire by OS developers to be able to classify processes, both by hierarchy (process one has a child process two, et cetera) and for organizational purposes (this is a shell process, this is a daemon process, et cetera). And as you can tell from this already, a process can belong to more than one hierarchy. So how do we do that easily? The Linux kernel developers came up with the concept of cgroups, which allows you to put processes in multiple hierarchies. A cgroup is also known as a hierarchy; they use these two terms interchangeably, and it's primarily used by the Linux kernel subsystems, which I'll talk about later on. So what's the difference between the process tree and a cgroup? On a Linux OS, when you boot up your machine, there is a process called the init process, which is started by the kernel, and most commonly nowadays that's something called systemd, which is rather controversial if you look it up. systemd has PID number 1, and it has a very special responsibility: it is responsible for spawning everything else your Linux machine runs, and it also has the additional responsibility of grouping processes. So why does grouping matter? When a process spawns a child and the parent process dies, the kernel will automatically re-parent the child process to PID 1, and PID 1 then has the responsibility of ensuring that the child process dies normally. In the Linux world, before a process can fully die, its parent must wait for its return code and some other statistics; otherwise it lives on in the kernel as a zombie process. So if you run ps and grep for the Z state, you may see some zombie processes hanging around: processes that haven't been reclaimed by their parent or your init process. So yeah, that's the special responsibility of the init process.
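The zombie mechanics just described are easy to demonstrate. A minimal sketch: spawn a child, then have its parent exec a program that never calls wait(), so the exited child lingers in the Z state until the parent dies and PID 1 reaps it:

```shell
# The exec trick keeps the same PID but replaces the shell (which
# would have reaped the child) with sleep, which never calls wait()
sh -c 'sleep 0.2 & exec sleep 1' &
PARENT=$!
sleep 0.5                        # the child has exited; the parent is still alive

# The dead child now shows up with state Z (zombie / <defunct>)
ps -o pid,stat,comm --ppid "$PARENT"

wait                             # parent exits; PID 1 reaps the zombie
```

This is the same failure mode that comes back later in the talk when a containerized PID 1 doesn't know it must reap re-parented children.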
This will become important later on when I talk about Docker. So: a process can have only one parent, but a process can be in multiple cgroups, or hierarchies, and both are hierarchical in nature. Cgroups work through a virtual file system, which is mounted by systemd at this path, and everything, like adding a process to a cgroup, removing a cgroup, or creating a cgroup, is done by creating a folder under that mount point or editing a file. And because this is a virtual file system, it is lost upon a reboot. On my Ubuntu 16.04 machine, if you run this command, you can see all the cgroups that I have on my machine. Most of them are created by the Linux kernel subsystems, except for this one, which is created by the systemd init process. And if you look at the systemd cgroup, you can see all these files. The red bolded entries are folders, and the rest are files. The folders are basically child cgroups, because a cgroup can have children as well. The tasks file is just a text file with a list of the PIDs that belong to the cgroup, and cgroup.procs is a list of thread group IDs. The difference between a PID and a thread group ID is that in Linux, every thread has a PID, and the thread group ID identifies the process the threads belong to. The rest of the files are specific to the subsystem and generally not that interesting. You can see the kinds of subsystems you have on your machine by running this command; these are all the subsystems on mine. I won't go too much into the details of the subsystems because they are quite dry. And you can see the cgroups and subsystems for a particular process by running this command. So in Linux 4.4 there's this whole bunch of subsystems; each subsystem creates a cgroup, and processes are placed inside that cgroup according to its hierarchy.
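The file-system interface just described can be poked at directly. The reads below work unprivileged; the write steps need root, and the v1 paths (memory.limit_in_bytes and friends) match Ubuntu 16.04-era systems:

```shell
# The mounted v1 hierarchies, one per subsystem
mount -t cgroup

# Which cgroup the current process is in, per subsystem
cat /proc/self/cgroup

# Creating a cgroup is just mkdir; the kernel populates tasks,
# cgroup.procs, and the subsystem's control files automatically:
#   sudo mkdir /sys/fs/cgroup/memory/demo
# Impose a control (cap memory at 64 MB), then join the cgroup by
# writing our PID into its tasks file:
#   echo $((64*1024*1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
#   echo $$ | sudo tee /sys/fs/cgroup/memory/demo/tasks
```

Everything really is folders and text files: no dedicated syscall interface is needed for day-to-day use.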
So some subsystems don't really do much; they just collect data, like cpuacct, the CPU accounting subsystem, which just records how much CPU time you have spent. Some subsystems impose controls, like how much CPU time you can use or how much memory you can allocate. And some subsystems do freezing and thawing of processes, like the freezer subsystem. Some subsystems make use of the entire hierarchy of the cgroup, and some just don't bother. So a cgroup is basically a generalized hierarchy of processes that subsystems may or may not want to use. Yeah, and if you noticed, the previous slide said v1. There's a v2 now; I don't think anyone, at least not Ubuntu, is using v2 at the moment. As you saw, v1 has many, many cgroups for many, many subsystems, so there's a wish to unify all of that into one single hierarchy, so you can have only one hierarchy, which is what v2 is supposed to do. But there is not much use of it yet. So these two features of the Linux kernel are basically what powers Docker. They allow it to run processes in their own little sandbox without letting them affect the rest of your OS. The difference between Docker and LXC is that LXC is supposed to run sort of like a virtual machine without the hardware emulation overhead, whereas Docker is supposed to let you run one single application inside a container. Also, by default Docker doesn't do storage persistence unless you pass some extra options. So here's a diagram of the key differences between LXC and Docker. As you can see, LXC is designed for running a VM without the hardware emulation, whereas Docker is meant to run applications. Docker used to use LXC as its containerization driver of sorts, but in recent times the authors of Docker have abstracted this into something called libcontainer, which uses the same Linux kernel facilities to run its containers.
But the hope is that in the future, if someone wants to run Docker on, say, Windows, they can write an execution driver for that. So these are the features of the Linux kernel that the driver uses. Now, how does Docker deal with data? It uses a bunch of layers that are all read-only, and it stacks them up together to give you a unified view of your file system. When you run a container, it creates a thin container layer that is read-write, for you to change your data, and that layer is thrown away when the container is removed. Below the container layer is a read-only image, like this ubuntu:15.04 image, where all the layers are read-only and are addressed by cryptographic content hashes. So the system works with a copy-on-write scheme: if you want to modify a file, you copy a version of that file all the way up to the top layer and you modify it there. If you create a layer based off the Ubuntu base image and you change some files, only the changed files will exist in that top layer, whereas the rest is unchanged; it's basically reusing the same layers. Docker allows you to use a bunch of different storage drivers, which you can choose based off this table, but today I'm going to talk about the AUFS driver. AUFS stands for Advanced Multi-Layered Unification File System; I think it used to stand for Another Union File System, but apparently they thought that wasn't cool enough. AUFS was the first driver that Docker used, and you can see that it basically maps one-to-one onto Docker's image layering feature. At the top of it, you access the AUFS system through something called a union mount point, which shows you a unified view of all these different layers. AUFS calls these layers branches, and each branch basically corresponds to a directory on your actual file system. So what happens when you want to modify a file, say file 4 at this layer?
So basically it just gets copied up, and then I modify it. What if I want to modify it at the container layer? It gets copied up again, and I modify it there. When reading, the AUFS file system just goes down from the top layer to the bottom and finds the first copy of the file it sees, and that will be the version of the file you see. The only problem is: what if I delete a file? AUFS uses something called a whiteout file to say that a file no longer exists. When the container wants to delete a file, AUFS just puts down a whiteout file, and from then on it reports that the file is no longer there. Now, the problem with Docker is that it makes running some applications needlessly complicated. The first one is the zombie reaping issue, because the idea of Docker is that you want one single process inside your container. But as we know, many processes spawn child processes, nginx for example. And if the parent that a child gets re-parented to doesn't know that it needs to reap zombies, the dead child will linger around in the kernel as a zombie forever, because the kernel doesn't do it automatically. So the good folks at Phusion have come up with a base image, baseimage-docker, which helps you deal with this with a hundred-odd-line Python script. And I'm going to demonstrate later just how many containers you need to spin up to run a simple Rails application with an nginx reverse proxy and an SSL certificate. So if it's this complicated, why still use Docker? Well, Docker has been marketed quite heavily, and it is genuinely nicer to use Docker to contain all your dependencies than to install dependencies manually on your machines. It's definitely nicer, it's that much better. It's just that it's still quite complicated and quite painful to use.
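The copy-up and whiteout behaviour described above is easy to see with a union mount. AUFS itself is rarely available in stock kernels now, so this sketch uses overlayfs, which follows the same model; the mount itself needs root:

```shell
# 'lower' plays the read-only image layer, 'upper' the writable
# container layer; 'merged' is the unified view
mkdir -p lower upper work merged
echo "from the image" > lower/file1

# sudo mount -t overlay overlay \
#     -o lowerdir=lower,upperdir=upper,workdir=work merged
# cat merged/file1            # reads fall through to the lower layer
# echo changed > merged/file1 # first write copies the file up into 'upper'
# rm merged/file1             # delete: 'upper' gains a whiteout entry,
# ls -l upper/file1           #   a character device 0,0 masking the lower file
```

The lower directory is never touched, which is exactly why Docker can share one read-only image between many containers.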
So ideally I would like to use LXC, but LXC is even worse than Docker to use, because the documentation is terrible and the command-line tools are not that great either. So we solved one problem but created another. Yeah, I know. You can see how complicated the docker run command is, and it's not even showing everything yet; you see how long the help output is. So yeah, it's not very fun. So I'm going to demo how to run a very quick Ruby on Rails application inside a Docker container, with an nginx reverse proxy and a Let's Encrypt TLS certificate. I make a simple Rails application using a Postgres database, and then I write the Dockerfile, which builds that application into an image. I can go through this later if you ask me in person, but that's the file. As the first step, I need to create a network for my Ruby on Rails application, and I need to build the Rails image. Then I need to start the containers using all these commands. Now, what happens if you want to reproduce this on another computer? You're not going to write a shell script to do all this, are you? It's kind of painful. So here comes Docker Compose. Docker Compose allows you to write everything I typed just now in one single YAML file. It also allows you to declare a dependency order between your containers, so that it will start container one before container two, like starting the DB before the Rails app. And then you can start all the containers with a single docker-compose up. Everything I showed you just now can be written in this YAML file. What's next? OK, this diagram is how I'm going to set up the nginx reverse proxy with the Rails application and a Let's Encrypt certificate. I've shown you the first two containers just now, a Postgres database and a Rails container. These two containers live inside this virtual private network which I've created; let's just call it rails.
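As a sketch of what that single YAML file might look like for the Postgres-plus-Rails pair (the service names, image tag, and port here are my assumptions, not taken from the slides):

```yaml
# Hypothetical docker-compose.yml for the demo's Rails + Postgres pair
version: '2'
services:
  db:
    image: postgres:9.5        # the database container
  web:
    build: .                   # build the Rails image from the Dockerfile
    ports:
      - "3000:3000"            # expose the Rails server port
    depends_on:
      - db                     # start the DB before the Rails app
```

With this in place, a single docker-compose up replaces the network creation, build, and run commands, and starts db before web.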
And then I have a separate setup which lives under the nginx network, another private network on your machine; these are all on your host machine. So I need to start three containers. First is the nginx container, and I need to mount certain directories inside the nginx container onto the host as volumes, so that other containers can access them as well. Then I need something called an nginx generator container. This container runs a script that someone wrote which will, OK, wait, before I do that: the Docker daemon provides a REST API mounted on the Docker socket, which is at this path. It's a UNIX socket that the Docker client, the docker command I was using just now, talks to in order to interact with the Docker daemon on the machine. Since it's just a REST API, you can even write your own Docker client if you want, but the default Docker client talks to this socket. So this nginx-gen container will listen on the Docker socket for new containers being started on your host, to see which ones say "I want to be reverse proxied by nginx". When it sees a new container like that, it writes the appropriate nginx configuration files, and then it signals the nginx container to reload itself with the new configuration. Similarly, we have a Let's Encrypt bot that runs here, which listens for new containers that say "I want a Let's Encrypt certificate". It will generate the certificate, handle the protocol challenges that Let's Encrypt issues it, write the configuration, save the certificate into the shared volume, and signal nginx to reload itself. So that's the entire setup. And of course, this nginx container needs to expose its port, or ports, to the rest of the world. As you can see from this diagram, none of the other containers are accessible from the internet; only the nginx container is, because the rest are all on the host's private networks.
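A sketch of the three-container proxy stack as a compose file follows. The talk doesn't show its exact file; jwilder/docker-gen and jrcs/letsencrypt-nginx-proxy-companion are widely used implementations of exactly this socket-watching pattern, so treat the image names, template path, and volume layout as my assumptions:

```yaml
version: '2'
services:
  nginx:
    image: nginx
    ports: ["80:80", "443:443"]   # the only container exposed to the internet
    volumes:
      - conf:/etc/nginx/conf.d
      - certs:/etc/nginx/certs:ro
      - html:/usr/share/nginx/html
  nginx-gen:
    image: jwilder/docker-gen
    # Watch the Docker socket; rewrite nginx config when a container
    # with VIRTUAL_HOST appears, then SIGHUP nginx to reload it
    command: -notify-sighup nginx -watch /etc/docker-gen/templates/nginx.tmpl /etc/nginx/conf.d/default.conf
    volumes:
      - conf:/etc/nginx/conf.d
      - ./nginx.tmpl:/etc/docker-gen/templates/nginx.tmpl:ro
      - /var/run/docker.sock:/tmp/docker.sock:ro
  letsencrypt:
    image: jrcs/letsencrypt-nginx-proxy-companion
    # Watches for LETSENCRYPT_HOST, answers the ACME challenges, and
    # drops certificates into the shared certs volume
    volumes:
      - conf:/etc/nginx/conf.d
      - certs:/etc/nginx/certs:rw
      - html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro
volumes:
  conf:
  certs:
  html:
```

Compose puts all three services on one project-private network, matching the nginx network in the diagram, and the shared volumes are how the generator and the certificate bot hand files to nginx.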
So yeah, to set up the nginx containers, you need to use this pretty long docker-compose file, and I'm going to show you how it works now. The first demo is just the Rails application running off the Rails server, without any reverse proxy, using docker-compose. This is the compose file, which I showed you just now, and I do a docker-compose up with some extra flags. I'm going to use a text-based browser to show you that it works. But notice that I still have to use port 3000. Yeah, that's it, that's the first one, simple. The second demo I'm going to show is running the Rails application behind the nginx reverse proxy. This is just to show you the nginx setup that I did just now, and that it's the same file I showed you; I'll also show you that it's actually running right now. Then I use a slightly modified compose file. To tell the nginx-gen container that this container wants to be reverse proxied, I simply add this environment variable to the container, and the rest of the stuff stays the same. Oh yeah, I also have to ask it to join the nginx network. Then I just run this, and I open the text-based browser again to show you that it works, this time round without the :3000 at the back. Yeah, it works. OK, the next one, which is what I promised in the title: running the Rails application with Let's Encrypt. This time round, I have to add new stuff to the compose file again, which is these three new environment variables. This one is the host name, this one is the email address you have to give Let's Encrypt, and I set this one to true because I don't want a real certificate, I want a test one; normally, if you run this in production, you will not have that line. Everything else is the same, and then I just do docker-compose up.
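The app side of demos two and three might then look like this. VIRTUAL_HOST is what the generator container watches for, and the LETSENCRYPT_* variables are what the certificate bot watches for; the hostname and email here are placeholders of mine:

```yaml
version: '2'
services:
  db:
    image: postgres:9.5
  web:
    build: .
    environment:
      - VIRTUAL_HOST=rails.example.com        # demo 2: ask nginx-gen to proxy us
      - LETSENCRYPT_HOST=rails.example.com    # demo 3: ask for a certificate
      - LETSENCRYPT_EMAIL=admin@example.com
      - LETSENCRYPT_TEST=true                 # staging CA; drop this in production
    networks:
      - nginx                                 # join the proxy's network
networks:
  nginx:
    external: true
```

Note that nothing else about the app changes: opting into the proxy and the certificate is purely a matter of environment variables and network membership.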
I can explain more if you ask me later for the details. So notice that I visit the site without the https:// in front, and the nginx reverse proxy automatically redirects you to the HTTPS version. Yeah, as you can see, there you go, it's HTTPS now. Let me quit the browser; I want to show you the certificate itself. Sorry, I was copying and pasting a command. Yeah, so this is the certificate. You can see it's a fake CA, because it's a test certificate; these are just the hashes, blah, blah, blah. And then, yeah, you can see who issued it; see the CN, it says Let's Encrypt. That's it, just to show that it's actually from Let's Encrypt. So: can we have simpler containers? There's something called LXD, which is basically LXC with nicer tools. It's built on top of LXC with a nicer REST API and nicer command-line tools to make it easier for you to use. And then someone wrote something called Bocker, which is just Docker in about 100 lines of bash. If you look at the source code, it's basically just setting up new namespaces and new cgroups and executing processes in them. It demonstrates that whatever you can do with Docker, you can do with bash. But of course, Docker does some nicer things on top, like creating networks, creating overlay storage devices, and the layering system. So Docker is a framework of sorts; it's like Rails for building websites, but in this case for running containers. And there are many, many other efforts to do this. So yeah, that's the end of my talk. Before I go, I need to talk about this briefly. So basically we are trialling this; I think it's better for me to show you the site than to talk about it. Hmm, it doesn't work. Can a Mac user please help? Click on it. OK, anyway, it's this page.
So basically, we are testing our new platform, where you can make a quick buck by taking on these tasks, which are just simple programming tasks, to earn some quick cash. So if you want to find out more, please talk to us after this. There are some ideas there; well, not really ideas, it's our own stuff. Yeah, OK, thank you. Does anyone have any questions for our speaker about the talk? If not, then we're done for today. Yeah, just wondering, why don't you put every single application into one container? Because of how Docker is designed: it's meant to run one, hold on, can you rephrase your question? Do you mean, why don't I put everything inside one Docker container? OK, is it possible? Not really, because Docker is designed to run one process. There are many applications, like the Hadoop world of data processing, that expect you to run many processes together to do the one thing. Spark is terrible for this: I tried to put Spark inside a container, and it just ended up with thousands of zombie processes, because the main Java process doesn't know how to reap its children. It's just not ready for it yet. So in this case, I think LXC or LXD would be better. I've not tried it yet, but yeah, we are still trying it. So Docker isn't really good at killing zombies? No, Docker doesn't care about zombies. The process that you run has to reap the zombies itself. Yeah, that's right. Like you mentioned, could you include an init system in your Docker image? You don't really want to run a full init system inside Docker, because it makes the image really heavy. But there's this, in my slides, there's this base image that Phusion has created that will help you, where is it? There's this base image that includes a script that helps you deal with it. It's not an init system, but it will help you reap the children if necessary. Yeah. OK, thank you. Does anyone have any more questions? So you say that we cannot run Spark on Docker.
You mean that Spark has a very complex dependency? It's not a dependency issue; it's the fact that Spark doesn't have a client-server model, at least in the old versions. There's a new version with a client-server model, where you run a client inside the container and your Hadoop cluster is the server, so you can call the Spark server to run whatever jobs it needs to do. Currently, with the stable version, it doesn't work very well, so I had to package the entire Spark binary inside the container, and it makes the container really fat. It's a few gigabytes, not very user-friendly. And then, like I said, a lot of zombie processes result from it. Yeah. So recently there was this thing that happened, I'm not sure if you know, about forking Docker. What's your take on that? I don't know what happened; I haven't heard of that. Yeah. Can you give me some background? So it's speculated that some people wanted to fork Docker and continue it under a new base, because of internal issues. Oh, OK. I've not heard about that, I don't know, I'm sorry. In practice, how do you supervise all these containers? Because if you don't, you can't handle it. I mean, the Docker daemon has an API for that, and I think there are some tools that help you do this. Everything that you run emits a lot of events, so you can supervise it from that. So you just have to do the tooling yourself? Yeah, Docker doesn't provide it by itself, which is why you can pay Docker to do it for you; they have a paid solution. Or you can build it yourself, yeah. Can we look at Kubernetes? You can use Kubernetes, yeah. Kubernetes is another one; I've not had the chance to mess with it yet. So there are two competing options, I guess: Kubernetes and Docker Swarm. Docker Swarm is by Docker, so the API is more familiar, I guess, whereas Kubernetes is different. I've not really tried both yet, so I can't say much.
For networking, they have a solution that's quite handy. Yeah, and there's some legacy behavior they try to preserve, because they don't want to break backward compatibility. So yeah. Any more questions? If not, thanks, everyone. OK, bye. We're taking a break before the next...