Okay, so welcome to the next session, which is going to be given by Jaka from Ljubljana. Jaka, as I have been told, is a guitar player, programmer and long-time Nix contributor working on systems stuff, and that's what he's going to talk about today. So welcome, everybody, Jaka.

Welcome to my talk about running Kubernetes on NixOS. I'm Jaka, and we'll talk about why Nix and Kubernetes are a great combination.

Okay, so something about me. As it says, I'm a full stack engineer: JavaScript, Python, C and various other languages, with experience in provisioning, embedded devices and security. Lately I'm mainly a backend JavaScript developer, but I'm interested in a wide spectrum of IT fields. One day I'm fixing bugs from incompetent front-end developers, and other days I'm doing some low-level hardware programming.

The projects I'm currently working on: the first one is a startup called GateHub, a new fintech platform for multi-currency payments, trading and exchange. It's based on Ripple. We already have our licence, and we are currently just releasing a new product that is mostly for enterprise users. The other project is about data-driven, distributed task automation and aggregation using graph databases, Docker containers and also Nix. I'm building that one in my free time, and I would also like to make a startup out of it.

So when I started working on GateHub, we decided that we wanted to split our infrastructure into many microservices, so we started to think about how we could deploy a scalable infrastructure. Our developers wanted to use Docker for development and deployment, but I still wanted to use NixOS and Nix packages for our deployments. So the questions I had were: how to deploy scalable NixOS systems, how to deploy scalable apps on top of NixOS, how to have reliable distributed storage, and how to do scalable monitoring.

I decided we needed something like a cluster process manager, secure distributed overlay networking, a load balancer, distributed and replicated storage, and a scheduler for different resources like processing power, networking and storage. All of this should be managed by some cluster manager, and there should be a monitoring system so you can actually see what's going on.

So, to do all of this, let's first make an overview of what we have. The first thing is the decision of which container or process manager to use, and here we have several choices.

One of the first is using systemd. It's not exactly a container manager; the main real difference is that container managers actually run processes in containers, but systemd, for example, also has this command called systemd-nspawn that runs processes in containers.

The other one you probably know is Docker. The main advantage of Docker is that it's user friendly and it provides a hub of pre-made application containers. It has a pretty simple format for building containers, and the idea behind Docker is that you run one app inside a container, not a whole OS. Compare this with LXC, for example, a container manager where you usually run a whole OS, a whole distro base image, in the container. Of course you can also do that with Docker and with other container managers, but that's not how it's meant to be done.

Then there is rkt, or Rocket. I don't know much about rkt; it's a new thing developed by CoreOS. They define a declarative specification format for building images.
It's a bit different from Docker in the sense that you can specify more things, because Docker has a pretty simple format for building images, and as I said, it's sometimes too simple.

Okay, then let's go to overlay networking. Why do we need overlay networking? It's to connect all these services together across the different machines we are running them on: we don't want to statically type in all the IPs, we just want to spawn these services anywhere in the cluster, and they should be able to connect to all the other services they need. Here, too, we have a few choices.

We have Open vSwitch; as we will see later, we use Open vSwitch with IPsec networking across our cluster. Then we have some other choices that are more Docker-related. For example, there is Weave, which simply lets you connect multiple machines, and I think they now also use Open vSwitch as a backend. Then there is CoreOS Flannel. Flannel is not as optimized as Open vSwitch, because Open vSwitch has kernel support, so it has really low latency; with Flannel there is a bit more latency because it runs in user space. In that sense it's slower, but it connects to etcd, which is a distributed configuration store, so whenever a new machine joins the network, it automatically connects to this distributed networking. And Docker is now working on libnetwork, but it's not as developed as the other alternatives.

To actually be able to run our services across different machines, we also need some kind of storage that is remotely available, or available on demand on a specific server. Here we have a choice of different distributed file systems, like Ceph, GlusterFS and XtreemFS, and also cloud solutions like Amazon Elastic Block Store. In NixOS there is currently a module for XtreemFS that Matei Siwa is working on, and I don't know how well it works. I would really like to have NixOS modules for Ceph and GlusterFS, which are better supported by cluster managers like Kubernetes and now also Docker.

Then let's move to the cluster managers we can choose from. The first one, which I will talk more about later, is Google's Kubernetes; the others I'll describe now.

The second one is CoreOS Fleet. Fleet is a cluster manager that manages systemd services across a cluster. It was developed, I think, before Docker became so popular, but it's not being developed that much any more. It's really nice because you can spawn systemd services across a cluster, and NixOS could probably use it as well.

The third one is Docker Swarm. Docker usually runs as a single service that manages your containers on a single host, but if you use Docker Swarm, it distributes containers across the many different nodes that are connected to it. It exposes the same API as Docker, but it can talk to many machines.

Then you have Dokku, which is not actually a cluster manager, but I still put it on the list. It's like an open-sourced Heroku written in Bash, and it's really nice if you want to deploy some simple application.

And the last one is Rancher. Rancher is a solution on its own; it's also using Docker, and it provides its own overlay networking. They have a pretty nice UI, but I actually spoke with the guys from Rancher and asked them about something.
Their solution can currently only be started from Docker, and then they provision everything from there. I asked them if they could provide instructions for deploying it separately on the machines, and they don't have that planned right now. But it would be really nice to see Rancher there.

What? Mesos, yeah, maybe I forgot about Mesos here, yes. Mesos is also a cluster manager, one that can work not only with containers but with plain processes across the cluster. It doesn't have built-in support for networking, storage and things like that, but it's a really nice distributed scheduler. What would be nice to see, for example, is Hydra using Mesos for its tasks; that's one nice use case, because all the other cluster managers are more for running applications, while Mesos is more for running what I would call tasks, though it can be used for different things. You have many more options available on this link: if you Google how to scale Docker containers in production, you'll find a long list of these solutions.

Okay, so now let's say something about Docker and Nix. Docker is primarily used for running applications, not whole distros, and of course running Nix inside Docker is easy. A few benefits we get here compared to other Docker images: you can pick the exact version of Nix and exact versions of packages. With other images, like Debian or Ubuntu, you can only select which release of the distro to use; here you can actually pick the commit you want (see the sketch below), and of course you can also add a channel and then install whatever you want. And it's really simple. I actually just pushed a new Nix image for Nix 1.10, so you can pretty easily make a Docker image and run it. You can try this yourself; I will not run it now. Okay, this slide.

So what about running NixOS in Docker? As I said, you can run NixOS inside Docker containers using privileged mode, but you actually don't want to do that, because Docker was not meant for it. That's more for when you want to test or develop on NixOS; you wouldn't do it in production. I was working on a service abstraction layer that would abstract the services and provide a way to take a NixOS config, build a separate container for every service, and run them on a cluster, but currently I have other things to do, so it's not on my priority list. I still think it would be a nice thing to do, to actually separate services from NixOS, because NixOS is currently quite a monolithic system.

So now let's go to Kubernetes. It was open sourced and announced by Google in 2014, and it's influenced by Google's Borg system. It's written in Go; actually, most of the things I showed you before are written in Go, mainly because Go makes it easy to build static binaries and distribute them anywhere, and that's also Google's current approach to deploying stuff. Kubernetes uses Docker as the primary process manager, but it also has support for rkt, so you can make a rkt image and run it under systemd. I think Garbus also wrote an article on how to build a rkt image, so maybe we could try to run that with Kubernetes.
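To make the point about picking an exact commit concrete, here is a minimal sketch of building a small Docker image from a pinned nixpkgs checkout with dockerTools. This is not from the talk; it assumes a reasonably recent nixpkgs, and <commit> is a placeholder for whatever revision you want to pin.

```nix
# pinned-image.nix: a sketch only; <commit> is a placeholder for a real nixpkgs revision.
let
  # Pin nixpkgs to an exact revision instead of "whatever the channel currently has".
  pkgs = import (builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/<commit>.tar.gz") { };
in
pkgs.dockerTools.buildImage {
  name = "hello-nix";
  tag = "latest";
  contents = [ pkgs.hello ];                  # exact package versions come from the pinned commit
  config.Cmd = [ "${pkgs.hello}/bin/hello" ]; # one app per container, not a whole distro
}
```

Building this with nix-build and then loading the result with docker load < result should give you an image you can run like any other Docker image.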
So Kubernetes has a lot of commits and, I would say, a lot of contributors, mainly because it's backed by Google, and it has been in a stable release for quite a while now, since sometime in the summer.

So what does Kubernetes provide? It provides replication, so it manages Docker containers across multiple machines. It provides load balancing: it has a built-in load balancer that balances traffic across all the replicated services. It integrates with distributed storage like EBS, Ceph, GlusterFS and NFS, and they have now added support for a few more volume types, including things like Git repositories and secret storage, so you can mount secrets and so on. It has support for resource quotas, so you can say: I only allow this process to take this much RAM and this much processing power. It has built-in support for logging and monitoring, and it has declarative configuration for pretty much everything.

Kubernetes consists of many components, separated into microservices, and here is a description of them. The first one is the API server, the HTTP API service you connect to in order to control Kubernetes; the changes you make there are applied by the controller manager. Then there is the Kubernetes scheduler for resources like servers: it decides where to place containers, and it now also has support for allocating storage, so if you have a pool of storage it can allocate from it, but currently that only works on Google Cloud. Then there is the proxy service, which is the load-balancing proxy for all the services you expose. And of course there is the kubelet, which actually runs the containers themselves: it manages containers and reports on them.

Okay, so here's the schema. You usually run one master, on which you have the API server, the scheduler and the controller manager. We also have Heapster here; it's the monitoring service for all these things. And then you have many worker nodes, which they call minions, where you have the kube-proxy for load balancing and the kubelet for running services, and also a DNS service: they integrate with SkyDNS, which provides DNS for all the applications you run on Kubernetes.

Okay, now about some of the terms they use. A namespace is a separate group of applications and their services, so you can have, for example, a development and a production namespace. It doesn't provide physical separation between the containers themselves, only a logical one, so one container can still communicate with another container, but you can write your own firewall plugin that would prevent this. Then there is a minion, which is a worker node. Then there is a pod: in the Docker world you usually run one container and then attach multiple containers together, while here you run a set of containers on a single host, and they share the same networking namespace, which is pretty nice, and they can also share the file system between them. Then you have the replication controller, which is a controller for a group of pods: you can say, okay, I want to deploy these pods and I want this many replicas, and you define that in the replication controller; we will see an example later. And a service is a load balancer for a group of pods running on Kubernetes.

Okay, so maybe let's make a demo. What we will do here is actually deploy... It doesn't show everything. Yeah, okay, it's okay. So we will deploy... Here we have our replication controller.
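The manifest on that slide is not reproduced in the transcript. As a rough guess at its shape, and to keep everything in Nix, here is a sketch that describes a comparable replication controller as a Nix attribute set and renders it to JSON (kubectl also accepts JSON manifests). The names, labels and image are assumptions based on the demo, not the speaker's actual file.

```nix
# nginx-rc.nix: a hypothetical reconstruction, not the file shown in the talk.
{ pkgs ? import <nixpkgs> { } }:
let
  controller = {
    apiVersion = "v1";
    kind = "ReplicationController";
    metadata.name = "nginx-controller";
    spec = {
      replicas = 2;                        # two nginx replicas, as in the demo
      selector.app = "nginx";
      template = {
        metadata.labels.app = "nginx";     # the label the service later selects on
        spec.containers = [
          { name = "nginx"; image = "nginx"; ports = [ { containerPort = 80; } ]; }
        ];
      };
    };
  };
in
pkgs.writeText "nginx-controller.json" (builtins.toJSON controller)
```

Running nix-build nginx-rc.nix and then kubectl create -f result would create it, just like a hand-written YAML file.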
So what we will make here is: we will deploy two nginx containers. Okay, so let's do this. I actually already have this deployed, but okay, first let's get the replication controllers. We can already see that we have a replication controller for nginx with two replicas, but we can delete it. Okay, it has now deleted our replication controller; we can see that. And now we create it again with the file we have at the top, the nginx controller. Okay, so now we can see the replication controller is running, and in this replication controller it's running two pods: one has already started, the other is almost started. Now both are running.

Actually, maybe if I just curl it, it will be just fine, because I said these are exposed by DNS, so I can just curl nginx. Oh yeah, I forgot something: we also have a service, we also have to expose this. I already have a service available, but I can of course delete it, the nginx service, and I can go to the next slide that shows the service. Let me just describe it: here we have a service that selects all the pods that have the label app=nginx, and it exposes port 8000, with the target port on the container being port 80. So let's do this. Now we have the service available and it has its IP allocated; every machine has to have an IP space for these services, and another one for the pods.

Now let's do this curl against the nginx service. Just a second, I probably have DNS problems; I usually have a different DNS server. Let's see if it works now. Not working. Oh yeah, it's the wrong port, it's 8000. Thanks. Oh yeah, so here is nginx responding, and it's load balancing between these two pods that are running. If you had multiple machines, it would run these pods across multiple machines; it would provision them on different machines, and that's what the scheduler does. Of course, there are some additional commands: we can also get the logs of containers running in our cluster, execute a command inside a container, and get a shell into a container.

Okay, now about running Kubernetes on NixOS. I created a NixOS module, and it has been deployed in production for a longer period of time; I have actually been testing this for quite a long time, I think since version 0.6, and now we have version 1.0. So not the latest stable release, which is 1.1, but that's also what most other distros have. To actually deploy it is just a few lines of NixOS config (a rough sketch follows below). What you have to do is create a bridge interface, give your server the roles of master and node, and tell Docker which interface it should use, in this case the cbr0 interface. You don't see an enable option here; that's because with these roles you have already defined that it should be enabled, but I will probably change this so that you have to enable it explicitly. You just define the roles of the services. You can of course also enable the different components separately; this is just a shortcut with the roles option where you define master and node.

But that's running on only a single node. For production environments you need a cluster of at least three machines, one of which has to be a master and also a minion, a worker node. That's because in production Kubernetes depends on etcd, and etcd is the distributed configuration store.
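Before moving on to the multi-node setup, here is a rough sketch of the single-node configuration just described, assuming the NixOS Kubernetes module of roughly that era; exact option names and defaults may differ between nixpkgs versions.

```nix
{ config, pkgs, ... }:
{
  # Bridge interface that the containers get attached to.
  networking.bridges.cbr0.interfaces = [ ];

  # Tell Docker to use that bridge instead of its default docker0.
  virtualisation.docker = {
    enable = true;
    extraOptions = "--bridge=cbr0 --iptables=false";
  };

  # One machine acting as both master and worker node; setting the roles
  # is what pulls in and enables the individual Kubernetes services.
  services.kubernetes.roles = [ "master" "node" ];
}
```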
And if you want to run it reliably, you have to have a quorum, so at least two of the three machines have to be running; that's why three machines make the kind of cluster you can run in production. You also, of course, need overlay networking: these machines have to be connected somehow and have routable subnets. Every machine gets its own subnet, so services deployed on different machines can talk to each other and have different IPs that are routable to the different servers.

In our case at GateHub we do this on Amazon AWS instances. We have a virtual private cloud, and on top of that virtual private cloud we are running Open vSwitch connected with IPsec, because we don't just run instances on Amazon but also at Hetzner. With Hetzner, because we are communicating over the Internet, we need some kind of secure communication, and that's why we have the IPsec links. The deployment is done with NixOps and Elastic Block Store, and we have separate production and development namespaces. Here is just a high-level overview of how it looks: a couple of minions connected, an Amazon load balancer that balances the traffic to these servers, and the Open vSwitch overlay network connecting all the instances.

Maybe just a bit about monitoring. We use collectd for metrics aggregation, InfluxDB for metrics storage, Grafana as a dashboard for metrics visualization, and Bosun for alerting, as it says. We are currently using Bosun for alerting because I haven't found any better component, but Grafana will get support for alerting in one of the next releases, which would be really nice, because then you would just have to run InfluxDB, Grafana and some service like collectd that collects the metrics, and that would be it.

About configuration: I will not show how to deploy a whole cluster now, but I have a set of profiles available. You can include them in your NixOS configuration, and I have something like profiles.kubernetes.enable = true and, for example, profiles.openvswitch.enable = true, and with a few more options you get a pretty simply configurable system. Why don't I want to use a bare NixOS config? Because I reuse this configuration in different deployments and I don't want to duplicate my code, so I have developed a set of profiles that I can reuse in different deployments. Of course, it still needs better documentation. That's about it; I think it's already time for questions. Thank you.

When you build Docker images, do you include the whole closure inside the image so they can be moved around? Yes, currently, but we are working on having a distributed Nix store. Which would be mounted across machines? Yes, that's the idea, but we are not there yet. But yes, that's the idea; it would be really awesome.

For me, it seems you really need a big scale to benefit from all this; you need to have a good problem to solve to justify this big setup. I mean, for us it's easier because... what's the size? We are running around 10 Amazon instances and around 20 microservices, replicated. And then we also have support services like GitLab, Sentry, Grafana, Elasticsearch, a lot of stuff. And if I had to deploy all of this by hand... For example, GitLab is just deployed with the image available on Docker Hub, because that was the much easier and quicker way.
So this setup actually enables you to run the Nix-based services and also the other ones, and it enables us later to scale, or to replace them with a full Nix setup.

I was just curious, on that topic of copying the closures into the containers: how big are your containers, for example your nginx container? Oh, nginx? I don't know exactly how big it is; it's the closure size plus a few tens of megabytes, maybe, so it's not a big container. The bigger problems are other services. For example, we have a lot of Node.js packages that can be really big in size; if they have a lot of dependencies, this becomes a problem.

Has that ever been a problem in practice, or is it just theoretical? Actually, we had problems with storage: when we updated things, especially the Node packages, if you update the Dockerfile, the Docker description file, and you bump a version, it will rebuild the whole image with all the dependencies and it will have a different hash. And if you have a lot of updates, that just fills up the disk space. So now we have a garbage collector that runs every day and cleans up the old Docker images. So it is a problem, and distributed storage with Nix would be a really nice solution here.

Can you tell a little bit more about how the VPN works, the Open vSwitch between Amazon and Hetzner and so on? So Open vSwitch is a virtualized switch that's integrated into the Linux kernel. What it provides is a simple way to create virtualized networking between machines, so we could say software-defined networking. We are not using all of its features, but it gave us the otherwise not-so-easy setup of encrypted tunnels. We could probably also have used plain GRE tunnels, with the IPsec that's integrated directly in Linux, but I decided to use Open vSwitch because it has better support for routing, and we can do firewalling with it in some sense, so we can limit communication between different containers in the cluster.

Does it integrate IPsec, or...? I mean, IPsec is used in the Linux kernel; it's part of the kernel. Okay, and the configuration between the different servers, is that manual? Currently the configuration we deploy is static and generated by NixOps, so when I deploy, if I add another server, it reconfigures all the other servers to provide this mesh networking between the machines. Okay, cool. So we still need a better solution to be able to spin up servers without NixOps. Okay, thanks.
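The configuration profiles mentioned during the talk are not shown in the transcript either. Here is a minimal sketch of how such a reusable profile could be structured as a NixOS module: the profiles.kubernetes.enable option name follows the talk, while the module layout and the master flag are assumptions, not the speaker's actual code.

```nix
# profiles/kubernetes.nix: a hypothetical profile module, not the real one.
{ config, lib, ... }:

with lib;

{
  options.profiles.kubernetes = {
    enable = mkEnableOption "the Kubernetes cluster profile";
    master = mkOption {
      type = types.bool;
      default = false;
      description = "Whether this machine also acts as a Kubernetes master.";
    };
  };

  config = mkIf config.profiles.kubernetes.enable {
    services.kubernetes.roles =
      [ "node" ] ++ optional config.profiles.kubernetes.master "master";
  };
}
```

A machine configuration would then just import this file and set profiles.kubernetes.enable = true (plus profiles.kubernetes.master = true on the master), instead of repeating the underlying options in every deployment.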