Hi all, thanks for joining the very last presentation here. So what's hidden behind this daring title? I want to spend the first half of this presentation talking about some of the core concepts in Kubernetes that are supposed to help you, as an administrator or as a user, keep your workload up and running no matter what happens. In the second half of the talk, I will be talking about node network configuration in Kubernetes, and I will focus on safe network configuration. So let's start. You will know this logo. Kubernetes is an orchestration tool that helps you run your containerized workloads. This workload usually runs on multiple nodes. And even more importantly, this workload is not running on a laptop or a phone of your user, meaning that if anything goes bad with the servers that you run on, or with the networking that connects these servers, the user may end up offline. So how do we prevent that? Let's illustrate it on a couple of examples. Here we have a node, and on the node we have a pod, which is the smallest unit of workload in Kubernetes. And we have a user. This user is using our application running in the pod, and you can tell that they are quite happy. But then something terrible happens to the node, and the user loses their connectivity. The application is not reachable anymore, and they are upset. So how do we prevent this? If the issue was the single point of failure of our single node, let's throw more nodes into the cluster. And since we now have multiple nodes, let's include a control plane node that will manage them, manage all the pods, and keep the nodes healthy. Now, to leverage our new nodes, we create something called a Deployment, which is a simple way in Kubernetes to run your application in multiple instances. What it does is deploy your pod as many times as you want. If it sees that there are more instances of the application than you asked for, it will remove some; if there are fewer of them, it will create new ones. 
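As a sketch, such a Deployment might look like the manifest below; the name, labels, and container image are placeholders for illustration, not something from the talk:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # placeholder name
spec:
  replicas: 2               # Kubernetes keeps exactly two pods of this app running
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25   # placeholder image
        ports:
        - containerPort: 80
```

If a pod dies, the Deployment's controller notices the instance count dropped below `replicas` and starts a replacement; if there are too many, it deletes the extras.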
So we kind of solved the issue with a single point of failure, but we have a new one, and that's which of these pods should now be accessed by the user. The user just wants to get to the application; they don't care how many pods you're running in the cluster. To solve this, Kubernetes introduces something called a Service, which is an abstraction over a set of pods. It serves as a single entry point, with a virtual IP address and a domain name, that the user connects to, and each request is then forwarded to one of the pods implementing the service. So now the user is happy: the traffic is getting load-balanced across our healthy pods. Say now one of them crashes. You see that the user is still happy, because the liveness probe of that pod started failing, the Service noticed that one of the pods it points to is down, and from now on it dispatches all the traffic to the one healthy pod. Then, thanks to our Deployment, a new pod is started, so we have two of them again; the Service picks it up and forwards traffic to it as well. And finally, the failed pod gets deleted. So it looks good so far, but you may have noticed that while I was solving one single point of failure, the worker nodes, I introduced a new one: the control plane. We have only one instance of that, so what happens if it crashes? Well, we lose our Deployment, but you can see from the slide that the user is still communicating with our pods. That's because Kubernetes proves to be quite bulletproof here: if it loses the control plane, it enters a read-only mode, meaning you cannot create new pods and the Deployment cannot do its job, but the workload running in the existing pods keeps running and the Service keeps serving. As long as these two pods stay alive, you're good. 
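A minimal Service matching the Deployment described above might look like this (again, names and labels are placeholders); it gets a virtual cluster IP and forwards to whichever pods carry the matching label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # also becomes the DNS name inside the cluster
spec:
  selector:
    app: my-app           # traffic is load-balanced across pods with this label
  ports:
  - port: 80              # port exposed on the virtual IP
    targetPort: 80        # port the pods actually listen on
```

Because the Service matches by label rather than by pod name, replacement pods started by the Deployment are picked up automatically, which is exactly the failover behavior described above.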
So if you are okay with a little downtime until you fix this control plane node, you should be good to go with this kind of setup. But we won't stop here; I want to make the control plane highly available as well. Applying a similar process, I will throw in more nodes for the control plane. Each of them runs a single instance of the API server and of etcd, which serves as a distributed storage. Now, how do we handle high availability here? For the etcd part, all of these instances are able to serve at any point in time; you can read from and write to any of them. For a write to go through, though, you need to make sure that more than half of the instances are alive, so they can vote and allow the new data to be stored. If the count of live etcd instances drops to half or below, the cluster again enters the read-only mode. For the API server, we again have several instances of the same service, so we have to solve the same issue as with the pods: we need to balance the incoming traffic across all of them and make sure that if one of them goes down, the other two can continue serving requests. That's done similarly to the Service, using a virtual IP address and load balancers; I won't get into the details of this. But with all this, we have a healthy, highly available Kubernetes cluster. All right, now let's change the topic a little bit. I will talk about network configuration, so if you managed to space out during the previous 10 minutes, now it's time to wake up. So let's get to it. I will try to illustrate why we want to configure the host networking, and what can go wrong, on a simple example again. Here we have three nodes. Each of them has a single network interface connected to a central switch, which is then connected to the outside network, but that doesn't matter too much here. 
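The quorum rule mentioned above can be sketched with a bit of shell arithmetic (the cluster sizes are just illustrative):

```shell
# A write needs a majority: floor(n/2) + 1 live etcd members.
# The cluster therefore tolerates floor((n-1)/2) member failures.
for n in 1 3 5; do
  echo "$n members: quorum $(( n / 2 + 1 )), tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

This is also why etcd clusters typically run an odd number of members: going from 3 to 4 members raises the quorum without increasing the number of failures tolerated.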
And finally, each of these nodes has a handful of pods running, these pods are communicating over the default interface, and everything looks fine. But say we start more pods on these nodes, as many as can fit there. If they generate enough traffic, it may become an issue for our network interface, which doesn't have enough throughput to carry it all. So, applying the same logic as for high availability, we add a new network interface to these nodes, and to make sure both are equally utilized, we aggregate them using a bonding interface, which is a way to combine multiple network interfaces to either increase throughput or provide an active-backup safety mechanism. And now you can see that since we increased the available throughput, all these pods can communicate over the network. So that was quite good, but if we had made a single mistake, we could have ended up like this. To illustrate what can go wrong, let me show you a very primitive way of configuring these bondings: it's just SSH running a script on the cluster nodes. If you don't know the ip tooling, don't worry, I will go through the whole setup line by line. So first we have to connect to the nodes; we use SSH to do that. Then we run a set of ip link commands. First, we need to bring the original management interface eth0 down so we can reconfigure it. And this is the first problematic point, because as soon as we bring it down, we lose the management network, and if any of the following commands fail, we lose this management network for good, unless you use a console to connect back to the host and revive the configuration, which may be quite troublesome. But let's pretend everything is okay here and continue to the next line. We create a virtual interface of type bond, we attach both of our physical interfaces to it, and we bring up both interfaces and the bonding. So now we have formed this virtual interface on top of both of its ports. 
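Put together, the commands walked through so far might look like the fragment below; it is a sketch of the fragile approach being criticized, not something to run as-is, and the host, interface, and bond names are assumptions:

```shell
ssh root@node1 '
  ip link set eth0 down            # management connectivity is lost from here on
  ip link set eth1 down            # ports must be down before they can be enslaved
  ip link add bond0 type bond      # create the virtual bonding interface
  ip link set eth0 master bond0    # attach both physical ports to the bond
  ip link set eth1 master bond0
  ip link set eth0 up
  ip link set eth1 up
  ip link set bond0 up             # the bond is up, but carries no IP address yet
'
```

Note that any failure after the first command leaves the host unreachable over the network, which is exactly the risk described above.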
And finally, in the original setup, eth0 served as the management interface and it carried the default IP address. But as soon as we attached it to our bonding, it lost its IP address, so now we need to configure it again, this time on the bonding. We use a DHCP client to do so. And here comes the second issue, or pain point: if we don't manage to get an IP address again, the host ends up offline, and if we do get an IP address but it's a different one than we used to have, the node won't recover properly. And an obvious third issue is that if you make a typo in any of this, you are again in trouble. Those are the hard failures, but another problem is that this is not really what we are used to in Kubernetes, right? What we are used to is something like this: we call kubectl apply with some YAML file. And this is exactly what the Kubernetes NMState project, whose logo you saw on the previous slides, does. It provides a Kubernetes-native way of configuring the host networking in a declarative manner. So let's illustrate how it works on an example again. Here we have our three nodes; each of them runs an instance of NMState, which communicates with the local NetworkManager to obtain the network status and also to write configuration back to the host. And again, each of these nodes has two interfaces, eth0 and eth1. Now, the first feature of NMState is that it reports the state of the network as a Kubernetes object. Let's call it a state here; the state then contains the list of the interfaces we have available, here eth0, which is up, and eth1, which is currently down. It does this for every single node in the cluster, and you can use this information to figure out what interfaces are available, or to integrate with some kind of automation or monitoring tools. The counterpart of reporting is configuration. 
This is driven by a policy object, where we declare the desired state of the network on all the hosts that match the policy. Here we declared a policy that says we want an interface of type bond called bond1, and it should have two ports, eth0 and eth1. Now, when you apply this configuration, NMState creates an enactment object per each node in the cluster, and this is then used to monitor the progress of the configuration and also to debug any issues. Here you can see that the central enactment is in progress, while the ones on the left and on the right are pending. This is the first safety mechanism of NMState: we apply the configuration on one node at a time, and by doing that we make sure that if the configuration is disruptive to network connectivity, it won't take down all the nodes at once. So let's get into the configuration itself now. The enactment is in progress, NMState does its thing, and it creates the bonding interface over the two network interfaces eth0 and eth1. Now the configuration succeeded, but we don't stop here. We get to the second safety mechanism, which is the connectivity check. We want to make sure that after the configuration of the host is finished, we still have connectivity to the default gateway, to the DNS server, and to the Kubernetes API server, to confirm that the node is still healthy and a member of the cluster. If we didn't get a response back, the configuration of the node, including the bonding, would be rolled back, and the default interface would again be eth0. But in this case our ping got a response, so we commit the configuration on the node and then continue to the node on the left, configure it, and then to the node on the right. It's worth mentioning that if the configuration on one of these nodes failed, we wouldn't continue configuring the other ones. We treat every single configuration as a canary test, and if it fails, we just abort the whole rollout of the configuration. 
Okay, that was the process in pictures, but we are probably more familiar with kubectl and manifests, so let's illustrate the same process using those tools. If you want to read the current node network state, you would call kubectl get nns and the name of the node, NNS being short for NodeNetworkState. Here is a stripped-down example of a NodeNetworkState: it has a name matching the name of the node, and it has a list of interfaces, eth0 with its IP address and eth1, which is down. If you called this on a real cluster, you would also see many other interfaces, the DNS configuration, default gateways, and much more. Now, for the configuration part, we call kubectl apply with an object of kind NodeNetworkConfigurationPolicy, or NNCP in short. It has an arbitrary name and the declaration of the desired state. Here we want the bonding, we want it to bond interfaces eth0 and eth1, and the mode of the bonding is balance-rr, round-robin, which should balance the traffic evenly across both of the ports. This API that we use in Kubernetes NMState is taken directly from the NMState project, which is not bound to Kubernetes; if you want to learn more about it, just search for it. It may be very useful if you are dealing with configuration of individual nodes and you want to do it in a declarative manner. So we applied the desired configuration of the bonding, and now we want to monitor the progress using NodeNetworkConfigurationEnactments, or NNCE in short. As soon as we call it, we get this error: unable to connect to the server. This is the moment when you would start panicking: we somehow got disconnected, we are not able to reach the API server anymore, and that sounds like trouble. Fortunately, after a couple of minutes you can try again, call kubectl get nnce, and you see that the connection is back: the configuration of the first node failed, and the two remaining nodes just skipped the operation and never attempted the configuration. 
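Based on the description above, the applied policy might look roughly like this; the field names follow the upstream NMState schema and should be treated as an approximation (e.g. older releases spell the port list differently):

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond1-policy          # arbitrary name
spec:
  desiredState:
    interfaces:
    - name: bond1
      type: bond
      state: up
      link-aggregation:
        mode: balance-rr      # round-robin across both ports
        port:
        - eth0
        - eth1
```

Note that, just like the policy in the talk, this declares no IP configuration on the bond at all.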
So this is the rollout and canary-testing practice in action. Now let's see why the configuration failed. To do that, you can get the details of the NodeNetworkConfigurationEnactment, and in the message it tells us something like this, basically saying that we were not able to find the default gateway after the configuration was finished. That sounds like an IP configuration issue, so let's review our policy. This is the original one I applied previously, and you may remember that in the original setup, eth0 was the default network interface, carrying the default management IP address, and we forgot to include any IP address configuration in this declared state. We can fix it by simply saying that this bond interface should have IPv4 enabled and use DHCP to obtain an IP address. Now, when I apply the fixed configuration and call kubectl get nnce to monitor the progress, you can see that it successfully configured the first node, it is currently progressing on the second, and the third one is just waiting in line; eventually it should configure all the nodes. So that was the current state of Kubernetes NMState and the features available there. Now, Kubernetes NMState has many more features and supports many more types of interfaces to configure, but it lacks a little bit on the safety side of things. First of all, we are currently working on a rollout-control API, where you would be able to say that you don't want to configure one node at a time, but want to take groups of nodes and configure them in chunks. This may be important if you have a huge cluster, where rolling out the configuration node by node may take long minutes or hours, and it's not always necessary to do it one by one in case your desired configuration is not really breaking the network. The second missing safety feature would be node draining. 
What we do currently when configuring a node is that we don't touch the running pods at all; we leave them running on the node, which means they may lose their connectivity for a bit. That shouldn't be an issue if you run them in multiple instances, but we want to strive for excellence here, and we want to make this better by draining all the running pods from the node before we start the configuration there, so they get time to gracefully shut down and start on a different node, and only then do we attempt to reconfigure the node and potentially break the network connectivity there. So, to wrap up: in the first half we talked about the core concepts of Kubernetes that allow you to keep your containerized workload highly available and to keep your clusters highly available as well, and in the second half I talked about the Kubernetes NMState project, its API, and the safety mechanisms that should help you not to destroy your cluster and break the connectivity there.