The next talk will be Vadim. Vadim is an engineer at Red Hat; by day he works on OpenShift and OKD, the open source version of OpenShift, and by night he works on other challenging projects. Today he will show you different approaches to agent-based Kubernetes installations. Have fun.

Okay. I hope you can see the screen and hear me as well. So today we'll speak about different ways of installing Kubernetes and introduce a new one, called the agent-based method. A few words about me: my name is Vadim Nikovsky. I'm a Principal Software Engineer from Belarus, living in the Czech Republic. I work for Red Hat in the OpenShift department, and my job is basically to oversee the whole cluster lifecycle: helping customers install clusters of various configurations, expand them, upgrade them, and manage all of them at scale.

Let's look at the most challenging and common problems when you're installing a cluster. First, our customers usually need to know whether the installation method we provide is flexible enough to run on their infrastructure, so we need a very thorough description of the installation process, and the method has to be flexible enough. Another common problem is collecting debug information if the installation fails. Another quite common problem is that people usually don't install just one cluster, they install multiple of them, so the method has to be very resilient. That means you have to limit the set of user inputs so that the method is reusable over and over again. We want to make installation as boring as possible, so that you can hand it off to developers or less technical users and, in general, offer it as a service, either inside the company or to the general public.
GitOps is a buzzword people use a lot, and it's very helpful in that it helps track who changed what and allows us to review things before they get applied, which is incredibly valuable when your Git branch is the source of truth for the cluster. That lets us keep a history of cluster changes, and the whole team manages the cluster instead of one designated person.

So here we face a dilemma when picking which installer to use. There are multiple possible ways, like kops and Kubespray, to mention a few, and all of those installers strike a balance between two extremes. One extreme is a very generic installer, for instance kubeadm, which only manages the parts that are essential for Kubernetes. It is not involved in preparing the host, not involved in network settings, and so on. That means you have to come up with your own way of installing the operating system, configuring networks and so on, and that might influence the installation process and also needs to be accounted for. The other extreme is a very specific installer which targets, say, a particular cloud, or is based on Terraform and depends on which providers are used. That comes with a price: it may become harder to customize, rather complicated to update for new methods, and so on. This approach is also very complicated to implement for bare metal systems.

These problems led us to research a new method of installation, with a specific focus on bare metal, trying to find a middle ground between the two extremes and focusing on infrastructure provided by the user, unlike what we get from a cloud with a great API. This method should satisfy the following requirements. First of all, it has to be boring. It has to hide complexity.
It has to do a lot of things for the user, so that they can focus on deploying their workloads into the cluster and not bother with tedious details. It also has to be quite customizable, so that it can fit onto existing infrastructure or follow specific requirements, for the network for instance. And this method also has to give great feedback when it comes to failures, and collect sufficient debug information automatically.

This leads us to the idea of the assisted installer: a tool which assists you during installation and identifies whether the cluster settings are incorrect, whether the nodes satisfy the necessary requirements, and whether the installation itself is going to succeed. The idea came from reversing what usually happens. Instead of provisioning the machine with the real operating system, writing things to disk and starting the installation, we first collect all the information we can about our hosts. In order to do that, we boot every single machine with a specific live ISO that runs a special agent, which collects the necessary information and stores it in a centralized database. That database should definitely have an API, so that we can build a pretty UI on top of it.

Here we can see a screenshot from our service, which shows what kind of node is about to join the cluster: what its capacity is, which CPUs it has, how many disks there are and what their types are, and what the network interfaces are. This already helps us solve several problems: if your node is unable to run the agent at all, it won't be able to run the kubelet either, so we filter out nodes which would not be able to join the cluster at all. We also get a full overview of all the available hosts, so we can plan and assign different roles. And we also establish a communication channel between our service and the machine, without SSH or anything more complicated.
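As a rough illustration of what the agent reports back, a host inventory record might look like the sketch below. The field names here are hypothetical, chosen for readability, not the actual assisted-service API schema:

```yaml
# Illustrative inventory record the discovery agent might report for one
# host. Field names are hypothetical, not the real assisted-service schema.
host:
  hostname: master-0
  cpu:
    architecture: x86_64
    count: 8
  memoryBytes: 34359738368        # 32 GiB
  disks:
    - path: /dev/sda
      driveType: SSD              # disk type matters for role validations
      sizeBytes: 480103981056
  interfaces:
    - name: eno1
      macAddress: "52:54:00:aa:bb:cc"
      ipv4Addresses: ["192.168.111.20/24"]
```

The service stores one such record per discovered machine, which is what powers both the UI overview and the pre-installation validations.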
This channel is incredibly important because it's bi-directional. The service can send configuration, certificates and so on to the host and ask it to run commands, and the host itself can return error messages or send logs back to the service if an operation fails, or just report successful steps. Since we already have a hosted service, we can build a pretty UI on top of it. To do that we need a proper API, which also makes the service scriptable: users can interact with the API directly, bypassing the UI. The UI is still quite important, because it makes the experience easy for newcomers, and people simply spend less time typing the very same commands if they are more used to wizard-style UIs.

One of the distinctive features of this installation method is that we can validate configurations before the installation is started. We can highlight critical problems on the host or in the cluster configuration itself. The most common validation is checking for network misconfiguration: if we have overlapping ranges, for instance for pods and services, that might result in problems later. There are more subtle problems too. For instance, if the nodes don't have time synchronized, their clocks might diverge, and some certificates we send them might not be valid from their point of view; later on, this can cause quite a lot of problems.

We can also automate the provisioning if we use protocols like IPMI or Redfish. For that, our service uses a trusted OpenStack component called Ironic, which helps us automatically power on the machine and provision it with the discovery ISO. So we boot it in live mode, collect and validate details, and then proceed with the installation, all without user interaction.
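To make the overlap validation concrete, here is a sketch of cluster network settings in the style of an OpenShift `install-config.yaml` (the exact layout depends on the installer you use). The three ranges are deliberately chosen so they don't overlap:

```yaml
# Sketch of cluster networking settings, install-config.yaml style.
# The validation described above rejects configurations where these
# CIDR ranges overlap with each other.
networking:
  clusterNetwork:              # range pod IPs are allocated from
    - cidr: 10.128.0.0/14
      hostPrefix: 23           # a /23 per node => ~510 pod IPs each
  serviceNetwork:              # range for ClusterIP services
    - 172.30.0.0/16
  machineNetwork:              # range the node NICs actually live on
    - cidr: 192.168.111.0/24
```

Catching a clash between, say, the service network and the machine network at this stage is far cheaper than debugging unreachable services after the cluster is up.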
Installation progress reporting is also incredibly important, because it shows the phases the installation is passing through, and users don't have to guess whether the cluster installation is finished yet.

You might also have heard about the Cluster API method of installing clusters, which is a great tool of course, but for us its features were not sufficient: it doesn't let us prevent the installation when we know the cluster configuration isn't correct. However, if you want to follow that route and combine Cluster API with the agent-based installer, there is a provider to do exactly that: for our purposes, we have created a Cluster API provider which works with assisted installer agents, so you can get the best of both worlds.

There are more challenging problems we needed to cover, and the most challenging is probably networking. In our implementation in the OpenShift assisted service, we rely on NetworkManager on the host to configure networking for us, and to generate NetworkManager configurations we use the NMState operator, which builds NetworkManager configs from Kubernetes manifests. That also helps us manage the cluster on day two, if you want to change some of the settings later.

The second problem is node configuration: basically, what kind of OS and which files do we want to lay down there? Here we use a combination of technologies. One of them is rpm-ostree, which helps us create an OS image, composed as a container from various RPMs, and manage atomic upgrades. The other part is Ignition, which helps us layer additional configuration files onto the disk and manage them as Kubernetes manifests. Both of those technologies are used by the Machine Config Operator, which places the required files on disk, so users can also make use of it if they want to add more things.
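As a sketch of that Ignition plus Machine Config Operator combination, here is a MachineConfig manifest that lays a file down on every worker node. The file path and contents are just an example:

```yaml
# Example MachineConfig: the Machine Config Operator renders this into
# an Ignition config and writes the file onto every node with the
# "worker" role.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-sysctl
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-custom.conf
          mode: 0644
          contents:
            # "data:," URLs carry the file contents inline, URL-encoded
            source: data:,net.ipv4.ip_forward%3D1
```

Because this is an ordinary Kubernetes manifest, the same mechanism works on day one (baked into the install) and on day two (applied to a running cluster).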
What if you wanted to use this with a different Linux distribution? That would be a very challenging task, because configuration comes in different sizes and shapes, and in order to build a useful and viable service you have to narrow things down to particular choices. In our case we focus on NetworkManager, we pre-select an operating system for you, and we also require users to use Ignition. If you wanted to build your own agent-based installer, you would probably have to make the very same kinds of choices, just like we do.

Since we're tracking the installation and all of its progress, we can also collect the pass rate: how many clusters we were able to install, which ones failed in the meantime, how many were canceled, and so on. That lets our team set a specific SLI, for instance that we want an installation success rate of 75%, and if the success rate falls below that, we have a problem and have to fix the service. This helps us treat this service like any other application.

Since we are deploying a service which has an API and is a web service, we can definitely run it on Kubernetes as well. And since the inputs can be expressed as Kubernetes manifests, we can put this behind an operator or controller. That gives us the opportunity to declare the whole cluster as we want to see it, as a bunch of Kubernetes manifests, and the operator makes the necessary API calls to the service so that the cluster gets installed in the end. Since we're talking about Kubernetes manifests, we can store them in a Git repo, get them applied by Flux or Argo, and use GitOps methods to create clusters. In OpenShift speak, this is called zero-touch provisioning: the user doesn't touch the hardware to start the installation; they create Kubernetes manifests and feed them to the operator, the operator does the necessary things, and the cluster gets installed in the end.
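That declarative flow can be sketched with the custom resources exposed by the assisted service's Kubernetes API. The shape below is heavily abbreviated, with several required fields (pull secret, platform settings, image references) omitted for illustration:

```yaml
# Abbreviated sketch of declaring a cluster through the assisted-service
# Kubernetes API; fields are trimmed, this is not a complete manifest set.
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: my-cluster
spec:
  clusterName: my-cluster
  baseDomain: example.com
  clusterInstallRef:            # delegate installation to the agent flow
    group: extensions.hive.openshift.io
    kind: AgentClusterInstall
    name: my-cluster
    version: v1beta1
---
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: my-cluster
spec:
  provisionRequirements:
    controlPlaneAgents: 3       # wait until 3 discovered agents are ready
  networking:
    clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
    serviceNetwork:
      - 172.30.0.0/16
```

Commit manifests like these to Git, let Flux or Argo apply them, and the operator drives the installation from there, which is the zero-touch provisioning flow described above.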
Let's do a small screenshot tour of the assisted service. First we start with the high-level cluster details: the cluster name, the base domain, the version we want to install, and so on. The next step is generating the ISO used to fetch information about the machines. There are two different flavors of this ISO. One is the minimal ISO, which only has the kernel and the bare minimum needed to run the OS and fetch the rest from the service itself. This type of ISO is very useful in virtualized environments, where the network is usually provided by DHCP, is pre-configured, and can be considered stable. The other one is the full ISO, which includes the root filesystem itself, in case you need to tweak more complicated network parameters. We also ask users to bake in their SSH public key, so that they can SSH into the machine in case the agent doesn't boot up automatically, fix the problem and take it from there; later on, we won't need it.

Next we look at the whole inventory, the list of discovered hosts. We can assign specific roles to them, be it control plane or worker. Immediately at this stage we run validations. For instance, we don't allow control plane hosts to run on HDD disks, because etcd would be very unhappy. We also require all host names to be distinct, so that the machines can join as different nodes, and a few more checks. Next, we can specify the cluster network settings, be it an IPv4 cluster or dual stack; we set CIDR ranges and can tweak more complicated things. After that, we can start the installation, and we're greeted with a wizard which shows which stage we're at, which host is doing what, and the overall status of the install process. Once we're done, the wizard shows us the web console URL and the generated credentials so users can get started.
It also gives you a link to download a kubeconfig, so that you can immediately get started with the cluster.

Let's look at more specific details and problems we're solving. Here is an example of how we define a network configuration for a particular host. We can say that we need two interfaces with these MAC addresses, and that we want them joined in a bond with this address and these parameters. This manifest is handled by the NMState operator, which sets the necessary parameters on the node. Next, we can also define our hosts as bare metal hosts, in case they support protocols like IPMI or Redfish. We set the address of the baseboard management controller, a link to the secret where the credentials are stored, and we can specify that we don't want the host cleaned up if the boot fails, for further investigation, for instance. OpenStack Ironic then does all the necessary provisioning steps: powering the host down first of all, running the discovery ISO, collecting details, and reporting back without user interaction.

In the end, this agent-based installation method gives us quite a few benefits. First of all, validations: they help us prevent faulty node and cluster configurations before touching the hosts themselves. It gives us a pretty API, which allows us to build flexible, UX-rich applications on top of it. And this method can be used on the various infrastructures customers want to run on. And that's pretty much it. Thank you very much. Let's hop into the questions.

Thank you so much. Let's have a look if we have any questions. That doesn't look like it, not yet; maybe they're coming soon. I would be interested in: what do you think are the biggest challenges when you're doing these kinds of installations?

Networks. A lot of challenges revolve around networks, be it validations, first of all; we want to make them very useful for the users.
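The two manifests described above might look roughly like this; the MAC addresses, IPs, BMC URL and names are placeholders:

```yaml
# NodeNetworkConfigurationPolicy: NMState joins two NICs into bond0
# with a static address; the operator translates this into
# NetworkManager configuration on the node.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-worker-0
spec:
  desiredState:
    interfaces:
      - name: bond0
        type: bond
        state: up
        link-aggregation:
          mode: active-backup
          port: [eno1, eno2]       # the two physical interfaces
        ipv4:
          enabled: true
          address:
            - ip: 192.168.111.30
              prefix-length: 24
---
# BareMetalHost: points Ironic at the machine's BMC; automated cleaning
# is disabled so a failed boot can be inspected afterwards.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
spec:
  bmc:
    address: redfish://192.168.111.1:8000/redfish/v1/Systems/1
    credentialsName: worker-0-bmc-secret   # Secret with username/password
  bootMACAddress: "52:54:00:aa:bb:cc"
  automatedCleaningMode: disabled
  online: true
```

With both in place, the service can power the machine on over Redfish, boot the discovery ISO, and bring the bonded network up exactly as declared.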
On the other hand, we don't want them to be too strict. But a lot of things revolve around networks here, because people have different setups. The other part is shifting validations to before we write anything to disk, because people get very annoyed if we destroy all the disks they have marked for installation and it turns out they had selected invalid settings. For technical reasons, though, some validations can only run once the installation has started. Inventing ways of running validations in the live ISO has been quite interesting and very challenging; networks and disks are a hard problem.

And what can we expect in the future from platforms like OpenShift and OKD? Is there something that will make life even easier?

One of the things we're working on... first of all, I try to avoid using the word OpenShift, because it usually means OCP, and people get very annoyed when we're neglecting OKD, so now is a good time to say that the assisted installer supports installations of both OCP and OKD. One of the interesting things is installing the so-called cluster zero, your very first cluster, using the agent-based install method. This is what the team is working on. That involves selecting one of the nodes as the host which runs the service, and then a lot of automation happens which provisions the other two nodes; but since you don't have user input at that point, you're not supposed to have any input in this flow. It's again a very challenging but interesting task to get right. Cluster zero using the agent-based method is what we're working on next.

Cool, awesome. Then thank you very much, Vadim. Have a great day. Thanks for being here and see you soon.

Thank you. Bye-bye.