Hi everyone, I'm Lucas Käldström, coming directly from Finland, and today I'm going to talk about kubeadm and how it works under the hood, things like that. I'm an upper secondary school student, doing some cool stuff with Kubernetes upstream, contributing to some SIGs, working mostly on kubeadm this year, and running my own company that does contracting for Weaveworks and such. It's nice to be able to contribute to this community.

So what are we going to talk about today? Well, what kubeadm is and how it fits into the ecosystem overall. What is its scope? Why does it exist at all? Then, getting a bit deeper: how do the different Kubernetes components talk to each other? How is the cluster secured? What is self-hosting and how can we benefit from such a technique? How does kubeadm in 1.8 support easier upgrades, and what does that actually do under the hood? And then, how can HA, or high availability, some would call it multi-master, be achieved with kubeadm as a building block? This is a deep dive, and I hope you'll enjoy it.

OK, so let's start with the scope of kubeadm. What should kubeadm do? How does it fit into the ecosystem? Well, at the bottom of the stack we have some kind of infrastructure. It could be one of the public clouds shown here, it could be your bare metal cluster, it could be my Raspberry Pis on my desk, anything that has some machines. And we're all here today because we're excited about Kubernetes, so we want to get Kubernetes installed on those machines somehow. Before kubeadm was created, this could be done, but it was kind of hard, or you could use some other end-to-end tool. So we created kubeadm, which bootstraps Kubernetes and executes locally as a CLI tool on each machine. It only sees the machine it's executing on, like the file system and other local resources, and the Kubernetes API above it.

So what kubeadm gets you is a Kubernetes cluster. It stitches the parts together and makes a best-practice, but minimum viable, cluster by design. So we don't install all the things for you; that's left up to the user at layer three. That could include things like the cloud provider add-ons, which we're currently wanting to extract out of core. Today the cloud providers, eight or so of them, are built into the core, but that has some severe limitations, so we're actively trying to move those out to run as normal controllers on top of Kubernetes. Or it could be load balancers, or monitoring, logging, whatever else. But that's left up to the user.

And then we have a new, exciting API coming up, an API spec called the Cluster API. You might have seen Robert Bailey and Kris Nova's talk on this topic earlier today; if you didn't, you can check out the recording on YouTube in the coming weeks. The Cluster API is a declarative way of specifying what my cluster should look like. I can say things like: I want the control plane configured like this, and then I want a MachineSet of n nodes running on this cloud, or whatever. The spec is generic, but then we have an implementation that is cloud-specific or environment-specific, running on top of Kubernetes, that reconciles to create these nodes and/or masters. So kubeadm is a tiny part in this ecosystem.
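To make that scope concrete, here is a minimal sketch of the basic flow; the token, address, and hash below are placeholders, and the exact flags vary between kubeadm versions:

```bash
# On the machine that should become the master: bootstrap a control plane.
kubeadm init

# kubeadm init prints a join command containing a bootstrap token.
# On each machine that should become a node, run something like:
kubeadm join --token <token> <master-ip>:6443 \
    --discovery-token-ca-cert-hash sha256:<hash>   # pins the cluster CA (1.8+)
```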
There are a lot of moving parts, but kubeadm is this turtle at the bottom that does its thing, executes locally, and has a very small scope: just do the minimum viable thing needed for a cluster.

People often ask me: what is the difference between kubeadm and kops? Are they competing, which is better, things like that. They aren't competing, as they have totally different scopes. kubeadm, as said, only executes locally on the master and has a very local view of what's happening, but kops has a global view of the cluster and manages the infrastructure, all the machines, bootstrapping Kubernetes, and add-ons as well. So they are two totally different things and should be used by two different kinds of users. If you just want a cluster up and running on AWS or GCE or whatever, and want everything, just one command to do the whole cluster for you, then kops is your thing. If you want to build your own bootstrapper with Terraform or whatever, on bare metal or something custom, then kubeadm might be the better choice.

So what are the key design takeaways with kubeadm? Well, when we started this effort one and a half years ago or so, we decided it should set up a best-practice cluster for you, and it should have, as said, a very small scope. The user experience should be simple and the cluster reasonably secure. For example, you can add nodes with a token via kubeadm join, which is a tradeoff between security and simplicity. You don't have to use the token; you can use other ways as well, like copying over a file from the master to the node. And it's intended to be a building block. kubeadm init does a lot of things, which we'll see in a minute, and you can run these separately as well. kubeadm doesn't care about how the kubelet is run. We do provide packages, deb and RPM packages; SIG Cluster Lifecycle provides those for you. But if you want to run kubeadm on, say, CoreOS or Rancher or whatever other OS that doesn't use debs and RPMs, feel free to run the kubelet yourself, and everything will just work. It's only a template.

Then we made a design decision that we should not favor any specific network provider, like Flannel or Weave or Calico or something like that. Instead, we chose not to do such things, and it's up to the user to install the CNI network via a kubectl apply. That also has the consequence that if you run kubeadm init only, you won't get a working cluster, as there is no networking. The kubeadm architecture is composable, and everything is divided into phases, as we'll see in a coming slide.

Who is the kubeadm audience then? Who should use kubeadm? Well, a lot of users are happy with the kubeadm init flow, typing it into the command line and then joining nodes as they go. This is a great way to get started with Kubernetes, to start tinkering with how it works, seeing the API server running, things like that. Or it can be used in an automated manner by tools like kops or Kubicorn or whatever.

So here is the high-level component architecture for Kubernetes. We have the master, which has the API server, which is stateless: basically a REST API in front of etcd. Then what makes Kubernetes tick is the controller manager, which runs a dozen or so controller loops to make sure the desired state is the actual state, and reconciles on that. When you create a Deployment, for example, the controller manager will create ReplicaSets, and in turn Pods, as a consequence.
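To illustrate that chain, here is a minimal Deployment sketch (the name and image are hypothetical); the controller manager reconciles it into a ReplicaSet, which in turn creates the Pods, and the scheduler takes over from there:

```bash
# Create a tiny Deployment; the controller manager turns it into a ReplicaSet and Pods.
kubectl apply -f - <<EOF
apiVersion: apps/v1beta2   # apps/v1 in newer clusters
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: nginx:1.13
EOF

# Watch the chain: Deployment -> ReplicaSet -> Pods.
kubectl get deployments,replicasets,pods
```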
And then the scheduler will kick in and bind these pods to a node, which will then trigger the kubelet to run the container image of your choice using the container runtime, which will then execute on your host. And we have all these beautiful interfaces, CNI, CRI, OCI, et cetera.

So kubeadm init does a lot of things. The first thing it does is generate the needed certificates. There are a couple. We have the root CA cert, which is the boundary of trust for the cluster. From that root CA cert, the API serving cert is generated, and it's valid for a year. Then we have the kubelet client cert: we'll see in a later slide that when you're calling kubectl logs or kubectl exec, you're talking to the API server, which then talks to the kubelet, and the API server needs some kind of credential to talk to the kubelet. Then we have the service account private key and some things needed for API aggregation support. Then we generate some identities for the initial actors in the cluster, for example the kubelet client cert, or a cert for the admin, or for the controller manager and the scheduler. So far so good.

Then we host etcd. You can do this externally as well; that's probably preferable. But the easiest way is using a static pod and letting the kubelet host it for you, basically babysit it. Then we have the API server, controller manager, and scheduler running as static pods from the /etc/kubernetes/manifests directory. The running kubelet will then start these static pods, and we'll have the control plane up and running. Then kubeadm marks the initial node as the master with a label, a key-value pair, and it also taints the master to not schedule workloads there by default. This is because of security boundaries: you shouldn't really run normal workloads on your master, as that might lead to privilege escalation pretty easily. Hence we have the taint, so that your normal workloads won't schedule on the master.

Then the first kubeadm-specific thing is that kubeadm will upload its configuration, its internal state of the world, to a ConfigMap, so that later, when upgrading, it knows what the current state of the world is and can act based on that. Lastly, we create a bootstrap token, so you can use the kubeadm join functionality to authenticate to the master when adding a node, and then we deploy the mandatory add-ons needed in order to pass the conformance tests. So that is the current bar for kubeadm: kubeadm should create a cluster that makes it possible to pass the conformance tests, but nothing more. There are various other things, like Heapster, which is now metrics-server, or some logging solution, or whatever else is normally used in Kubernetes clusters, that should be installed by the users themselves or by a higher-level solution. And one cool thing is that CoreDNS support is now alpha in 1.9; you can enable it with a feature gate.

But if you don't want to do all these things at the same time, in this full-meal-deal fashion, you can run these phases, as we call them, these atomic sub-steps of cluster creation, separately: generate the kubeconfigs, the certificates, the control plane static pod manifests, et cetera. Or, if you have a good replacement or don't need, for example, the bootstrap token, you can just skip that, do everything else, and use something else instead, like copying over the credentials from the master to the node.
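In 1.8 these sub-steps live under kubeadm alpha phase; the exact phase names have shifted a bit between releases, so treat this as an illustrative sketch rather than the definitive list:

```bash
# Run individual pieces of `kubeadm init` instead of the whole thing.
kubeadm alpha phase certs all          # generate the cluster certificates
kubeadm alpha phase kubeconfig all     # kubeconfig files for admin, kubelet, KCM, scheduler
kubeadm alpha phase controlplane all   # static pod manifests for the control plane
kubeadm alpha phase etcd local         # static pod manifest for a local etcd
# ...and simply skip the phases you replace yourself, e.g. the bootstrap token.
```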
So what does a TLS-secured cluster look like? How do these components talk to each other, with what credentials, and so forth? Well, we have the API server, which serves HTTPS on port 6443 by default, and we have the kubelet client cert I talked about. Then we have the other components with their certificates, which have to carry specific identities: the controller manager, for example, has to have a CN of system:kube-controller-manager in order to have the right RBAC privileges. Of course, if you use something else, you have to create a ClusterRoleBinding between the controller manager privileges and that identity. The controller manager runs two, in this case, significant controllers, the CSR signer and the CSR approver, which we'll see in action a bit later. A CSR is a certificate signing request.

Then we have the initial master node, and now we're about to join the second node. We're basically typing in kubeadm join with the token. The kubelet will start with a self-signed HTTPS server. This is still an area to improve; it's basically true for all Kubernetes deployments right now that the kubelet serving cert is often self-signed. Hopefully this will graduate in the next cycle, in 1.10, so that the kubelet can use the same cluster CA, but that is work in progress with SIG Auth. So we give kubeadm the bootstrap token, and it will figure out the trusted CA, which the kubelet uses to talk to the API server. We'll then get a CSR for node two, which the CSR approver will automatically go ahead and approve. If you don't want this behavior, you can disable it; it's just an RBAC rule. But by default, in order to have a smooth flow when you're joining nodes, the CSR approver automatically approves, and the CSR signer then signs. That means we keep the CA key on the master; we never give away the CA key anywhere. We have this reconciliation flow where we just post these requests, and whoever has the CA key can then go ahead and sign them.

This is also how external CAs work, for example. One common request with kubeadm is: I have some kind of corporate policy that means I don't get access to the CA key, or I can't have it on the cluster; how can I mitigate that and still use kubeadm? Well, the answer is that you run the kubeadm certificates and kubeconfig phases wherever you do have the CA key, which will give you all the required identities. Then you just copy everything but the CA key to the cluster, and run the CSR signer somewhere else. That gives you the same effect.

So, cool, now we've got the client cert for the second node, and its identity is unique. Then we have another feature, introduced in 1.7, the node authorizer, which scopes down the kubelet's privileges to only what it actually runs. Now, when you type kubectl logs, the API server makes an unverified connection to that self-signed HTTPS server on the kubelet. So the request is using HTTPS, but the API server can't be sure that the kubelet is the one it's actually pretending to be. This is work in progress, and contributions are welcome, as always. Then, when the kubelet gets this call, or any call, it sends a SubjectAccessReview request to the API server, asking: is this actor, here the kubelet client identity, authorized to call my API? Basically it's asking: is this person who is trying to access my API root in the cluster? In this case, yes. In other cases, no.
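If you want to watch or drive that CSR flow by hand, for example after disabling the automatic approver, it's ordinary kubectl against the certificates API; the CSR name below is just a placeholder:

```bash
# List certificate signing requests posted by joining kubelets.
kubectl get csr

# Approve one manually; the CSR signer in the controller manager then signs it
# with the CA key, which never leaves the master (or the external CA host).
kubectl certificate approve node-csr-<id>
```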
So in 1.4 and 1.5 and earlier, most clusters didn't secure this kubelet API bit, which essentially made it possible to exec: if you could reach any node, you could exec into that node, because there was no authentication in this step. But as of kubeadm in 1.6, this is all secured.

OK, let's move on to self-hosting. What does it mean? Well, we're using Kubernetes primitives to configure and host the control plane itself. This concept was developed by CoreOS and has been embodied in the bootkube project. Now the SIG Cluster Lifecycle team is working on upstreaming this kind of functionality into kubeadm, into this building block that kubeadm is. You can still run your cluster using static pods, but chances are you might want to use self-hosting, because it eventually leads to easier upgrades and things like that. There's also another self-hosting talk tomorrow by Diego from CoreOS; make sure you check that out.

So how is a self-hosted cluster created? That's kind of opaque to many people as well. Here is a sequence diagram of how it works. There's quite some detail, but this is a deep dive, so I guess it fits in. First, kubeadm has to know the actual state, the current state. It figures that out by reading the static pods from the local file system. It then mutates these static pods a bit to work as self-hosted components, for example tolerating the master taint, and then it goes ahead and creates the nearly exact same resource in the API server. So you could think of it like: I'm running a new workload; it's an API server that should run on my master. Then that API server runs, but it can't bind to the port. It's trying to bind to port 6443 or whatever, but the static pod hosted API server already listens on that port, so the self-hosted API server will go ahead and crash loop. When kubeadm detects this crash looping, it removes the static pod API server, and the self-hosted one comes up cleanly. Now we have a state where Kubernetes can manage itself, which is kind of cool. kubeadm does this for all components, and there we go.

Upgrading clusters with kubeadm: we made it easier with kubeadm upgrade plan and kubeadm upgrade apply. These helper commands have a good UX and make it straightforward to upgrade a cluster, hiding the complexity under the hood. In 1.8 and 1.9, this basically shuffles these static pods around on disk and makes sure to catch edge cases and roll back; for example, if the 1.9 manifest doesn't come up cleanly, we roll back. But the promise is that with self-hosting and HA, we'll be able to roll out an upgrade like any other manifest change: we just edit the DaemonSet, the controller manager rolls out new pods as we go and replaces the old ones, and if something fails, it rolls back itself. That is a cool thing. In 1.9, we're also supporting automated downgrades.

In a future release, we might look at runtime reconfiguration, basically kubectl apply for a cluster. Let's say you started your cluster with some config, or just a normal kubeadm init, and then a week or a month later you find out that you actually should have done something differently. But you have user workloads there and don't want to tear everything down. Then you could just do a kubeadm upgrade apply with the new desired state of the world, and kubeadm would roll that out for you.
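Concretely, the upgrade flow boils down to two commands; the target version here is just an example:

```bash
# Check which versions the cluster can be upgraded to and whether it is healthy enough.
kubeadm upgrade plan

# Perform the upgrade; in 1.8/1.9 kubeadm swaps the static pod manifests on disk
# and rolls back automatically if the new control plane doesn't come up cleanly.
kubeadm upgrade apply v1.9.0
```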
We don't officially support this yet, because there are some parameters that would be dangerous to reconfigure and would get you into a weird state, so we probably have to catch that. But technically it's already working and possible to do, if you know what you're doing.

So, we have heard the user request that HA would be nice to have. And yes, it would. The ideal flow here would be: kubeadm init, and then we get two tokens. We get one master token that we can use to add a master, and we get one node token, as usual, to add nodes. By definition, this means that the master token is equivalent to an etcd write key. And you may ask yourself: is that production grade, is it secure, to have a 22-char string be able to do that? If someone else gets to know this 22-char string, they'll have access to my etcd and can basically do anything.

So in order to achieve this, we have three primary challenges. First, we have to manage state, and this is probably the hardest one. How do we run etcd, especially in a secure manner, with unique, dynamic identities? We could start with kubeadm init, run one master for a week, add some nodes as we go, and then a month later decide that, well, I actually want to have two masters, or three, or five. So we can't know beforehand what the desired cluster state will look like; instead, we have to reconfigure dynamically, which is challenging. Then, sharing and rotating the common certificates between all masters is another challenge. For example, the service account private key has to be the same in all controller managers; otherwise we get into a weird state where pods start erroring. So these have to be on all masters, but they also have to be rotated at the same time when we rotate things. And then: how should the kubelets be able to address all masters dynamically? We start with one master, all kubelets talk to this one master, then we scale that up to three, and now, suddenly, they should be able to somehow discover all three masters in a secure way.

Well, you can achieve kubeadm HA today, and that is one of the most common misunderstandings, which I'm glad to clarify here. You run your own HA etcd cluster to start with; that might not be easy, but we'll provide documentation for the 1.9 release on what it could look like. And you could build some kind of automation yourself, with Ansible, Terraform, or whatever. Then you run kubeadm init on the first master, copy over the shared certificates to the second one, run kubeadm init again, and do the same for the third one. Or you could pre-generate all the certificates and just distribute them to all masters; that might be preferable. Then you need some kind of external load balancer, it could be a cloud load balancer, or a DNS server, or something like that, just so the kubelets can address all the masters dynamically (statically would, I think, also work), and that is how they talk to the API servers today. So this is all doable, and as I said, we'll provide documentation on how to do this. There are more commands to execute than a single kubeadm join --master, but at the same time it takes a lot of pain away from you anyway, because if you're about to create a normal HA Kubernetes cluster, there are a lot of things kubeadm already does for you, as shown here, things like certificate generation and control plane spec generation with good defaults, et cetera. So this is going to be documented.
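A rough sketch of that manual flow, assuming an external etcd cluster and a load balancer are already in place; the config file name and hostnames are placeholders, and the paths are the kubeadm defaults:

```bash
# On master-1: bootstrap the first control plane, pointing the config at the external etcd.
kubeadm init --config kubeadm-config.yaml

# Copy the shared certificates and keys to the other masters
# (you may prefer to pre-generate these once and distribute them instead).
scp /etc/kubernetes/pki/ca.* /etc/kubernetes/pki/sa.* \
    /etc/kubernetes/pki/front-proxy-ca.* root@master-2:/etc/kubernetes/pki/

# On master-2 (and master-3): run init again with the same config;
# kubeadm reuses the certificates that already exist instead of regenerating them.
kubeadm init --config kubeadm-config.yaml
```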
Then one possible solution for creating this kubeadm join --master flow could be this: we use the kubelet as the babysitter for all components, self-hosting the API server, controller manager, and scheduler in DaemonSets, putting all the cluster certificates in Secrets, and running self-hosted etcd on top. This is extremely challenging, though, and it hasn't been proven to work, so this is just one idea of what we could do. And then use something like Envoy as a DaemonSet, as a proxy, to address the masters. In the meantime, we might do something in between, where you can do kubeadm HA but you still have to set up etcd externally, and we do the other two parts for you, or whatever. But as kubeadm has this local scope, not a global scope where it would know of all machines, it's really hard to achieve this kubeadm join --master flow just like that. But join SIG Cluster Lifecycle, and you'll be a valuable resource for us to spec this out: is it possible, and in that case, how? If it isn't possible with one command, maybe with two, or three.

And yeah, that was my talk. This presentation is posted on Sched, so you can check it out there, and look at these links. Let's talk about this in SIG Cluster Lifecycle and continue to improve kubeadm to GA and beyond. Any questions? We have one minute.

Q: As an alternative to HA, sometimes it feels like it would be nice to just have a snapshot of etcd, and if something goes bad and I panic, I kind of restore from that. Is it easy to get kubeadm to do that, or is that not at all the right direction?
A: I think using Heptio Ark or something similar would be useful there, or a dedicated backup tool. Of course, you could use etcdctl snapshot or whatever.
Q: Assuming I have the backups: let's say I have a backup of etcd, I'm in panic mode, and I try to revive my dead Kubernetes master. Can I use kubeadm for that, or am I using the wrong hammer on the wrong nail?
A: I haven't thought it through, but spontaneously I would say you should use another tool.
Q: OK, thanks.
Q: Can you use kubeadm to manage existing clusters? Let's say I have an existing production Kubernetes cluster which was not created or managed using kubeadm; can I use kubeadm to take over that?
A: No, I wouldn't say that. It all depends; if you happen to use nearly the same semantics, maybe, but there are so many parameters that it's virtually impossible.
Q: kubeadm had a slide where you can do everything in phases...
A: Yeah, the phases functionality has been there since 1.8. And unfortunately, time's up now, so we'll have to stop there. You can come and talk to me after this. Also, we're going to host a SIG Cluster Lifecycle in-person meetup at the Hilton in Austin, right after this, so feel free to come and talk to us, the SIG Cluster Lifecycle folks, after this. Thank you.