Thank you. Hi everyone. My name is Dmitry. I work for Red Hat on the OpenShift team, and today I'm going to give you an overview of Metal3 and hopefully even show something. Let me first show you some slides. Do you see my slides? Wonderful.

So, Metal3 — what even is this? It's a Kubernetes project for managing bare metal machines using Ironic, as you can guess from hearing this talk here. It consists of three main parts. First, it has a Kubernetes API for managing bare metal machines using Ironic, which is pretty much a wrapper around the Ironic API that converts our RESTful API into something more familiar for Kubernetes people: a declarative API based on a lot of YAMLs, just as people like nowadays. Another thing the Metal3 project provides is a Cluster API provider based on all this. For those not familiar with Kubernetes, Cluster API is a self-management API for Kubernetes: a Kubernetes API that allows you to install and scale Kubernetes. It's the same idea, a bit more advanced. And of course, the third component is the container images to run Ironic and the required services in a configuration that Metal3 recognizes. I'm passing along a few helpful links: metal3.io is the website, the GitHub organization is metal3-io, and there is metal3-dev on Google Groups, which is the mailing list the upstream project uses. This is a CNCF project; I'm probably going to confuse incubated with graduated — I think it's an incubated project.

So let's talk a bit about the components. We have the thing called Bare Metal Operator. Contrary to its name, it's not actually an operator in Kubernetes, but a controller. So what's a controller in Kubernetes? It's something that manages certain resources in the API. In Kubernetes you can create custom resources in the API — like in Ironic you have nodes and ports, in Kubernetes you can just add things — and a controller is what actually backs this API and provides it with some logic. Bare Metal Operator manages resources called BareMetalHost, and these resources correspond more or less directly to nodes and ports in Ironic. Each BareMetalHost is a node that can be deployed, inspected, cleaned, and so on. As I said, a declarative API, a lot of YAML — I will show you. The peculiar thing about Bare Metal Operator is that it assumes complete authority over the nodes. It manages the whole lifecycle. You're not connecting to the nodes yourself; it's managing them. You create through Bare Metal Operator, you inspect through Bare Metal Operator, you deploy, undeploy, and delete through Bare Metal Operator. In theory, you should not touch Ironic at all; Ironic becomes, in this scenario, a backend tool rather than another API you're interacting with. Now, once you start debugging it, of course, you know how that goes. But that's the idea.

What features does Bare Metal Operator currently provide? It has its own set of drivers, which I don't like, but that's how it is. IPMI, Redfish, iDRAC through Redfish, iLO and iRMC are currently supported — I think actually iBMC too, but I'm not quite sure about it. It supports normal deployments of an image. It supports what they call live ISO, which is the ramdisk deploy interface in Ironic terms, but it only supports an ISO at this point. And it supports running a custom deploy step instead of a normal deploy — just one step, really, because people asked me to keep the API simple. So it's possible to replace the normal deployment with one deploy step. It supports inspection, currently in-band, although I know some people hacked it to do out-of-band inspection as well. It supports RAID and BIOS settings configuration. And it supports a somewhat interesting feature of bare metal deploy image customization: you can provide your own controller that builds a deploy image per node, which is a bit interesting. We use it in OpenShift.
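To make the YAML flavor concrete before the demo, here is a minimal sketch of what a BareMetalHost and its credentials secret might look like. The names, MAC address and BMC address are invented for illustration; check the Metal3 documentation for the full set of fields.

apiVersion: v1
kind: Secret
metadata:
  name: node-0-bmc-secret          # hypothetical name
  namespace: metal3
type: Opaque
stringData:
  username: admin                  # BMC credentials, invented
  password: password
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-0
  namespace: metal3
spec:
  online: true                     # keep the machine powered on
  bootMode: UEFI
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    address: ipmi://192.168.111.1:6230   # the driver is encoded in the URL scheme
    credentialsName: node-0-bmc-secret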
So what happens with bare metal hosts? This is the logic, always driven by the state of the BareMetalHost: what is written there is the authority, and what happens in Ironic follows. If the BareMetalHost is in the so-called paused state, nothing happens. If the node does not exist in Ironic, it's created and the credentials are verified. If the BareMetalHost says the node is already deployed, we adopt. Yesterday Ruben asked me whether we use adoption — yes, we use adoption, because if according to the BareMetalHost the node is deployed but the node doesn't exist in Ironic, we create and adopt, because the BareMetalHost is the authority. Otherwise: if the node is not inspected, so no introspection data is available — inspection. If the RAID and BIOS config doesn't match what we expect — cleaning. If the BareMetalHost has an image but is not deployed (I will explain how that's determined later) — deploy. If the image has changed on the BareMetalHost — undeploy and deploy again. We don't use rebuild; we really go through undeploy and deploy, because there may also be BIOS settings changes and things like that. And if the BareMetalHost is removed, or detached — there's a so-called detached state — the node is deleted. That's basically its lifecycle.

Now, a few words about the second component, called Cluster API Provider Metal3. Cluster API, as I said, is a self-management API for Kubernetes. It operates on clusters, which consist of machines. A Cluster is your Kubernetes installation that is self-managed or managed by another Kubernetes. A Machine is a node on which you install the Kubernetes components, control plane or worker. This project is a so-called infrastructure provider for Cluster API: it manages the infrastructure on which Kubernetes nodes are installed. Machines are mapped to Metal3Machines — another custom resource. A Metal3Machine is a request to find a BareMetalHost and deploy on it. So each Metal3Machine is kind of like an allocation in Ironic, I would say, or a Nova instance, let's put it this way. And it all ends up on an Ironic node.

Now, yeah, the Ironic image, the third component of this picture. We have one container image that can start almost all the services here, which is bad container design, but yeah, historical reasons. It can start the Ironic API and conductor — which I'm going to merge into one service, because last week my changes for a combined Ironic service merged — and the inspector. We have Apache for serving files and for terminating TLS. We have dnsmasq as the DHCP and TFTP server, and we have MariaDB, which I want to replace with SQLite by default, to make it optional. You can deploy the Ironic image any way you want; you can even deploy your own Ironic and point Bare Metal Operator at it. But Bare Metal Operator comes with some deployment scripts that install Ironic on Kubernetes — so everything is on Kubernetes, exactly the way Bare Metal Operator expects it — which is a bit more convenient.
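Coming back to the state-driven logic for a moment: deploying a host is just a matter of setting spec.image on the BareMetalHost — the operator notices the difference between spec and status and drives Ironic accordingly. A hedged sketch of that fragment, with invented URLs and secret names (you would merge it into the host, for example with kubectl edit):

spec:
  online: true
  image:
    url: http://example.com/images/my-os.qcow2          # invented URL
    checksum: http://example.com/images/my-os.qcow2.md5sum
  userData:                        # optional config drive content, stored in a secret
    name: node-0-user-data         # hypothetical secret name
    namespace: metal3

Removing spec.image again triggers deprovisioning, and changing it triggers the undeploy-plus-deploy cycle described above.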
In OpenShift, we do it a bit differently: we have cluster-baremetal-operator, which is an operator that manages the lifecycle of Bare Metal Operator itself and of Ironic. So this is an operator to manage other operators. Yeah, what else? A few other projects, if you check out our repositories. The IPA downloader is a container image to download and cache the ironic-python-agent image. It has to end up in the Apache container, so you run it as a sidecar container, or actually as an init container — an init container runs when the pod starts. It downloads IPA and caches it, I think.

There is the hardware classification controller, which can label your bare metal hosts. Labels are more or less free-form annotations in Kubernetes. Why is this useful? Because, as I will show you, when you create a machine you can ask it to target specific labels, and this controller allows you to assign labels based on inspection data. So you can define rules like: if it has more than, like, four CPUs — I don't know, is four fine? — more than, like, 64 CPUs, assign the label "large". And then on your Metal3Machine, on your machine, you can say: okay, host selector, match label "large". I'll show a sketch of such a selector right before the demo.

The MariaDB image — as I said, I'm splitting the MariaDB code out of the Ironic image. The Ironic agent image — okay, this is a funny one. In OpenShift, we no longer use ironic-python-agent-builder to build images; we run a container on top of CoreOS. And since CoreOS also exists upstream as Fedora CoreOS, we have an upstream version of this image as well. There is an IPAM for Metal3 — I'm not that familiar with it, to be honest, but we can use it. And there is metal3-dev-env, which is DevStack for Metal3, if you think about it: a collection of bash scripts and Ansible code that allows you to install Metal3 with virtual machines instead of bare metal, so you can develop and play with it. And that's what I actually want to show you.
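Before switching to the demo, here is roughly what that label-based selector looks like on the Cluster API provider side — a sketch assuming the classification controller has put a label like size: large on some BareMetalHosts; the names are invented and the exact API version depends on your CAPM3 release:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: example-machine-template   # hypothetical
  namespace: metal3
spec:
  template:
    spec:
      image:
        url: http://example.com/images/my-os.qcow2    # invented URL
        checksum: http://example.com/images/my-os.qcow2.md5sum
      hostSelector:
        matchLabels:
          size: large              # only hosts carrying this label qualify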
I have a metal3-dev-env environment here, fortunately. I hope it still works, because I've messed with it quite a lot. I want to show you all the things I've just talked about, as a lot of YAML. Prepare for a YAML overdose. Now, I need to change which window I'm sharing. For some reason it doesn't allow me to share the window I need, which is weird. No, it did. We could see something briefly. Maybe it wasn't the right one. I just saw a terminal briefly. Yeah, it's trying to understand what is going on. Now I can see something. Right, but that's just because I temporarily switched to another window. Okay, I'll try to do it normally, if it behaves. Okay, now I've closed one terminal, and I can use the second one. I guess there were just too many terminals. So, do you see a terminal? Yes. Wonderful.

So, metal3-dev-env, as I said, is a virtual environment similar to DevStack. The VMs are already inspected, because inspection happens pretty early: basically, you enroll the nodes and they get inspected right away, and then they're in the available state. The minikube node is a detail of metal3-dev-env, and these are the two testing nodes that are running. Now, just so that you believe me, let's take a look. The two nodes, node-0 and node-1, are here in Kubernetes — BMH is the short name for BareMetalHost; I'm just asking Kubernetes to list all bare metal hosts in the metal3 namespace. We can take a look at Ironic — there is a wrapper here. See, we have two nodes. They are named namespace-tilde-name, so metal3~node-0 for node-0. They are powered on and manageable, and they should have their introspection data already.

And we can even check that. Yeah, let's check that. Let's ask it for some details. Actually, I wanted to use YAML — I'll spare you the YAML and just ask it to describe the host; it's still YAML-like. So what we have here is a resource of kind BareMetalHost. Its name is node-0 and its namespace is metal3. This part is just Kubernetes stuff — I don't even know what this field is. So, the spec. The spec is how we create a bare metal host; it's essentially what we request to happen, the state we want to achieve. So: the cleaning mode, the BMC — this is an IPMI node — and its address. The credentials are stored in a secret — secrets are the Kubernetes way of storing, well, secrets: passwords, things like that. The boot MAC address, UEFI mode, and online, which means powered on; it's powered on because we have just inspected it. The status, in Kubernetes, is the current state of things. The credentials have been verified, as you can see; no errors. Hardware is the result of hardware inspection, so that's coming from ironic-inspector, and as you can see, there is a lot of information: NICs, firmware, RAM, storage, and so on. And there is a history: it was inspected earlier today, roughly two hours ago, and it took roughly three minutes. Operational status OK; powered on, yes. Provisioning — the provisioning part of the status is what we expect to deploy on it; that's information about the deployment. Boot mode, yes; but no image, see? The image is empty. No RAID, no root device. So it's not deployed — it doesn't have an image. When it does have an image, we start deploying. We start deploying by putting an image into the spec — you cannot see it here, but there will be an image here — and then it goes through all the Ironic stuff, and in the end it puts the image into the status to reflect the fact that this image has been deployed.

Now, before we actually do that, I want to show you how this looks. Ironic here is deployed on the same Kubernetes, as a pod, in a namespace called, I think, baremetal-operator-system — a system namespace. I'll spare you these details. Okay, this is the pod with Ironic. A pod is a grouping of containers in Kubernetes; they all run together, and in the case of Ironic they talk over localhost, which they can do because it's one pod. Let's see what's inside. Let's describe the pod. Describe pod... this. I cannot type. Okay, that's the pod. It has containers. As I said, the IPA downloader is started as an init container — a container that runs when the pod starts. So before we start anything, we download IPA; if we fail to download IPA, we don't start anything. Makes sense, I hope. Then we have the containers: ironic-api — you see a lot of boring details: the image is cached locally, there are some liveness checks, there is a script we run that comes from the ironic-image repository, there are TLS settings — then ironic-conductor, ironic-inspector, MariaDB, the Ironic endpoint keepalived, and dnsmasq. The log watch is a bit of a funny thing. It's a container we use to fetch ramdisk logs. If you're familiar with Ironic: when Ironic does something with a ramdisk, it puts the logs into a file, which is not convenient for containers. So we have a container that is pretty much a bash script that checks a directory in a loop; when it finds ramdisk logs, it unpacks them, dumps them to the output, and deletes the file. I think it should be fine. I should probably show you that — kubectl logs from ironic-log-watch. There's nothing here. Okay. Yeah, I probably have to deploy something first.
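To summarize what we just scrolled through: before an image is set, the status side of a BareMetalHost looks roughly like this. This is a heavily trimmed sketch from memory, not exact output:

status:
  operationalStatus: OK
  poweredOn: true
  provisioning:
    state: available               # inspected and ready, but nothing deployed
    bootMode: UEFI
  hardware:                        # inspection data from ironic-inspector (abridged)
    cpu:
      arch: x86_64
      count: 4
    ramMebibytes: 4096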
Yeah, that's pretty fair. We can probably deploy it, but I wanted to show you something else first. Ah, okay. Let's talk about machines. Machines — we shouldn't have any yet; pretty much nothing is created yet. So, no resources for Machines. And Metal3Machines don't exist either. So everything I've shown you so far was about Bare Metal Operator, and you can use Bare Metal Operator just as it is, without Cluster API. If you need to manage bare metal on your Kubernetes, you can install Bare Metal Operator, install Ironic, connect them together, install this custom resource definition, BareMetalHost, and just do that. Now I'm also going to show you Cluster API, which is the primary goal of Metal3: to be able to deploy clusters.

There's a cluster already — I prepared it in advance — but nothing is deployed. I'm going to use the scripts that come with metal3-dev-env and provision a control plane, which is one master, because I have a small deployment. Okay, it initializes... something happens. It generates some templates — I'll again spare you the details; what's going on there is not that interesting, just a lot of YAML being generated. Let's give it a minute to download the image. And again, you don't have to use these scripts; you can use clusterctl, just like the command here. Yeah, so it applied something. Now, in Kubernetes everything happens asynchronously, so the fact that this finished in 30 seconds doesn't mean anything. It probably hasn't even started the deployment yet. Has it? No, it hasn't started the deployment yet. It should have created the Machine already — not yet; I don't remember the exact order it does things. It creates the Machine first, actually. Okay, yeah — once you run the watch command, of course, everything appears.

So, okay, we got the Machine. Machine is not a Metal3 notion; it's a notion of Cluster API. It's an abstraction around something that Kubernetes can be deployed onto. Being an abstraction, it needs a real implementation, and the real implementation in this case is Metal3Machine. We have one of those as well: creating a Machine creates a Metal3Machine, and a Metal3Machine is linked to a BareMetalHost. Let's take a look at the bare metal host. Is this the one we're deploying? I hope that's the one we're deploying. No, it's probably not, right? Maybe it hasn't started deploying yet. Okay, that's the one we deployed earlier. Okay, we're going to look at the other node, node-1. Node-1 uses Redfish, as you can see, through sushy-tools. And the spec is a bit larger — like I said, substantially larger. We have this consumerRef, which says what this host is used for: it's used for the Metal3Machine with this name. It has an image — that's the image; pretty similar to a Nova instance. Further, it has metadata, network data and user data; these are for the config drive, and all of them are secrets, actually. No errors so far. It had an inspection — we already talked about that. And in the status we still have an empty image, because the provisioning is not over yet. Okay, let's take a look at the Metal3Machine. It has an annotation — annotations are also free-form strings — saying that it is connected to the bare metal host node-1. Then there is some Kubernetes stuff. And it's owned by a Machine with this name. So this link is actually bidirectional.
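The chain we just traced can be summarized in three hypothetical fragments, showing only the linking fields. The names are invented, and the annotation key is how I remember CAPM3 spelling it, so treat this as a sketch:

# The Cluster API Machine points at its infrastructure implementation:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: Metal3Machine
    name: test1-controlplane-abc12
---
# The Metal3Machine carries an annotation pointing back at its host:
kind: Metal3Machine
metadata:
  annotations:
    metal3.io/BareMetalHost: metal3/node-1
---
# ...and the BareMetalHost records which Metal3Machine consumed it:
kind: BareMetalHost
spec:
  consumerRef:
    kind: Metal3Machine
    name: test1-controlplane-abc12
    namespace: metal3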
It also has this template — that's where the image is coming from, right? Before, we saw there was no image on the BareMetalHost initially; it came from here, from a template. There's also a status: the status provides addresses — the addresses come from the IPAM — plus metadata, network data, and other stuff.

So let's talk about the Machine first, right? The Machine — where is it? Yeah, this is not coming from Metal3 anymore; it's Cluster API. It doesn't have much, but it has a link to the Metal3Machine: the infrastructureRef, which means "what implements this Machine?" What implements it is the Metal3Machine we just looked at. And before I open the floor for questions, I'm going to show you its template — I think it's called... okay, that's the template for machines. What does it have? It has pretty much the image. There can be a host selector here — remember, I mentioned you could use labels to select specific hosts; that would go here. Currently we're selecting just any host. And there's its image. There's also a data template, which is used to generate — I think it's this one, yeah — the metadata, the network data, and the user data: all the stuff that goes onto the config drive. If you don't specify it per machine, per bare metal host, we have a template here that is used to generate it. And yeah, it looks a bit fancy. This provisioning pool comes from the IPAM controller, so it requests IP addresses from there. And yeah, again, IP addresses from a pool: there's a pool, and it asks for an IP address from it — I'll show a sketch of such a pool right after the demo. And that's it. Nothing else interesting. There's no user data, actually, I guess. But there could be.

Oh, right — has it finished deploying or not? Provisioned — oh, nice. So we have finished the deployment. I can take a look at that. Yeah, I want to show you that we have the image in the status: so there's the status, and now I have an image here. Now, if I delete the Ironic node — I'm probably not going to do that... but if I do delete the Ironic node, it should just recreate it and adopt it. I will also upset the cluster if I try to do it right now. But you can do it — I don't know. Okay, we still have the instance UUID; it's active. I think I cannot just delete an active node without putting it in maintenance mode — it's probably trying and just fails. Okay, okay, I got it. I'm not — okay, now I did it. Okay, I definitely screwed up the control plane installation right now. But I'm curious whether we'll see it reappear after some time. It doesn't reconcile constantly, so it may take some time. Maybe somebody actually has questions while I'm doing this. And I don't see the Zoom window, basically... I do see the Zoom window. Okay. Yeah, because that can take a while — the reconciliation doesn't run every second or every few seconds; it's minutes.

Thanks a lot, Dmitry. Does anyone have questions? No one does — I'll go first, maybe. Oh, hold on, hold on. We have a node, look. Nice. For some reason it didn't use — okay, it fixed the name, all right. See, it just happens as you watch. Oh, it's active — it adopted the node. So I messed it up, and it repaired what I messed up. I think that's a nice conclusion for this demo. Yeah, it's really cool. Stop sharing. Okay, ready for questions.
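For reference, the pool that the data template pulls addresses from is a resource of the Metal3 IPAM. The schema below is from memory and the values are invented, so treat it as a sketch and consult the ip-address-manager documentation for the exact fields:

apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: provisioning-pool
  namespace: metal3
spec:
  namePrefix: prov                 # prefix for the generated IPAddress objects
  pools:
    - start: 172.22.0.100          # invented range
      end: 172.22.0.200
  prefix: 24
  gateway: 172.22.0.1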
Awesome, thanks a lot, Dmitry. I have a first question. You said that Metal3 uses its own drivers for IPMI, Redfish and so on. Doesn't that create — or is there a potential risk that this interferes with what Ironic does? Or why does it have its own drivers? Right — I expressed that really poorly. It's not their own code; they simply define the naming schema differently. So it's the same drivers we use; just the schema is different and the options are different. So they're not a direct mapping from a driver — okay, they really are a direct mapping, just with different names. I guess I just confused everyone with this.

Okay, the other question I had was about the bare metal hosts that are already deployed and then adopted, right? Is that only the case when a node, like just now, is deleted and then recreated and adopted again? Is that the main use case? Because initially I understood you to say: okay, the bare metal nodes are maybe already deployed, and then they just get pulled in and adopted into Ironic. So what is the scenario under which this would happen? There are actually a few of them. First, the Ironic pod is not persistent in a normal deployment — you can make it persistent, of course. So if the master which hosts Ironic fails, Ironic is recreated completely from scratch, with an empty database. To avoid redeploying everything, you adopt. That's one case. The second case is the pivot. Right now I showed you a control plane deployment — which is not going to succeed because I messed with it, or maybe it's smart enough to actually recover. We deploy this control plane and then we pivot: we move our Metal3 installation to the freshly deployed control plane and shut down the management cluster. So that's again a case where Ironic gets restarted in a completely new location, with an empty database, and we can adopt the nodes — including the very nodes that host Ironic. We have a similar process in OpenShift — not exactly the same, but pretty similar: we have an installer which also creates bare metal hosts and then moves them to the actually deployed cluster. And can you adopt? Yes, in OpenShift we have an alternative installation procedure, and they are doing adoption. It's possible to create a BareMetalHost that is so-called externally provisioned — which is Metal3 speak for adoption; I'll sketch what that looks like below. Yes, they do that. Okay. Thank you.

Any other questions for Dmitry? I don't see any hands or any questions. Just speak up if you have a question. Doesn't seem to be the case. So, Dmitry, when you deleted that node at the end there, I thought you commented that it was adopted, so I'm a little confused by that. Was it deleted in the Ironic sense? So, what I did: I took direct access to Ironic and force-deleted an already provisioned node. What that means for Ironic: the node disappeared. What it means for Bare Metal Operator: Bare Metal Operator is the authority. It has this BareMetalHost resource corresponding to the node, and it has an image in the spec — the spec has an image, the status has an image, and they match — so it knows this node is deployed. It's the authority. So what it does: it checks with Ironic periodically and says, okay, there's no node corresponding to this BareMetalHost — let's create it. It moves the node to manageable, because that's part of creation. Then: okay, according to my records, it's deployed already. So it does adoption in Ironic. What adoption means: the node moves directly from manageable to active, without the actual provisioning. You tell Ironic: it's already deployed, there's already an instance running there, just mark it as active. That's what adoption is. I see. And that's because Metal3 is the authority, and it just trusts that the bare metal is in the same state it was in the last time it looked. Yes. Yes. So it doesn't really know that the bare metal is in that state; it just trusts that it is. Right — when it's part of a cluster, there is higher-level logic. So when, for example, Cluster API realizes that it can no longer talk to this master, it doesn't just go and destroy it; there will be processes for tearing down this master, finding a new machine, deploying a new master, things like that. But that's high-level logic; Bare Metal Operator is not doing that. There are safeguards against what you're suggesting, just not at this level.
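For reference, the externally provisioned flag mentioned above sits directly on the BareMetalHost. A small sketch with invented names — it tells Bare Metal Operator that an OS is already running on the machine, so it adopts the node in Ironic instead of deploying it:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: existing-node              # hypothetical
  namespace: metal3
spec:
  online: true
  externallyProvisioned: true      # adopt instead of provisioning
  bmc:
    address: redfish://192.168.111.1:8000/redfish/v1/Systems/abc   # invented
    credentialsName: existing-node-bmc-secret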
Okay, thank you. Thanks, Richard. Any other questions? No? Okay, thanks a lot, Dmitry.