Hello, and welcome everyone. I finally figured out how to unmute my audio while screen sharing and recording are on. And with this great achievement, I would like to welcome you to the Bare Metal SIG 2023 Quarter One meetup. It's a new format for us, so please bear with us. We have prepared some exciting stuff for you, and first and foremost, we want you to participate. So it's not gonna be just us talking; hopefully it will also be you talking and asking questions, telling us about your experience and proposing crazy features. You know, doing all this sort of interactive stuff which we like at face-to-face meetups, but face-to-face meetups are scarce nowadays, so this is it. And I guess the necessary introduction, while we still wait for maybe more people to show up: what is the Bare Metal SIG? I'll read out loud this wonderful paragraph which I copied from someone else's slides. The scope of the Bare Metal SIG is to promote the development and use of Ironic and other OpenStack bare metal software. This includes marketing efforts like case studies of Ironic clusters in industry and academia, supporting integration of Ironic with projects like Airship and the Kubernetes Cluster API, coordinating presentations for industry events, developing documentation and tutorials, gathering feedback from the community on usage and feature gaps (right now, right now it's happening), and other broader community-facing efforts, again like right now, to encourage the adoption of Ironic as a bare metal management tool. And we should probably add the stuff built on top of Ironic as well. A small request: if you're not speaking, please mute yourself. Don't make us mute you; that would be rude. So what is happening today? We're gonna share some news from the community. Then we have two panel discussions: a few of us have prepared some slides, then we'll ask you things and discuss things, hear your proposals, maybe write them down. By the way, maybe someone could take notes in the etherpad, the Bare Metal SIG etherpad. The first topic will be things you didn't know Ironic can do for you, so stuff beyond what you think Ironic is. The second topic will be scaling bare metal provisioning. And then we hopefully have time for an open discussion, where the proposed topic is battle stories. So if you have something crazy that Ironic did or did not do for you, hopefully not breaking stuff, but that happened in the past too, you'll be welcome. With that, I will tell you very quickly the news from our group. As you can see, we're switching from short monthly meetings to longer quarterly ones. We hope that it will give you more opportunity to interact with us and to learn more. And hopefully it will help you allocate your time, so that you allocate two hours once a quarter rather than one hour every month. We'll see how it works; we'll be looking forward to your feedback, so reach out to any of us, really. And the second piece of news: a few of us, namely Jay and myself, participated in talk selection for the OpenInfra Summit, which is gonna happen in June. Of course I should have written down the dates, but I forgot them. The hardware enablement track will have a few very interesting presentations, which I'm not gonna spoil for you, but they are really great. They cover a broad variety of topics, even things we all agree we don't know much about, and things, of course, we do know much about. So come to the OpenInfra Summit, or if you cannot come, check out the recordings; I think they're always public.
And yeah, see what is in the hardware enablement track. We prepared that for you, with love. Dmitry, can I just jump in? Sorry. Absolutely. The notifications for the authors should be sent out right about now. I checked yesterday; they were aligned with this meeting of the Bare Metal SIG, to actually send out the notification for your talk, if your talk has been accepted, and the schedule will be published, I understand, at the same time. So that should happen any minute now. All right, so yeah. We're gonna learn very soon what exciting stuff we prepared for you, and of course other teams just as well. Now, for the news from our ecosystem, I first give the mic to Jay. Hey, yeah, so we've been working on the Antelope release, and one of the things I'll mention is, well, first of all, I forgot to put the date of the release in here. It's coming at the end of March, but Antelope is the first release we're doing with the new naming scheme. So you might still see Antelope in some of the marketing talk or whatever, but we're gonna start referring to releases using a time-based version number. So it's actually gonna be 2023.1. That means that if you're consuming this downstream, if you're consuming stable branches, those branches are gonna be stable/2023.1 once that release is cut. So just a warning in case you've got some system that expects release names to only be letters: you've got plenty of time to fix it. What is that likely to include? Well, we're working on getting node sharding support landed in Ironic, and we'll talk a little bit more later about what that means. For the Antelope release, for operators, it probably doesn't mean a lot, but it is laying the framework for us to do some extensive scaling around some of our least scalable parts right now. We've made progress toward integrating Ironic Inspector into the existing Ironic service. We did not get that work completed this cycle, so that's gonna continue into next cycle, but the end goal is that we're gonna get rid of Inspector as a separate service entirely and Ironic is gonna run it all. It's become a core part of our ecosystem, and it deserves a home in the big process instead of being relegated to its own. One really cool thing, and this is actually a feature of Ironic that doesn't get talked about enough, is that Ironic will happily send you application metrics. It has supported doing this via statsd for a long time, but statsd is not what the cool kids use for monitoring anymore. Prometheus is the new coolness, and so Julia was nice enough to do some work to hook up our application metrics, so that now, along with your hardware metrics, they can be exported via Prometheus using the Ironic Prometheus exporter. That's gonna be landing in Antelope. That's really exciting. And honestly, you should hook it up in Antelope, so that once we get sharding in place for our clients in the 2023.2 release, you'll actually have metrics to tell you how much faster your cluster is running. The last one, and this is something that only gets a mention here, but I think it's pretty cool, sort of another hidden feature of Ironic, is that we're working on a Metal3 CI job. Dmitry is, actually. And this is a really neat thing, because it's gonna be the first time we're testing our SQLite support end-to-end in the gate. You can run Ironic happily against a SQLite database; it works great. And we're gonna test it to make sure it keeps working great. That's how Metal3 runs it; it's supported.
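To make that concrete for the notes: Ironic's [database] connection option is just a SQLAlchemy URL, so "a database in a file" is literally one line of configuration. A minimal sketch, with made-up file paths:

    from sqlalchemy import create_engine

    # In ironic.conf, something along these lines is enough:
    #
    #   [database]
    #   connection = sqlite:////var/lib/ironic/ironic.sqlite
    #
    # The same URL works anywhere SQLAlchemy does (illustrative check):
    engine = create_engine("sqlite:////tmp/ironic.sqlite")
    with engine.connect():
        pass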
And I don't think a lot of people really consider that, because you always assume you need a big database, but with Ironic, we can do it in a file. So I'll be very excited once that CI job gets up and running, to make sure we don't break it. The other thing I wanna make sure to point out is that we are having another virtual PTG, for the Bobcat release. That's gonna be March 27th through 31st. We're still working on topics for that. So if you have a particular pain point, something that is causing you issues that you don't hear us ever mention or talk about, or that you're concerned about, feel free to go to that etherpad and propose a topic about it. Especially if you're gonna be there at the PTG to talk about it, because that's what we need. We need that feedback loop with y'all, knowing what's actually going on. Quite frankly, the longer we work upstream, the more separated we can sometimes be from operations. So if you give us that feedback, let us know; we're happy to try to deliver what we can to make Ironic better. And that's it for Ironic news for this Bare Metal SIG session. Back to you, Dmitry. Thank you. Does anyone have questions for Jay or any of us? Yeah, folks, it's an interactive session. So feel free to interrupt, ask things, comment on who would use SQLite in production, you crazy people. Yeah, it's supposed to be interactive. It's supposed to be fun. Anyway, a few words from the ecosystem projects. Bifrost. So, who knows what Bifrost is? Bifrost is a standalone tool for deployment of Ironic, and for deployment using Ironic. It's written mostly in Ansible. It's part of our community deliverables, so it's kind of an official thing by Ironic, although we of course recognize and appreciate all the products that use Ironic and rely on Ironic. So, Bifrost: this last quarter it got support for Ubuntu Jammy, which is a common topic in OpenStack. There was a patch that added custom cloud-init configuration per node in a convenient way. There was some pretty interesting rework of our PXE configuration, which among other things added support for plain PXE, not iPXE, boot. So we used to have only iPXE, more or less; now PXE GRUB boot is a thing too. And there is another ongoing effort to support future versions of the OpenStack SDK and the Ansible OpenStack collections, which are undergoing a huge rework, and we are adapting to these changes as they happen. Metal3: yeah, finally, my topic. So, what is Metal3? You can pronounce it Metal3, but the canonical way is Metal Kubed. The three is really supposed to be superscript, but I cannot do it in Google Docs. Anyway, Metal3 is a Kubernetes project. It uses Ironic for bare metal provisioning. So we provide all the usual Kubernetes stuff (custom resources, controllers, and all the things we are used to in the Kubernetes world) that uses Ironic under the surface and provides more or less the same bare metal provisioning features that you're used to from Ironic, but in a Kubernetes fashion. Most of the news this quarter is around usability and community-building processes. We got a new user guide. The link here is its temporary home; we are looking into placing it somewhere under metal3.io. But anyway, this user guide still has some gaps, and you are very welcome to fill those gaps or provide feedback on them. We did some refactoring of the deployment scripts to use things called kustomize components; basically, we reduced the amount of YAML in our repository, which is always good.
We established some formal processes: vulnerability reporting, a release process, versioning, a code of conduct, all the things. And we started, we are starting, hopefully, fingers crossed, but everything seems to be settled: this cycle we participated in Outreachy, which is an internship program for people from underrepresented communities and groups. Hopefully one intern will join us this cycle, and we'll see where it leads us in the next cycles. If you are from an underrepresented group, or you know people from underrepresented groups who want to get experience in IT, to get into a paid internship this summer: reach out, outreachy.org. Metal3 will be a part of that, and hopefully OpenStack as well. Yeah, if you want to mentor, talk to me. I know things, I know things, right. And that was the news. Before we jump into the first panel topic, after just 15 minutes of talking, does anyone have any questions, or maybe news or announcements, or maybe some ecosystem projects that I missed? Okay, at the risk that everyone is now opening a web browser and just checking it: the schedule for the summit, as I said, is online, so you can actually check who's giving a talk on the hardware enablement track, and Dmitry didn't promise too much; there are awesome talks there. But please stay here, don't move. But the schedule is online. I'll also note that the summit this year is co-located with the Forum. So we will be having some form of in-person Ironic discussions at Vancouver as well, if you're gonna be there in June. But who knows what we'll be talking about by then; it seems like forever away from now. But we do have a question. Kuba was asking: how is Metal3 different from running Kubernetes on top of OpenStack? Right, so Metal3 is not about running Kubernetes. It's a Kubernetes component. So it's a thing that runs in Kubernetes that can provision bare metal machines, for Kubernetes, for example, or for any other purposes. And just to give you an idea: at Red Hat, I'm part of the OpenShift team, and we are using Metal3, and directly Ironic, as part of our installer. We bootstrap Kubernetes by first installing a small Kubernetes installation in a VM, then using Ironic there to install three control plane nodes, then using Ironic on the three control plane nodes to install worker nodes, and to scale up and scale down when you need to. Or you can use the Metal3 components for whatever you want to use them for. For example, we have people in the OpenStack team looking at using it for provisioning OpenStack. So OpenStack's control plane is on Kubernetes, but they're provisioning OpenStack nodes, which are not called workers there, they're compute nodes, using Metal3. So those are not Kubernetes nodes; they're separate. But let me answer the question. It means that if you already have an OpenStack running, your need for Metal3 is limited. If you have OpenStack, if you have Ironic, and if you do not want to use Ironic through a Kubernetes API, then you probably don't have a use case for it. I guess the real case is a bit different: you have Kubernetes and you want to have bare metal provisioning. And there are several projects for that; Metal3 is arguably one of the major ones. So you can have bare metal provisioning in your Kubernetes. Something we've been talking about as a vision with some of the people in the core team over the course of a while now, since standalone Ironic has been a thing, is thinking about Ironic not as an OpenStack component, but Ironic as an anyStack component.
If there's a stack that needs bare metal provisioning, I firmly believe there are no tools better suited to do that than Ironic. We've got years and years of knowledge and experience running on real hardware, fixing bugs around it. And so I really like Metal3 for that reason: it's another entry point. So it's sort of like, if you're looking at Metal3 from an OpenStack perspective, then Metal3 doesn't make a lot of sense. And if you're looking at old-school Ironic from a Kubernetes perspective, that doesn't make sense either. But Metal3 makes it make sense, right? The, oh God, I might have Kubernetes people coming through the Zoom screen at me here, but I almost view it as Kubernetes serving the role of the other OpenStack components. It's doing the Nova pieces a little bit, it's doing the other things, sort of buffering that behind a Kubernetes API. And I'm dumbing it down because I don't know the details; I'm not a Kubernetes expert, I'm an Ironic guy. But I always found that really cool, to give people another entry point into Ironic. And I mean, from a certain point of view, Bifrost is the same thing, except we're using Ansible as the entry point instead of OpenStack or Kubernetes. So that's another thing to keep an eye on, and something we're always looking for suggestions on. If you've ever used another open source tool and thought, oh, I wish this could provision bare metal: those are the sort of things we wanna hear too, because getting Ironic embedded into more ecosystems is gonna be good for everyone. And so that's really exciting. Yeah, yeah, Kubernetes and bare metal, it's all very exciting. Quite frankly, that's the confluence of why my employer is interested in it right now: getting bare metal servers into Kubernetes scheduling can give you some pretty cool things. So thanks for all the good work on Metal3. I don't get to talk about it a lot with you, but it is really cool and I enjoy it. And yeah, go to Scott's talk. Congratulations, it was beta tested in the Bare Metal SIG. Okay, folks. Any other questions on Metal3, on the ecosystem, on the news? All right, so the first panel topic. The format will be as follows. A couple of us have prepared slides, so we're gonna talk about our experience with Ironic, just to bootstrap the discussion. The topic is things you didn't know Ironic can do for you. We hope you'll discover something cool people are doing with Ironic that's more than just PXE provisioning or writing QCOW images. But we also wanna hear your crazy ideas about what Ironic could do for you. We can't promise to implement all of them, but we can promise to listen. Group therapy, you know. I guess the starting person is me again, with my Metal3. So, what cool stuff are we using? Virtual media deploy. I wanna say I already treat it as a boring thing, like everyone does that. And then I think Julia came to me and asked whether a lot of people are using this, and I'm like, wow, different worlds. So okay, virtual media is a boring thing for us; probably not so boring for most of you. It works by connecting a CD image to the BMC directly, via an HTTP request, instead of going through all the business with PXE booting: DHCP servers which return specific options, using TFTP to load the image, and so on. All of that is gone. You have a CD image with everything, and you tell the BMC to boot it. Boom, that's your RAM disk. It's done; the node reboots into it, and that's it.
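For the notes, here is roughly what that flow looks like at the Redfish level; Ironic's redfish-virtual-media boot interface automates all of this for you. A minimal sketch with plain HTTP, where the BMC address, credentials, image URL, and the manager and system paths are all made up and vary by vendor:

    import requests

    BMC = "https://bmc.example.com"      # made-up BMC address
    AUTH = ("admin", "password")         # made-up credentials
    CD = f"{BMC}/redfish/v1/Managers/1/VirtualMedia/Cd"

    # Attach the ISO; the BMC itself downloads it over HTTP.
    requests.post(f"{CD}/Actions/VirtualMedia.InsertMedia",
                  json={"Image": "http://example.com/images/deploy.iso"},
                  auth=AUTH, verify=False)

    # Point the next boot at the virtual CD, then reboot however you like.
    requests.patch(f"{BMC}/redfish/v1/Systems/1",
                   json={"Boot": {"BootSourceOverrideTarget": "Cd",
                                  "BootSourceOverrideEnabled": "Once"}},
                   auth=AUTH, verify=False)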
That's really all there is to it. People in my world are very excited about that. So excited that many of them actually use only that. Cool, think about it: you can do deployments without DHCP. You can embed your network configuration in the CD image. And that's really it, right? It can have static IPs, static routes, and you don't need any DHCP. You definitely don't need L2 connectivity, because, again, you can have local DHCP, you can have no DHCP, stuff like that. Yeah, if you have any questions, just interrupt me, or you can raise a hand here in Zoom. Yes: how do you do the virtual media deploy? And is it also something that will come to Ironic? That's a feature of Ironic. Everything I'm talking about here is a feature of Ironic; it's not a feature specific to Metal3. And Metal3 doesn't really have bare metal features of its own; it's just a wrapper around Ironic that presents it as a Kubernetes API. So anything I'm talking about here is available in Ironic, to one extent or another. So these are Ironic features. It manifests as boot interfaces. There's, for example, a boot interface called redfish-virtual-media. There's also ilo-virtual-media, irmc-virtual-media, idrac-virtual-media (actually, I assume iDRAC uses Redfish), and maybe a couple more. So it's more vendor-specific than IPMI, that's what I need to tell you. But we see an increasing amount of hardware that supports Redfish, and supports virtual media through Redfish. So you enroll your nodes, for a compatible server, using this boot interface. For example, you have a driver that goes through Redfish, and you have the boot interface for Redfish virtual media. And you provide an ISO, or you can provide a kernel and RAM disk as you're used to, and Ironic will build a small ISO out of your kernel and RAM disk and connect it. Everything else is the same. It's the same deployment; just no PXE is going to happen between you and the node. Clear? More or less? Yes, thank you. Cool. So: virtual media. Play with it, it's pretty cool. In my world, it's dominating things. We use the feature called adoption. That means that if you have an already deployed node, meaning you have a server that is already running an operating system and your software, and you just want to manage it through Ironic, you can make it active right away, without redeploying. Is it useful? Yeah, it's very useful for us, because of the way we work. First, as I mentioned, we bootstrap Kubernetes by having Metal3 first in a virtual machine, and then the real Metal3 on the control plane. So we migrate control plane nodes from this VM to the control plane. So the control plane manages itself, using adoption, because obviously we cannot reprovision those nodes; they are the nodes that are running Kubernetes. We also do the same on upgrade, because of the default mode of operation in Metal3 (it can be different depending on how you install it). By default, when we upgrade, and we do it in OpenShift too, we just tear down the whole thing and start it from scratch. So we use adoption to make hosts that are already active in the Kubernetes database also active in the running Ironic database. Without that, we would have to persist the Ironic database, which we did not want to do. You may do that, depending on how you use Metal3, but we don't. Yeah, questions here? I have a question. Is there support... so, let's say someone wanted to do a side-grade into Metal3. I've already got an Ironic that's mostly standalone. Is there any starting point for Metal3 that isn't Metal3?
Or is it more or less that, you know, Metal3 is your Ironic? So the question essentially is: can you use an external Ironic? Yes. You can have Ironic installed somewhere. Yes, but the "but" is, we are a bit opinionated about what Ironic can and cannot do. For example, we don't support Keystone authentication. We may (contributions welcome), but there may be some assumptions, and some of them may be a bit poorly documented. But yes, that's absolutely a supported feature. And Metal3 upstream is pretty agnostic. In OpenShift we are dictating how things should be running, but that's a thing about OpenShift. Do y'all know of many Metal3 users outside of OpenShift? Like, how much trailblazing would it be if you went outside of those guardrails? So, well, the Metal3 community's two biggest contributors are Red Hat and Ericsson. I apologize if it's officially called a bit differently, but essentially that's it. And they are definitely not using OpenShift. We have a few adopters; there's actually an adopters file in the Metal3 docs. It lists Deutsche Telekom, it lists IKEA, and a few others. I'm pretty sure they're not using OpenShift. And we sometimes see people who are doing heavy customization of Ironic for their Metal3 deployments, up to and including writing drivers, I think. So there would be some trailblazing, definitely, but it's not some crazy case nobody ever tried. It's absolutely a thing people do. Thank you. All right. Something Jay already mentioned: we use Ironic in all-in-one mode with SQLite. The all-in-one mode is a thing that I added a couple of years ago, maybe a year ago. It's essentially the Ironic API and the Ironic conductor in one process. So, a bit contrary to what traditional OpenStack does, where the APIs are grouped together and then the conductors, engines, whatever you call them, the compute part, live somewhere separate. We benefit from just having one process called Ironic, because we want it to be a lightweight appliance. We use SQLite by default. Metal3 upstream supports MariaDB; in OpenShift we don't do that, because everything is ephemeral for us. We currently turn off RPC completely, because it's a single process, but we want to go multi-conductor, multi-process at some point. There are some problems with that, which I'm going to talk about in the scaling section, but we are looking into it. Yeah, as I mentioned, the database is ephemeral, so it's rebuilt on startup. It boots with a new database, and then the Metal3 components orchestrate rebuilding stuff if needed. Eventual consistency: we all live it, right? And then the last... no, that's actually not the last, I have a whole other slide. Secure boot management. It's a feature of some drivers in Ironic. Redfish supports it, and I believe a few other drivers do too. You can tell Ironic to turn secure boot on before rebooting into the instance, and turn it back off before rebooting into the RAM disk, so on teardown, for example. It's just very handy: you make sure your instances run with secure boot, but it, for example, doesn't get in the way of PXE, which does not really like secure boot, depending on how exactly you configure it. It can work with GRUB; with iPXE it's hard. Yeah, this feature has been in the Ironic code base for a long time, but I haven't heard of a lot of people using it, so it's cool to mention. Do I need to talk about what secure boot is, or do I assume nobody's screaming? Yes, so. Go for it.
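For the notes, requesting this per node looks roughly like the sketch below; the endpoint and node name are made up, and the mechanism is the secure_boot capability in the node's instance information, which drivers that support it act on:

    import requests

    IRONIC = "http://ironic.example.com:6385"   # made-up endpoint
    HEADERS = {"X-OpenStack-Ironic-API-Version": "1.72"}

    # Ask for the instance to boot with secure boot enabled; supporting
    # drivers flip the BMC setting before booting the instance and turn
    # it back off before booting the deploy/cleaning RAM disk.
    requests.patch(f"{IRONIC}/v1/nodes/example-node",
                   json=[{"op": "add",
                          "path": "/instance_info/capabilities",
                          "value": {"secure_boot": "true"}}],
                   headers=HEADERS)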
There's some stuff (not an advertisement, but some stuff) we do specifically in OpenShift with Metal3 and Ironic that upstream Metal3 supports but maybe doesn't actively use. So: we use RAM disk images per node. Most normal people will have the kernel and RAM disk images downloaded from tarballs.opendev.org, so everyone can use them. We build images on the fly. We take a CD image, and CoreOS CDs have a special region where you can insert things. So we have an HTTP server, written in Go, that serves this image but inserts your custom information into this area in the ISO. On the other hand (for those who don't know), initramfs images can be concatenated. So, again, we serve the base initramfs and we can concatenate a CPIO archive to it with customizations. So that's how you end up with RAM disk images per node that are heavily customized but not duplicated anywhere. This is "what's old is new again", isn't it? Isn't this the way the original IPA image was made? With CoreOS, back when it was CoreOS, we dropped stuff in the OEM dir, and that was even pre-Ignition; that was CoreOS cloud-init. So it's sort of strange. I feel like I'm in Bizarro land. It's like, wait, I did that 10 years ago, but y'all are doing it on the fly. That is super cool and exciting. And honestly, that's one of the secret best features of CoreOS that no one thinks about: the support for the OEM inject stuff. That's really good. And it greatly amuses me that Metal3 has gone this direction as well. Yeah, that's an OpenShift-specific feature. You can do it with Metal3, of course, and the whole of OpenShift is open source, so you can just use it. You can do that with anything, right? Like, you're saying this is an OpenShift feature, but I'll say that in my experience operating Ironic at multiple different places, many of the ones that were scaled up very high had their own bespoke processes for building RAM disks, had their own policies around what they wanted inside them; in some cases they even used a different distribution or something. So I don't view this as advertising OpenShift; you're talking about the specific deployment decisions they made, and most of these mirror pretty closely the same types of decisions that folks who do it themselves are making. Right. Yeah, so I'm highlighting that it's a cool thing: in Ironic, you can have RAM disk images per node. Oh, and since recently you can have kernel parameters per node for PXE, which is also used. What else? We, since quite recently, have users of custom deploy steps. Deploy steps are a way to split the deployment process into small units, with the ability to customize them per driver and per RAM disk, and to request certain deploy steps to be turned on and off during deployment. So we use Ironic Python Agent like everyone else, but we don't use the default image writing process, that is, writing QCOW or other images. Instead, we have a custom deploy step that calls coreos-installer, which is a CLI that comes with any CoreOS image and installs CoreOS from that image. So if you boot from the ISO, you have everything in the ISO that is needed; you can just install from it onto the disk without downloading any other images. That's a pretty cool property, and we use custom deploy steps for it. Yeah, these custom deploy steps are part of our RAM disk image building process, essentially, because we use Ironic Python Agent containerized.
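To sketch what such a step can look like: below is an illustrative out-of-tree IPA hardware manager, not the actual OpenShift code, with made-up class and step names, in the spirit of the coreos-installer step just described:

    import subprocess

    from ironic_python_agent import hardware

    class CoreOSInstallHardwareManager(hardware.HardwareManager):
        """Illustrative hardware manager exposing one custom deploy step."""

        HARDWARE_MANAGER_NAME = 'CoreOSInstallHardwareManager'
        HARDWARE_MANAGER_VERSION = '1'

        def evaluate_hardware_support(self):
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_deploy_steps(self, node, ports):
            # priority=0: the step only runs when explicitly requested,
            # e.g. via a deploy template or instance_info/deploy_steps.
            return [{'step': 'install_coreos', 'priority': 0,
                     'interface': 'deploy', 'reboot_requested': False,
                     'argsinfo': {}}]

        def install_coreos(self, node, ports):
            # Install from the live image we already booted from, so no
            # separate instance image download is needed.
            root = hardware.dispatch_to_managers('get_os_install_device')
            subprocess.check_call(['coreos-installer', 'install', root])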
We have a container built with these deploy steps embedded, and we have CoreOS images that download this container and start it up, and, you know, I think Jay is having some memories again. This is OnMetal! Honestly, you've described a lot of the things... you put OnMetal in a box with a Kubernetes face. I've been kind of ignorant of some of what's been happening with Metal3 and such; this is all very exciting to me. I'm having a great time. Great, I'm glad to hear it. Yeah, so custom deploy steps are very powerful. As I said, you can complement the deployment procedure with your own steps, or you can replace the deployment with your own steps, and that's what we do. Of the Ironic deployment bits, we only use the UEFI bootloader installation. Actually only the UEFI bits; we don't use the legacy bits. We nearly never use legacy; by the way, we're mostly, like nearly 100%, UEFI. And last but not least: for one of the products built on top of our stuff, we currently use RAM disk deploy, because they have their own installer, essentially, to put it shortly. So again, everything works as usual in Ironic, but instead of a normal deployment, we just boot their installer and it does the job. It's pretty cool. It's a bit of a side path from the Metal3 perspective, but it's an interesting way to see how Ironic and Metal3 can be integrated with stuff that is not aware of Ironic or Metal3. And I'm aware of at least one person, not affiliated with Red Hat, who did the same thing. It was before we supported coreos-installer, so they built their own thing using RAM disk deploy: essentially booting a CoreOS RAM disk and somehow automating coreos-installer. So that's pretty cool: even if you have an installer that's completely agnostic, you can use this. Any questions, comments? Good memories. I mean, I'll also say: if you're deploying this and you're struggling with any of these things, don't be afraid to come into IRC and ask about it. We've all done it before; swapping these stories is lots of fun. We know where a lot of the dragons are. So don't be shy if you're trying to do some of these more advanced Ironic things where you have to do a little bit of custom development of your own to make them work. Come ask if you're having trouble. Ask on the list, ask in IRC. We'd love to hear from you. Okay. The mic goes to Arne. Okay, thank you, Dmitry. So I will zoom out a little bit, from hardcore, very specific features that you use Ironic for, to more of the operations side of things. We're still in the section of things you did not imagine you could do with Ironic. When we started with Ironic, something like five, six years ago now, we of course used it for the prime use case, which was to install machines. So this is what's called "beyond PXE installation" here; this is what we started with, and then we explored the whole bare metal management universe of all the things you could do with Ironic. This graph basically shows all the steps we have when we handle physical machines, and I put the Ironic icon wherever Ironic is involved now. So we basically moved the whole management of bare metal into Ironic. When we started a couple of years ago, it was only the provisioning, really, the installation of physical machines, and all the rest was done with custom tools that we had built here, for our deployment.
And over time we realized that Ironic is actually so flexible, and the various frameworks that you have, for instance the cleaning framework, are so flexible, that you can basically hook into the various things we were doing with our custom scripts. And this way, by moving into Ironic, we moved a lot of this stuff, and our experience, upstream, into tools that are now also used by other deployments. So, going through this graph: it basically shows the path bare metal takes in our deployment. It starts with the initial registration, which is basically when you switch on a node: it registers with all the various databases, specifically Ironic. The moment a node boots up, it gets served the Ironic image, does an initial introspection, and sends the data back into Ironic. And then there are hooks to process this data, in order to register the node with the various databases that we have. There's an initial health check that actually does an inspection. So the inspection data is not only going to Ironic; it's also going to an S3 backend, and there's other tooling that extracts this and then analyzes it, in order to see if the servers we got are actually compliant with what we ordered. Afterwards, the nodes are burned in, in order to surface early failures; you may remember the bathtub curve. All of this CPU, memory, disk and networking testing is something that we put upstream. CPU, RAM and disk are relatively straightforward; networking is a little bit more tricky, because you need two partners that actually talk to each other, but using a ZooKeeper backend, for instance, nodes can find partners and then stress test the network interfaces. This is mostly to verify that you can actually transfer as much as advertised through these NICs. We benchmark the nodes with Ironic as well. Because we basically buy at the sweet spot of performance per dollar, if you like, it's very important for us to verify, once we get the servers, that they are actually as performant as we'd like. And the benchmarking is also driven by Ironic; the Ironic cleaning framework is used for this as well. There are various steps where the nodes are configured, for instance for software RAID, something that we also contributed upstream. Then the provisioning, of course; this is the initial use case for Ironic, as a provisioning driver in Nova, and now through Nova, and this we still use. Then there's the adopt step. The word "adoption" here is slightly misleading, because Ironic has an adoption feature, but what I meant here is adopting nodes into Ironic and Nova. That means we adopted in-production nodes, while they were being used, into the system, and afterwards they were fully managed by Ironic. This was mostly to drive the adoption of Ironic to manage our bare metal fleet. And then there are repairs, because we have thousands of nodes, and they break; all the time there's something broken. We need something that the repair team, which is the team intervening and replacing disks or memory modules, can use easily in order to move nodes out of production and flag them in the various systems as being in maintenance at the moment. So the support that Ironic has for this, the maintenance mode, is something that we leverage as well.
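For reference, kicking off the upstream burn-in steps that Arne described is just manual cleaning. A rough sketch against the bare metal API, with a made-up endpoint and node name, assuming the node is in the manageable state:

    import requests

    IRONIC = "http://ironic.example.com:6385"   # made-up endpoint
    HEADERS = {"X-OpenStack-Ironic-API-Version": "1.72"}

    # Run the agent burn-in clean steps on a manageable node.
    requests.put(f"{IRONIC}/v1/nodes/example-node/states/provision",
                 json={"target": "clean",
                       "clean_steps": [
                           {"interface": "deploy", "step": "burnin_cpu"},
                           {"interface": "deploy", "step": "burnin_memory"},
                           {"interface": "deploy", "step": "burnin_disk"},
                       ]},
                 headers=HEADERS)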
And in the end there's retirement. Some of the nodes that we have will not actually be thrown away after we use them; they're usually still good enough to donate. The retirement step is basically a re-burn-in, so we verify that the nodes we donate are not broken; they have just reached their end of life, performance-wise, here. So all of this is basically managed through Ironic now, and over the past couple of years we moved everything that we had done locally into Ironic, mostly into the cleaning framework and into the various images. This is true for x86, for instance, but we also recently added ARM nodes, and it works the very same for ARM nodes. And this is how we manage bare metal now. Do you use rescue mode? No, rescue mode is not something that we use. Well, one thing that is missing here (I see that Scott is also still on the call; it's something that we were discussing), one of the very few things that we still do outside of Ironic, is GPU benchmarking at the moment. So we're looking into how to do this with Ironic as well. Okay, any further questions on this one? Well, I think Sam and I started talking at the same time and I just scared him away, sorry about that. I was just gonna ask, in terms of timeline (I know you've been involved in OpenStack and Ironic for a while), I'm curious how many black and white bears we had on this graph when you got started at CERN. Can we maybe talk a little bit about the journey, as well as where you're at now? You mean in terms of the bears? So, actually, I updated this graph for this presentation. When we started, there was nothing. We started with OpenStack in 2012, 2013, very early on, of course with the core components. Ironic came relatively late. I think Mateusz is also on the call, or was; he was actually there when we started with Ironic. So this must be around 2017, maybe. And then the... I think 2016, sorry. 2016, even. Yeah, summer 2016 we started. Okay, so it came relatively late, and the provisioning was the first thing we worked on. That took us quite a while, and we were mostly focused on it. And then we expanded more and more. I don't actually remember the exact order. Registration is one of the last things that we added; auto-registration is probably one of the last things, and before that we were registering nodes mostly by hand. Repair was relatively early as well, because Ironic already had built-in support for this, the maintenance mode, which we leveraged together with the various toolings that we have. And then we relatively quickly added RAID support, because software RAID is something that we use widely. This was actually also my first major contribution to the code base, I think, and it was quite welcome. It was a very nice experience at the time; when we contributed this and integrated it, there was lots of uptake. And equally for benchmarking, which is more recent: it's something that people use, and I get questions about how to use it or how it works, so I think it's also quite heavily used. I'm not sure if anyone ever used the adoption that we did, the adoption that would add things also into Nova, which is a little bit tricky, and which we needed to do because we could not just delete or reinstall the whole fleet. You will probably see on one of the later slides; I will come back to the adoption.
You will see how steeply the number of nodes within Ironic rose at the time when we did adoption, because I mass-adopted these nodes; we were trying to be very aggressive about moving everything to one system. But I'm not sure if anyone besides us has ever tried this. We have a blog post detailing how to do it, but it's a little bit intricate, because you have to basically trick Nova into believing it's installing a machine while it is not, in order to get all the entries into the database, right? So, Arne, that's used by a large number of people. That's actually interesting; I had no idea you didn't know. I've worked at multiple places that used CERN-style adoption, like swapping out fake drivers to get things into a Nova instance. That was used very, very extensively at one place I've worked, and similar patterns were used at other places I've worked. I was totally unaware. I think that's a significant point, and it's demonstrative of the fact that we would love it if Nova supported something like that natively, but we don't necessarily have the ability to go change Nova's model. That's kind of why it can be interesting to front Ironic with different entry points, because you get different sets of trade-offs. I would wonder whether the adoption story is better for Metal3, because Kubernetes handles that sort of stuff better, as a first-class thing. It's interesting to see the trade-offs you make when you put different software around Ironic. But that's very good to know, actually, because this is exactly why we do things like the blog posts: to summarize what we did so that other people can move along this path, just as we move along blog posts that we find for specific tasks. To hear that it's actually helpful and used somewhere else is great; I didn't know. So, I've seen on more than one occasion, in my past OpenStack days, customers deleting their whole overcloud and then re-adopting it, because cleaning was disabled, using a similar process, even before your blog post came to life. So that's maybe not a positive example, but it's definitely been happening in my experience. It's also not that we invented this from A to Z, right? We also built on top of various bits and pieces here and there. I mean, fake drivers: we didn't write the fake drivers, they were there for some other purpose, but we put together various recipes in order to have an A-to-Z path. Okay, I have a node that's installed completely outside of OpenStack and Ironic, and I end up with a node that's completely controlled by OpenStack, Nova and Ironic. But there were various bits and pieces that we took from others as well. So, okay. And basically, good. Thank you. Yeah. At this point, I want to open the floor and hear about your use cases, ideas, experiences, maybe just a fun story, or something you want to do and something you need. I have a question. Has anyone made it possible to use the graphical console of the hardware and give that to the end user, especially in an OpenStack case? I think that's another one where the answer varies based on the hardware you're using. I think we haven't implemented the base interface yet. So there are patches in a certain state of readiness which implement that, but we haven't merged them. Yeah, I'm looking for it right now; I'll make sure it gets linked in the chat.
Would it be instead of the serial console, or would it even be possible to use both? Honestly, the specs are in progress, the code is in progress; you might be able to influence that with a review. I think we're doing this to use both. Perfect. So it will make my colleague happy. Yeah, I'll note that it looks like right now the only hardware support for that landing is for Dell iDRAC, and the patch involved has minus-ones on it that haven't been responded to since November. So don't get too excited, but it's in progress; it just may take a while. Yeah, I also want to manage expectations, because this is a recurring issue. Well, a recurring request; but it's not that easy to do. There have been various attempts in the past; there were various patches put up by different people that never materialized into something that is widely used. But I agree, it's a very interesting and very useful, or would be a very useful, feature. My hope was that with Redfish it would become a lot easier, but actually no, not a lot easier, at least. We had one board, one specific type of hardware, where the BMC, via Redfish, would provide you with a one-time URL that would open a graphical console. That is already pretty cool compared to everything else that I had seen so far. So you could basically use this in order to, for instance, embed it into Horizon, and then you just have a link that you click, and it opens an HTML5 page directly to the console. But I have seen this only on one type of hardware. I tried to do this with Redfish for some time, but it never got anywhere. But yeah, again, it would be really, really nice to have this access to a graphical console. Actually, my teammate is working on this task with the Nova community and with the Ironic community, as I remember, and he has made good progress. So I think it's possible that this feature will be implemented in Bobcat, or maybe the next release. That would be awesome. Yes, the implementation that we found in the community was made for the iDRAC driver, and it replaces the serial console; the implementation that we are working on uses a blueprint which allows both serial and graphical consoles to work. So I think the good answer is: we need to wait, but not for long. Can I ask a question about rescue mode, which was asked to Arne earlier? Does anyone actually use rescue, or some kind of hardware inspection when a node fails? What does everybody do? Because we have a lot of Ironic nodes, and we have no good process for replacing or fixing a node which is provisioned for a client. So, I'm Jay, one of the people who originally wrote rescue. I haven't used it in a long time, but when we built it, we actually got it working to the point where we exposed it to customers, and the primary use case targeted by the idea of a rescue mode in Ironic was a non-privileged way for a user to recover their server. So, for instance, if you've got untrusted people who have some access to your Ironic APIs or servers (in our case, we were literally a public cloud), rescue gives you an opportunity, in case of, say, a disk failure, to let someone boot up and recover what they can, or, in case of a configuration failure failing a boot, to let someone get in there and fix it. And it's a lot more onerous than using a console, which is sort of the reason why I don't think it's one of the most popular features, but it's a lot more secure than using a console.
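For reference while we're on the topic: putting a node into rescue is a single provision-state call. A rough sketch, with a made-up endpoint, node name and password, assuming the node is active and its driver supports rescue:

    import requests

    IRONIC = "http://ironic.example.com:6385"   # made-up endpoint
    HEADERS = {"X-OpenStack-Ironic-API-Version": "1.72"}

    # Boot the node into the rescue RAM disk with a known password...
    requests.put(f"{IRONIC}/v1/nodes/example-node/states/provision",
                 json={"target": "rescue", "rescue_password": "hunter2"},
                 headers=HEADERS)

    # ...and later boot it back into the deployed instance.
    requests.put(f"{IRONIC}/v1/nodes/example-node/states/provision",
                 json={"target": "unrescue"}, headers=HEADERS)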
And so that's sort of the use case that I used it for, that I've seen it used for, but you can do all sorts of fun things with rescue mode, including putting lots of useful tools inside the rescue RAM disk, or configuring how it boots. Any sort of manual offline thing you need to do with a provisioned node, you could absolutely use rescue mode as a method to do it. So I don't know if that answers your question, but that is the use case that was targeted when it was written, because I feel like console mode is maybe better suited for an actual administrator or operator type of person to go and fix a broken server, rather than rescue; but rescue is public-safe. Sounds cool. Jay, one more question. Our hardware team wanted to run memtest from this rescue image. And as I understand it, memtest is a separate process which cannot be run from an already booted operating system. Do you do memtest in your rescue image? That's a good question; let's scope it. There are a couple of different types of memory testing utilities, good ones and bad ones. The good ones are typically the ones that take over immediately from the bootloader and test RAM. As far as I know (and that's not saying much, I don't know a lot about this), those processes don't do anything to communicate status back. So I don't think there's any sort of hook for Ironic to use to automate that, or to allow our customers or users to automate it. That's sad. Now, there are lower-quality memory testing tools that run inside a preexisting OS. You could absolutely run those in rescue mode. You could automate those to run as a deploy or cleaning step, or maybe, someday over the rainbow, an active step. And in fact, I would say, if that's something you desire, and you find a tool you're comfortable with: the work that Arne and CERN have done on burn-in is very much in the same category, right? Because it's all just trying to get something to fail. So, depending on what your actual goal is with a memtest: if your goal is to get a confirmed failure of a DIMM to RMA it, I don't think the hooks exist for Ironic to automate that for you today. It sounds really cool, though. Honestly, you might have nerd-sniped me. So, you know, if I still remember this come the weekend, maybe I'll go poke around memtest. But the stuff you can do inside an OS... memory testing is just not one of those things that works that great there. So: sort of yes, sort of no, but maybe that helps you see the borders of what we're dealing with. Okay, I'm kind of thinking about an iPXE boot menu modification for these purposes. Yeah, the iPXE image for memtest is even published on boot.ipxe.org. Now, I'm not sure it's easy to bend Ironic to boot an arbitrary iPXE image instead of a kernel and RAM disk pair. If you have someone sitting at the system... at the console? If you have someone at the console, you can edit the iPXE template. That's a configuration file; the iPXE template we use is a config file. So if it's possible to do a menu in iPXE, you can definitely implement it in Ironic, but I don't know iPXE well enough to say. It seems like it would be possible-ish, but I don't really know. It would actually be a fun thing. Let me put it this way: if you do that, do a talk about it, or a presentation at one of the next SIG meetups, because that would be super cool, actually.
But your idea of using rescue to do that is not the worst one, necessarily, because you could still call rescue to boot the node. I think you'll end up in "rescue failed", but you should be able to take it from "rescue failed" back to active. So I think there is a way to hack rescue mode to be a memory test, but it's gonna be ugly, and your way out of it is likely gonna have to include switching from "rescue failed"... unless, oh no, okay, okay, it's possible, because I've done something like this before, at a previous job. You're gonna have to do the callback to Ironic inside the iPXE script if you want to do it in rescue mode. You would have to have an iPXE thing in rescue mode that did the following: curl Ironic with credentials (so, how comfy do you feel about that?), curl Ironic with credentials to say "this node that I'm booting is now rescued", so it does the rescue callback. And then you would boot the memtest, and the only way to abort the memtest would be a hard unrescue from the Ironic API. I do not recommend you take these steps. This is insecure, it is breaky, but it is possible. And believe me when I say: if you did an insecure, breaky, but possible thing with Ironic, you would certainly not be the first installation to do such a thing. So the tools are there. That's actually an interesting use case, I think: the idea of, almost, temporarily booting something into the RAM disk driver. You maybe could come up with a way to do this as well, using something similar to that Nova adoption workflow, except swapping out your drivers for a RAM disk driver and reprovisioning your machine on the RAM disk driver, as long as you've got cleaning disabled. There are lots of ways to backdoor yourself into this; none of them are gonna be squeaky clean with Ironic, and I think you'll end up with a better solution if you go the route of the hardware burn-in stuff that CERN has. But that's interesting, and it's a fun little game of Ironic ops code golf: figuring out how to do some weird thing that Ironic doesn't quite support but we certainly have the automation primitives to do. So. Yeah, a bit tangentially, but we should probably... I mean, crazy ideas: supporting rescue through booting into a UEFI console, as in, connecting through the IPMI serial console to it. Well, that's not rescue at that point, necessarily. Rescue is: we boot a RAM disk. Rescue is like cleaning without any cleaning, because it just boots the RAM disk. When the RAM disk boots, it does the callback to get the network flipped and change the network config away from DHCP, and then you just stay there. From there, you could rescue the node and then go in via the console and do console things. But I don't understand how we could make rescue mode work without that callback to confirm that something has been booted there. It's like that. I mean, hacky, hacky territory still, but it's exciting. Okay. I mean, if you get into a UEFI console, you can probably even run memtest, or boot into memtest from somewhere. So that may be an interesting possibility to research, whether we can get there. So, I have a quick question, and it's about tenant networking. We run mostly OVN in our infrastructure, and, for sure, we do not have access to, or do not want to touch, the switches for the whole tenant. Was there ever any talk with the Neutron folks about running the networking stuff on top of a DPU inside a server?
So, like, you have an OVN controller running on, basically, the network card, which has its own image and runs some ARM-based distro, and then it just passes through the VXLAN/Geneve-based network it receives to the physical node. Was there ever consideration of that? Because that would enable you to run OVN as your network backend and have your virtual machines in the same network as your physical machines, to treat your servers like cattle in this case. Were there ever thoughts about this? We're just thinking about this because we are receiving some of these DPUs now, and this is one of the use cases we wanna test; that's why I'm asking. So, if Dmitry and Arne don't mind, I'm gonna pull the curtain back a little bit on our hardware enablement track chair meetings. Go for it. We had approximately four million talks proposed on DPUs, and the majority of our meeting time was spent discussing what the heck a DPU is. So this is not a technology we're overly familiar with in general. Quite frankly, a lot of the time you get the cool hardware before the open source developers do. I strongly recommend you make your way to the OpenInfra Summit; there are a large number of interested people. I know we accepted a talk on that specifically for the hardware enablement track, and I believe there were talks in other tracks related to DPUs. So, I hate to give you a non-answer, but the reality is that you're talking about tech that's so new we're not super familiar with it. The primitives sound familiar; it seems like maybe they're stuffing some of the software logic into the card itself, which is exciting given the limitations I've seen with the older-school stuff like we used at Rackspace. But we don't have the answer for you. And the reality is we hadn't even thought about it at all until... huh? Actually, if we're discussing essentially SmartNICs, we actually have code to support those in Ironic. That being said, there's not much on the Ironic side; it's just some state wrangling around it. SmartNICs are essentially what you describe, and Mellanox were running around several years ago making contributions to all the OpenStack projects, mostly Neutron, but also to Ironic, to make that happen. So maybe a good starting point for you would be to look for the resources that Mellanox created. They probably have some blueprints against Neutron, probably something against Nova. They have some code in Ironic, but it's not complicated code: we just make sure we do state transitions at the right time, so as not to break the SmartNICs, because, to program them, the node has to be powered on, or powered off; you cannot reprogram them in some states. So there's some complexity, Ironic takes that into account, and you have to mark ports with is_smartnic set to true. But the place to go for you is really Neutron; that's probably where the magic is happening. Yeah, so to the first question: yes, I will be at the summit, and I hope to have a bit of a talk with some people about it. And I'm sure I have people who are basically responsible for the connectivity side, and they are already in talks with the Neutron folks. So I second that as well.
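For the record, the Ironic side really is just a flag on the port. A sketch, with a made-up endpoint and port UUID; the exact minimum API microversion is an assumption here:

    import requests

    IRONIC = "http://ironic.example.com:6385"       # made-up endpoint
    # is_smartnic needs a sufficiently recent microversion (around 1.53):
    HEADERS = {"X-OpenStack-Ironic-API-Version": "1.53"}
    PORT = "11111111-2222-3333-4444-555555555555"   # made-up port UUID

    # Tell Ironic this port is a SmartNIC, so it sequences power state
    # changes around (re)programming the card.
    requests.patch(f"{IRONIC}/v1/ports/{PORT}",
                   json=[{"op": "add", "path": "/is_smartnic",
                          "value": True}],
                   headers=HEADERS)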
So, from an Ironic perspective, it basically doesn't care whether it gives users a physical interface or just a physical function. But I need to second that it's not a SmartNIC, because a SmartNIC runs fully on the host OS itself, while in the DPU case, at least with the machines we get, the OS is abstracted away: this thing has its own OS. And that's the next thing, which I had not even thought about: you need to provision this thing with, like, an Ubuntu ARM image to even run stuff on top of it. So that's also a whole other story, which I did not cover yet. And it gives me nightmares currently, because they have dedicated BMCs, a dedicated OS on top of them; that's a whole other zoo of stuff you kind of need to manage. But currently we're just thinking about having a Neutron component running on that thing and then passing it through. And I guess, as long as Ironic can boot into the RAM disk and communicate with the conductor and so on, it should not be a big change. But yeah, it's new stuff, and I also just received them, I think, last week, so I've only had a quick look at them. This will be a very interesting topic. Yeah. And there are also interesting ideas from the storage folks; they came up with the next idea: if you enable those things, with your DPU you can even do NVMe over Fabrics and then pass it through as a PCIe device from the NIC to the host system. So there are crazy ideas in the room, but let's see; let's get it running first at all in the next few months. Thanks for the answer, Stefan. So I'll say two things. First of all, I nominate Samuel as the official Ironic and Bare Metal SIG ambassador to the DPU delegation. It sounds like you understand a lot more about this than I do, at least right now. So please do go find out what exists at Mellanox, go to the stuff at the summit, and cycle that feedback back to us, because we can't know everything, and it sounds like you've already got a head start. So that is greatly appreciated. Yeah, sure, sure, we can do that. I think back to some of the conversations when Ironic was started, and the sort of composable hardware stuff you start to get into at the end of that is definitely in line with some of the original visions for Ironic. I don't think we do the composable bits today, but that gets exciting, because you actually get to treat bare metal even more like VMs, which is just gonna make us interact with the OpenStack bits better. If I could actually say "provision me an eight-core machine with 16 gigs of RAM" and it gets the RAM and the disk and stuff from over the network, that's super cool. So yeah. But I think that is one thing even further out, this disaggregation stuff. That's another layer I did not cover yet, and I think there's also nothing usable there yet. Currently, I think the first composable thing is the network you wanna run. So, basically what you said: you treat your bare metal server as a VM, which would already be a huge benefit to us, and mostly to the people, I guess, who want to do it. And, to be honest, that idea isn't new. I think AWS already does it with their Nitro cards, and I think that's where all the other NIC vendors got their ideas from. And now, I hope, with the DPUs it's just more broadly accessible to everyone. I think the approach is pretty cool: to segregate everything away and just isolate the VM into the tenant network.
But yeah, I'm not sure yet about the memory and CPU side of the composable stuff. I have another question, from a colleague, regarding IPA: it requires both an incoming and an outgoing connection. He wanted to ask whether it would be possible to use WebSockets on the Ironic API and Ironic Inspector/conductor side instead. What's the virtue of that? In our case, I think the issue is that we need routable IPs, and those are limited, because who needs IPv6, right? We wouldn't need to care about that if we could use WebSockets. If I understand it correctly, but I can't guarantee it. No, no, your answer makes a lot of sense. So you don't want to have to have a route back to the agents. Yes. So I wrote a patch downstream at Yahoo, and I think some version of it might be upstream somewhere, which I called heartbeat-less IPA, and I tried to spec it out for upstream and it didn't go too well. The main issue with what you describe is that in our current model it's not actually Ironic that begins the transaction with the agent. When the agent boots, it checks in to Ironic via an API endpoint we call lookup, and then it heartbeats. Heartbeat is a terrible name for it, but that's what we call it. So the agent is actually the thing that's driving everything. In order to change that, in order to support something where the API goes first, you've got to change some significant stuff about our model. The idea of using WebSockets is absolutely a new one that I hadn't thought of before, and it might be safer. When I implemented that downstream, I basically made everything asynchronous and used the node object as my ugly cache dumping ground. It wasn't very pretty, and it certainly wouldn't scale for people with extremely complex sets of steps, simply because you'd run out of space in the database field. So that sounds cool, but it's along the lines of what Arne said about the VNC console: this is one of those topics that's been tackled by people in the community two or three separate times and it just never quite gets there, because the technical difficulty is pretty high, the churn in Ironic is pretty high, and the value is limited by environmental concerns, right? I think you're not the only operator who's had that kind of problem, but it's not something that's easy to rally support around, you know, "let's support this thing we already have, again, but better". So what I'm gonna do for you is find that no-conductor-to-IPA spec that exists, which is essentially the feature you're asking for, just not in the way you asked for it. I would not necessarily expect that spec to move unless you move it, if it's something you care about. This is a perfect case for taking that spec link and putting up something on our Bobcat PTG about it. But admittedly we've got very limited resources for development, so, you know, you can ask, we love it when people ask, but don't be surprised if it takes a while or if the answer is no at first. I'll make sure that existing spec review gets linked in there. To put that in perspective, that spec was last updated two years ago. That's how long it's been since this was really looked at and considered. But... I think we already mixed up two cases pretty heavily.
So heartbeat-less IPA is the opposite thing from no conductor-to-IPA communication, and you mentioned both in one conversation. That's a bit... Oh, you're exactly right. Hold on. You're right, this is the reverse. So if what we need is to prevent Ironic from talking to agents, which is how I understood the initial request, that is significantly simpler. That does not require rethinking Ironic; it just requires us to provide a list of commands in response to heartbeats. That's it. Right now heartbeats have no response beyond the status code. If you provide a list of commands in the heartbeat response, there we go: that's how you make sure that only agents talk to Ironic. If you want to remove agents talking to Ironic, that's where I think it becomes ugly, because of lookup and because of inspection. So which one are we talking about in the end? That's a good question. The colleague wanted to get rid of incoming connections on the IPA side. So no Ironic talking to IPA, only IPA talking to Ironic, right? Yes. So that's simpler; that's pretty doable. We just need somebody to sit down and do it. That's what I did at Yahoo, for what it's worth, to be clear. I know I switched things around when I was talking about the spec: at Yahoo what I did was the agent connecting back to the conductor for everything, rather than the conductor connecting into the agent for everything. Okay, I think that sounds great. I'll take a look at the spec and pass it to the colleague. One last question of mine, regarding cleaning. I know we probably all wanted Redfish to be this cool thing that standardizes everything, like IPMI wanted to do before, and at least in our case we noticed: oh yeah, everyone does it differently. We are still mostly using the IPMI driver, and we're now looking to switch either to Redfish or to the vendor drivers. And the question is: most of the newer vendor drivers seem to all build on their Redfish implementation. So if we define our own cleaning step that is based on Redfish, how can it know when to run? Because if they all use Redfish, how does it know that this is a Dell node and it should only run on a Dell node? Does my question make sense? It sounds like you need to define your own hardware type, which is something you anyway have to do, because that's how the whole composition works, and you use that only for nodes that can run this step. I mean, that's the most Ironic-proper answer. The most operationally simple answer is that you have access to the node object in those cleaning steps you're running, and if statements are great. We did that quite a bit when we first implemented cleaning downstream: we would put a tag in node.extra saying what class of hardware it was, and look for that tag when deciding which cleaning steps to run. And we did it that way specifically so that we could also say, okay, if this is tagged as hardware type A, and we know hardware type A is supposed to have 16 gigs of RAM, and it's only got eight gigs of RAM, we can flag that as an error in cleaning. So there are some benefits to going that route. And like Dmitry said, there's real stuff you can do with hardware types and subclassing them out and all that. But quite frankly, if this is code that's going to be written by operators and maintained by operators, I might suggest going the simpler route of just inspecting the node object inside the step and skipping the work if it's not the right hardware.
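A minimal sketch of that simpler route, written as an ironic-python-agent hardware manager (the class, step name, and vendor string are hypothetical; the vendor cache in node properties is discussed just below):

```python
# hypothetical operator-maintained hardware manager for ironic-python-agent
from ironic_python_agent import hardware


class AcmeHardwareManager(hardware.HardwareManager):

    def evaluate_hardware_support(self):
        # advertise ourselves everywhere; the step itself decides to act
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_clean_steps(self, node, ports):
        return [{'step': 'vendor_secure_erase',
                 'priority': 40,
                 'interface': 'deploy',
                 'reboot_requested': False,
                 'abortable': True}]

    def vendor_secure_erase(self, node, ports):
        # gate on whatever marker you keep on the node: a tag in
        # node.extra, or the vendor cached in node.properties
        vendor = node.get('properties', {}).get('vendor', '')
        if 'dell' not in vendor.lower():
            return  # not our hardware: skip quietly
        # ... the vendor-specific wipe would go here ...
```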
And yes, I'm specifically suggesting you hack up your own thing. So, you know, Dmitry might disagree with me here, but I'm saying that if I were the one making that decision, that's probably the route I would go, just so someone doesn't need a PhD in Ironic to understand what's going on in your cleaning. Right, I don't disagree, but I want to note that our Redfish implementation already caches the vendor in node properties. So if you want to write logic based on that, it's always there; we take it from the system's manufacturer field. And the vendor drivers and the vendor cleaning steps have this already included. I haven't looked at the code in detail, so I don't know, maybe it's handled in there, but that was one of our questions: if we just use the vendor-provided steps, how will it make sure that the Dell steps don't run on a Lenovo node? They implement this already. Well, not quite: they expect to run on the right hardware type. So if you use driver=redfish, you only get the generic stuff that is supposed to work everywhere. If you have driver=idrac or driver=ilo5, it will assume you are right; it will assume it's talking to a Dell or an HPE machine, for example. There are no checks there. There's a bit of extra logic in idrac-redfish because of their quirks, but generally you have to be right. But is there something specific you want to do in a cleaning step that is not provided by these drivers? What's the actual use case, is my question. Well, for one case, we have hardware that is, for example, not supported by nvme-cli, so we can't secure-wipe it, and we don't want to wait for shredding. So we probably want the vendor-provided steps to do this. And the other part is: right now we have what we call a block, and a block is between 12 and 18 bare metal nodes that are connected to a single conductor. That was how we did it in the past. And now at least some regions are so big, above 1000 instances, that we want to scale better, because it doesn't really scale if you have a conductor for fewer than 20 nodes. So now we are getting to the question: if we also want to switch to the vendor-provided drivers, would that work, or would we need to make a conductor per vendor? Right now it works because each block is the same hardware, apart from some having more CPU or more RAM, but it's all the same vendor. If we now mix it all together, is that a problem for us? I wonder, will it work? Maybe I misunderstand, but drivers are per node; they're not on the control plane, not on the conductor. So on a single conductor you can mix multiple hardware types. I'm not sure if this is... Yeah, that I know, but the question is: if all drivers use the same protocol to talk to the nodes, and that protocol is Redfish, do the drivers still understand that this is not the generic Redfish driver, that, okay, I'm using Redfish, but I'm still the HPE driver or the Lenovo driver? I assume yes, but I just wanted to make sure while I have the chance to ask so many people. So drivers for nodes are your input, as the entity that creates nodes. You create a node, you use driver=ilo5, and you are making a statement that this is an HPE node.
And then you can create a node that uses driver=redfish, and you are stating that this is a generic Redfish node, which can be Lenovo or many others, even HPE itself. And you will get only the subset of features that is available on all of them. So I may also be misunderstanding the question, but I don't see a problem as long as you provide this input correctly. That sounds promising. I think we need to look into it more. Thank you. While we're talking about Redfish, let me just add that I was just checking, and we're running a little more than a thousand nodes with the generic Redfish driver. And it's actually working fine for the standard stuff. We don't do anything sophisticated or hardware-specific with Redfish, but the usual instantiation, all the things I described in that fancy plot I had earlier, they work with the generic driver from what I see. So, just to make a statement from the outside: the Redfish support in Ironic actually works quite well for the basic things, from what we see, and we have it in production on more than a thousand nodes. This was basically a reply to the comment about different vendors implementing their own thing. And okay, of course we also hit that, their own thing, when it comes to the Redfish implementation on the BMC. We've submitted a couple of patches to handle this, and there are some, Dmitry was already nodding, some very specific cases where they actually violate the standard, and some of them are pretty funny. But in general, it works. Yeah, I would hope that it does. We were thinking about using the vendor-provided drivers to get access to the extra features, like deploying with a virtual disk instead of PXE. Virtual media is standard; you don't have to go to the vendor drivers for that. At least in Redfish it's standard: if your hardware supports it in a standard way, the generic Redfish driver will do it. You just have to pick a different boot interface. So instead of the boot interface you picked, you put boot_interface=redfish-virtual-media. That's what we do on many vendors. And as Arne said, sometimes there are issues; we fix them in Ironic for you so that you don't have to bother with that. The primary thing to be aware of, and no one said this the few times we talked about V-Media boot earlier: the reason why most operators choose not to use V-Media is that it requires your BMCs to have access to whatever is hosting your agent image. And in some places that crosses a security boundary in a bad way. So I'm just gonna call that out: if you're watching this video, or maybe you specifically, and you're thinking about doing this, just make sure you consider that, make sure that connectivity exists, and that it's not gonna make the big, bad security folks at your company angry at you. Another thing is that if you have long latency between where the virtual media is hosted and the BMC, you may end up not being able to ramp up enough bandwidth to actually boot the machine quickly, because of TCP window sizing. This is unfortunately something we've observed in the wild. Oh, very helpful, thank you. Yeah, it's sad, because V-Media seems like the panacea in a lot of ways, but just like everything else, you're trading something off. And if you don't know what you're trading off, then you've gotta think a little harder. Okay, so we had some stuff we were gonna talk about: sharding and scaling.
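To make the driver-versus-interface distinction concrete, a quick sketch (node names and BMC details are made up):

```
# a generic Redfish node: the lowest common denominator of features
openstack baremetal node create --name lenovo-01 --driver redfish \
  --driver-info redfish_address=https://bmc.example.com \
  --driver-info redfish_username=admin \
  --driver-info redfish_password=secret

# a vendor hardware type is a statement: this really is an HPE machine
openstack baremetal node create --name hpe-01 --driver ilo5 ...

# virtual media needs no vendor driver -- it's a boot interface choice
openstack baremetal node set lenovo-01 --boot-interface redfish-virtual-media
```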
Do you wanna keep this conversation going? Sure, I mean, we're looking at about 28 minutes left. Do we wanna move on to the next stuff and then still have more open discussion at the end if there's a more fun chat to have? We can speed through the sharding stuff, but I think it's interesting and folks should know that it's coming. I would say we could have a three-to-five-minute break, continue with scaling, and return to any topics in open discussion if anything comes up. But we're also more than 45 minutes late for the scaling discussion; maybe some people came for that. Or maybe we do it relatively quickly and then return to all questions and open discussion. Or maybe we've had enough open discussion, really. Yeah, I say let's hit the sharding stuff. Let's not blow through it, but let's try to do it pretty speedily, and then with what time we have left, we can re-engage discussion. Maybe that will create more discussion or not. All right, are we ready to have a chat about scaling Ironic? Sure. Yeah, I'll let you start with that. I'll just be clicking the slides when you tell me to. Jay, you okay? As okay as someone can be who's expecting to hear things about their headphones. All right. Yeah, I'll let you take this, and I'll just click through the slides. So actually we have some slides from Arne first, right? Go ahead, then we can discuss sharding. I have a few things about Metal3 too. Yeah, yeah. Okay, so I put in a couple of slides about our scaling experience. This plot shows how we grew the Ironic deployment over the past five years or so. As pointed out earlier, we actually started a little earlier, 2016, 2017, which is cut off here. You see these very steep increases; those are usually new deliveries coming in. In the beginning of 2021, you see this rise from 5,000 to eight, eight and a half thousand nodes. This is the adoption phase, where we pretty aggressively enrolled nodes into production. And when it goes down, that's usually a retirement campaign where hardware was removed, and then we get new hardware in at some point. At the very end you also see a sharp drop below 7,500, but those nodes are actually down at the bottom; they just fell down to the yellow bar. These are nodes that are switched off, or instances that were deleted, actually to save some electricity. So they're still there; the yellow block at the bottom belongs to the top. We're still at around 9,000 nodes at the moment. During this journey, as you can imagine, there were quite a few things we ran into. Some of them have already been mentioned in the past hour; on the next slide, I have listed some of them. There were various issues we hit over time. For instance, with the databases. Whenever we started up our control plane and all the conductors connected to the database at the same time, we had a thundering herd problem, and Ironic, after an upgrade for instance, could not really start because the database was overloaded. We basically had to start the conductors in a more controlled way. This was fixed to a large degree by changing the way the database is accessed, to lazy loading, and you see on the graph how that helped. But you still see that there's a very regular pattern.
When you start up the conductors and they're all in sync because they started at the same time, they hit the database at the same time and you create this very regular pattern. In the lower part, for instance, if I remember correctly, this is an actual full upgrade of Ironic. You see I start at around 10 a.m., then things are down for a while while I do things, and at some point I think I'm ready and I start everything up. That's the thundering herd at the beginning of the sharp peak at around 11 a.m. And then you see the database activity; it's basically showing that the database activity is much higher than it was before the upgrade. This is when you realize that you have forgotten to include some of the patches and you have to rebuild the software, because you forgot the lazy-loading patch. So at two, well, I realized earlier, but at two basically, I roll out the new software. And later during the day I stagger the start of the conductors, and it gets back into a relatively quiet state again. So database access is one of the areas where a lot of work has gone in, with this lazy loading. Julia also made major contributions to improve the way we access the database, and it has become massively more efficient and scalable. One other thing that we struggled with a lot, and other deployments as well, is resource discovery. Whenever an instance has been deleted and a bare metal node becomes free, how long does it actually take until the system realizes? Ironic realizes relatively fast, but Nova is a different story. Conductor groups basically group these nodes with a controller; we use conductor groups to make Ironic more scalable here, but this should be superseded by sharding, which I guess Jay will explain in a second. There are other things you need to bear in mind as your deployment grows, such as the power status checking that Ironic does. I did the math once for our deployment, and there are roughly a million calls per day just to check power. That's quite a lot in the data center. There's some parallelism that was built into Ironic a couple of years ago, where you can configure how many BMC calls should be done in parallel. That helps a lot, and you can also control the frequency. And then of course there are API overloads. For instance, when there are a lot of inspections happening at the same time, you need to make sure that this is properly scaled out. There was also the problem of Inspector overloading the database, so we added something, which was also included upstream, where there's an inspector leader election, so that one master inspector is the one synchronizing the database. And we had issues when we did active introspection: when we did this to update the inventory I mentioned earlier on a bunch of nodes, they were all in the same conductor group and all calling the same API, and if you do this on 200 or 400 nodes at the same time, the API cannot handle that load. So we had to go through the load balancers to do this. There are a lot of things that we needed to adapt over time.
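For reference, the power-status knobs Arne mentions live in the conductor section of ironic.conf; a sketch with purely illustrative values, not recommendations:

```ini
[conductor]
# how often the power status sync loop runs, in seconds
sync_power_state_interval = 120
# how many power status checks may run in parallel
sync_power_state_workers = 8
# how many failures are tolerated before a node goes to maintenance
power_state_sync_max_retries = 3
```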
But at the moment, at least at our scale of roughly 10,000 nodes, we have no major scaling issues. This is mostly to say it was quite a journey over the past couple of years. Do I have another slide? Oh, I do. Okay. So this is a sketch of the conductor group setup. At the very bottom you see the physical nodes, and then we have Ironic controllers per conductor group. Initially we started with only a couple of thousand nodes, and we had basically three controllers, with some higher-level controllers on top. At some point we needed to separate these a little and introduce conductor groups. We have basically one Ironic controller per conductor group, managing at most around 500 nodes. You can see at the top that some conductor groups are a little larger, but 500 nodes is a good compromise for us between the number of additional Ironic controllers we need and the time it takes to discover the resources. It's a compromise: with more controllers you can be quicker, but you need to spend more virtual machines, of course. Now, there are a couple of things we did in addition. On the very left side we have a special group, conductor group zero, that doesn't have a controller on top, because we don't create instances there. The nice thing about conductor groups is that you can do some of the configuration per conductor group, and in our case we do fast-track there. In order to onboard nodes we don't reboot them in between; we use fast-track to onboard them, clean them, and do the burn-in without rebooting in between. Fast-track is basically where you don't reboot or switch off after you've done something, but keep the node running, waiting for the next instructions. And once the nodes are ready, we move them to a conductor group where they are actually switched off and wait for instantiation. The other thing we did, if you look at the very right, is the leading group: we sometimes have a group where there's more activity, and there we can add more conductors to handle the increased load. But this is basically what our setup looks like: as I said, roughly 9,000 nodes and around 20 or so conductor groups. And with sharding that will become even better, I guess. The next slide is probably sharding now. Let's actually go back and leave that one up, and I'll talk about sharding over this one. So this is sort of an abuse of our conductor groups in some ways. What Arne's doing with the different configurations is more what they were designed to reflect, or maybe even physical locations; you could have a single Ironic installation working across multiple data centers. What they were not intended to be is a key for less scalable components, like Nova Compute, to key off of. I will say that there are an extremely large number of Ironic deployments today using this type of setup, where you have conductor groups in order to segment your nodes into smaller chunks that Nova Compute can handle better. And, I don't want this to sound like a drag on Nova, because it's not: Nova designed their components to scale to approximately the maximum number of nodes that can be on a hypervisor, because hypervisors are what they deal in.
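(A quick aside on the fast-track onboarding Arne described above: with per-conductor-group configuration it boils down to one deploy option, sketched here with an assumed default-off value:)

```ini
[deploy]
# keep the node powered on and the agent running between enrollment,
# cleaning and burn-in, instead of rebooting between each operation
fast_track = true
```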
And so when Ironic comes in and we have a single Ironic installation with thousands or tens of thousands of nodes, that's just a level they never designed for. So how do we make this work? How do we keep the functionality of conductor groups there for people who need it for config separation or location separation, while also allowing us to scale? This is where sharding comes in. What we've done, and I don't have any specific slides about this, is we've added a key to the node called shard. This is a free-form text field; you can name your shard Fido if you want. There's no rule about the naming convention or anything like that. We've also added support for querying nodes and ports by a given shard. So for services like Nova Compute, or networking-baremetal, which is another good example with ports instead of nodes, that essentially operate on a pattern of "give me all of the information from Ironic and then iterate over it", we're literally running into slowdowns caused by parsing gigantic JSON documents, the sort of thing Python is slow at. By cutting those into more bite-sized pieces, we're gonna be able to make those client services use the shards and work a little faster. And the cool thing is that this isn't just something useful for our tooling. In Antelope we hope to ship the support for this in Ironic; the support in Nova Compute and in networking-baremetal is scheduled for Bobcat, but obviously no guarantees, it gets done when it gets done. Once we land this in Antelope, you can start assigning shards to your nodes. And the cool thing is that you can then, if you like, also limit your own operational queries based on shards. I know from working in hyperscale environments that having to do something N times for N nodes can be really painful when you have 10,000 or 100,000 nodes; that gets really tough. So you're able to use those shards to parallelize out your operational work as well. It's a minor thing, just adding an arbitrary key that you can segment your nodes by, but we think it'll have good impacts on the ecosystem. The other thing we implemented is an endpoint you can call that gives you a count back. It'll just tell you: here are the names of all your shards and here's how many nodes are in each, to help you monitor and manage that. Like I said, we'll have support for that in Ironic in Antelope. I would not expect any of the clients we ship, such as Nova Compute or networking-baremetal, to have that support until Bobcat or later. But it's a cool thing that's coming, and hopefully it'll free up people's environments to use conductor groups for conductor-groupie things and not have to overload them to also shard. Also, I think this is one of our most painful points, so I'll tease it: the Nova spec for this is still under review, and there are a lot of cool things in it that'll make HA for Nova Compute more clear-cut and also remove some of the races. I'm pretty excited about that shift in our model in Nova to look more like other things, but we'll talk about that at some future Bare Metal SIG when that spec is actually approved. So I think that's all there really is to say about sharding.
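A rough sketch of what this looks like from the operator side once the Antelope support lands (shard names and the node ID are hypothetical; check the exact flag names and API microversion against your client and release):

```
# assign a shard -- it's just a free-form string on the node
openstack baremetal node set node-0042 --shard rack-a

# list only the nodes in that shard
openstack baremetal node list --shard rack-a

# the counting endpoint: every shard name plus how many nodes it has
curl -s -H "X-Auth-Token: $TOKEN" \
     -H "X-OpenStack-Ironic-API-Version: 1.82" \
     "$IRONIC_URL/v1/shards"
```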
I think you had a couple of things you wanted to talk about with Metal3, right, Dmitry? Yeah, but I would let people ask questions and provide comments first, because I guess it's an exciting topic. In the move to sharding, is there anything the operator needs to change? I assume conductor groups will still be there. Do I need to do anything else with, for example, node resources? Do I need to assign stuff to a shard, or is this a fully dynamic process? So, there's no automatic shard assignment in Ironic. It is an attribute on a node: you set the node driver, you set the node shard, right? It's just one of the things you set on there, and it's not gonna be required. If you don't operate a deployment at a large enough scale to care about or need sharding, you can more or less ignore that it exists; that field's just gonna be dead to you. When we get to the point where we have support in the clients for this stuff, where Nova Compute's ready to go, where networking-baremetal's ready to go, we're gonna provide detailed operator documentation and migration guidance for people who wish to adopt shards. But for Antelope, there's not really an action for you to take. If you are at scale and you go, gosh, I have this automated process that runs and it's slow because it has to iterate over my giant Ironic cluster, then you'll be able to solve that problem by simply updating node.shard on all your nodes to some non-null value, to separate them into different shards. This is literally as simple as it gets: it's node.shard, you set it via the client or an API call to any string value, and that string value is a shard. There is no top-level Ironic object for a shard; you're never gonna call an API to create a shard. You just set node.shard and the rest is just magic. Under the covers, our v1 shards endpoint is literally just a select out of the database of the unique node.shard values and their counts. So the amount of fancy here is very little. We're just adding a key to the object that operators can set optionally and clients can consume optionally, in order to give them a consistent subset of nodes to operate on. Is there a diminishing return on how small you should slice a shard? Oh, certainly. There's gonna be an infrastructure cost for shards once we do the Nova Compute side, because each shard is going to need its own Nova Compute. It's tough for me to talk about the full operational ramifications of shards when only the Ironic half is done right now, so I don't wanna tell you processes are gonna be a certain way when they're not set in stone yet, but there will almost certainly be an infrastructure cost for each shard. You're going to have to run some separate stuff for that, and the changes we're proposing for Nova Compute are basically gonna move us to more of an active-passive style failover, instead of the clustered thing we have right now where multiple computes can provision a given Ironic node. That's going away with sharding. We're gonna make it so that this Nova Compute handles these nodes: it provisions them, it manages them. And if that Nova Compute goes down and you still want to be able to access those nodes, you will have to configure some sort of failover mechanism, like a cold or warm spare or something like that. Okay, Fedor posted a question.
Is this scheme of conductor groups relevant to modern OpenStack versions, or can it be simplified? What scale numbers do you have for each service? And how do you use Prometheus for monitoring, any common exporters or internal ones? So I would say that the graph of conductor groups here represents the current state of the Ironic art of how you would deploy a cluster: you have conductor groups to help you scale with Nova Computes, and a bunch of physical nodes assigned to each one. This is a good, modern Ironic-on-OpenStack design. As far as scale numbers for each service, I think we have two separate categories of services, right? There are OpenStack services that Ironic consumes that are designed to scale to hypervisor levels. These are tools like Nova Compute; these are things like the Neutron agent model that networking-baremetal implements. The idea there is that they're never going to have to handle an N larger than a single hypervisor's worth of VMs. So those only scale into the hundreds, or maybe the low thousands with some performance degradation. When you start talking about Ironic services, we scale up a little better, simply because we're able to utilize the hardware we're provisioning on. The direct deploy method means that most of the effort to provision a node is done on the node itself, which gives us a lot more freedom. So do we have a good nodes-per-conductor number? I have worked in environments that had as high as 7 or 8,000 nodes per conductor, but that was with very, very light deployment traffic and the power status loop disabled. So I think that's probably a bit high. Do y'all have any insight on that? Yeah, I would agree. I think we moved up to 3,000 or so, and then we were really struggling, and we had the power checking loop still enabled. Now, as I said, we have around 500 per conductor, and that seems to be working okay, but we have not tried to install 500 nodes on one conductor all at exactly the same time. In general, this is a number we are pretty happy with. Yeah, that is a good thing to be explicit about, right? There are two knobs here that impact your scaling: how many nodes you have, and how often you're churning them. If you have an environment where most people are provisioning a server, putting it into production, and not touching it for two years, then you're gonna be able to scale that number higher than other folks, simply because your conductors are not gonna be doing as much. Exactly. Also, for instance, getting a delivery of a couple of thousand nodes that goes into batch computing, where you know you wanna deploy them all at the same time, is different from getting a couple of thousand nodes that users pick up slowly over the next year; that is a different load profile. I see that Scott from GE Research added in the chat that they have around 1,000 nodes per conductor. So that's the same order of magnitude, but they haven't tried to go much higher than this. And someone else says they have around 250 nodes per conductor. So up to 1,000, maybe, seems like the consensus, and that matches my experience as well. As I said earlier, the 500 that we use is more or less an arbitrary limit where we say, okay, this is a good compromise between the number of conductors that we would need
(each of these boxes is actually a virtual machine) versus the amount of time it needs to discover all the resources, because that increases linearly with the number of nodes you have in a conductor group. So if you need to discover your resources faster, you would probably have smaller conductor groups. Julia. One additional aspect to consider is which driver you are using. If you're using Redfish, you're using native Python with cached sessions, most likely, as long as you don't set it to basic authentication. Whereas if you're using IPMI, your scaling profile changes dramatically, and you have to adjust your settings appropriately, because there's a high CPU overhead to launching that process. Another thing I was going to comment on, and this fits right in with what Julia was saying, is that when we say multiple conductors, you should not map that in your head to physical hardware. Small VMs, containers, even multiple instances on the same server if it has sufficient cores, can be useful for this. As long as you are not network-bound on your conductors, which doesn't really happen now that the iSCSI driver is gone, there's no reason you can't run multiple conductors on a given piece of physical hardware. I'm not sure I would go as high as one per core necessarily, but maybe one for every other core, something along those lines. It's not gonna be a problem, because, like I was saying with the sharding stuff, in a lot of these cases our scaling limitation is straight-up Python being slow at doing some of this stuff. So getting it spread across more processes, more CPUs, makes a big difference. Well, there's a stupid issue that will prevent just having several conductors: we only use the host name to distinguish conductors. You can set that in the config file. Right. And if you have RabbitMQ, it's even gonna work. It's not gonna work for us with JSON-RPC though, because we use the host name for addressing. Okay. So containers, then? Well, I had an idea that we should take ports into account, so conductors would be addressed by host name colon port. I've definitely had physical hardware running multiple container processes, but that must have been broken at some point then, which makes sense. I mean, it's not something we ever officially said would be supported. Well, it's also, sorry. Okay, go ahead. What Dmitry is saying is that it'll only work if you're using a message bus. If you have slightly different host names and you're using JSON-RPC, it's not going to magically work, because it'll try to connect to that host name; if that resolves to a unique IP address, the world will be a happy place, but if it doesn't resolve to a unique IP address, it's not going to connect. Yeah, and there's also stuff you can optimize around the provisioning itself. We get off easy because, as I said, we don't use instance images at all. But if you want to optimize for maximum throughput of a conductor, you can use raw images and make sure they are not converted on the conductor but just streamed directly, or you can disable conversion and use QCOW2 images and convert them on the node side. There are a lot of knobs in Ironic to tune if you want to optimize the number of nodes per conductor. I actually wonder, I haven't thought about this much, but I wonder how our alternate drivers scale compared to our image-based ones, right? Because we have a kickstart deploy interface now, and we have the ramdisk one, and I haven't heard much from folks who are trying to scale those up.
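The image-handling knobs Dmitry mentions map roughly onto these ironic.conf options (a sketch only; the right combination depends on your image format and where you want the conversion cost paid):

```ini
[DEFAULT]
# false: QCOW2 images are not converted to raw on the conductor,
# so the conversion happens on the node side instead
force_raw_images = false

[agent]
# true: raw images are streamed by IPA straight onto the target disk
# rather than being staged first
stream_raw_images = true
```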
In terms of how much the conductor is involved, I would think they're probably pretty close for kickstart, and maybe ramdisk is a little easier. Ramdisk is very easy. I think the folks who are using ramdisk deploy, so the live-ISO deploy in Metal3 terms, go into the thousands easily with one conductor. We only have one. Yeah, honestly, one thing, and again, understand your trade-offs before you do this: if you really have a need to scale a conductor crazy high, turn off your power status loop and just make sure that power controls are done through Ironic's API, or tune the power status loop to be much less frequent than it currently is, because that is one of the big taxes. But, did we cover all the stuff you were asking about, Fedor? We've kind of just been talking about scaling. We might just want to cover that Metal3 thing and then move on. Yes, that's perfect. Yeah. Thank you. I have one more small question, about the Nova Compute service for Ironic. Is it okay to use one Nova Compute for one set of Ironic conductors with 500 nodes, for example, or should Nova Compute manage more nodes, or fewer? I would not go above 500 personally, but it's all a matter of how much you're churning those nodes, what your tolerance is for failure, and what your tolerance is for slowdowns. The weird thing about the way Nova Compute scales today is that there are certain aspects to running a Nova Compute without sharding and without conductor groups. If you're just running it against a pretty plain-Jane Ironic, then there are pieces where all of those computes have to touch all of your nodes at startup to gather inventory. That's kind of getting to the heart of why sharding exists at all: at a certain point, if a single conductor group, or, like I said, an Ironic with no conductor groups, scales to a certain size, there's just no way to get enough Nova Compute capacity to it. I've even seen situations where Ironic was scaled up so large that a restart of the Nova Compute services would literally take the better part of a day, because it was having to fetch so many nodes, parse them, update the database, and then do that N times for N Nova Compute processes. It's pretty miserable at high scale; before conductor groups existed, and before sharding, it was extremely miserable. So that's part of why I'm really excited about sharding. When I operated Ironic, I felt all that pain constantly, so I'm very excited to have a first-class way to get rid of it. I put this in chat, but I'll chime in: one of the huge things at that time was actually Nova trying to re-touch all of the networking, and we have since fixed that bug, and I think that fix went back all the way down. For the story I just told, we had that patch downstream, and it was still like that even with the patch. But you had one of the largest environments on the planet. Oh, for sure. For sure. Yeah, let's move on from this topic. I have a few small things, but maybe not so exciting; the things I've got are more questions than answers or exciting stuff to tell people. I think I touched on some of this at PTGs, but maybe people have ideas. So, in an OpenStack setting there are many conductors, and each node is handled by exactly one. And that one conductor has all the artifacts belonging to the node locally: its PXE settings, images, you know, stuff. You see where I'm going with this. Yeah, right, right.
So let me just state the problem. We use Neutron to tell the node: after DHCP, you have to boot from this conductor, not any other one. Which is cool, fine. Standalone setting: no Neutron. We have static DHCP, or anyway, there's no way for Ironic to influence it. So my struggle is: how do we direct the node at the right conductor? Or, like Jay is saying, no shared directory please, right? The first idea was a shared directory, but I have a gut feeling it's a bad idea. It's bad for a couple of reasons, and that same environment I mentioned was using a shared directory between conductors. I think there may actually have been an upstream public talk about that from people who worked there, so please do go find it, because I'm going to talk in riddles. We were running with a shared directory, and there are a couple of things about it. First, we had to carry a ton of patches to make it even reasonable, because of things like local caches: each conductor wants to keep its own local cache in the shared dir. And what we also saw was a bug in that environment for which I never found a smoking gun, although if you've ever worked with NFS hangs you sort of know why: we would have nodes where, in the middle of a deployment, the entire conductor thread would just deadlock. And it was literally silent, as if the node no longer existed. We wouldn't get a failure state; none of our restoration loops would catch it, because the node was still locked as if it were being actively worked on. The absolute only way to recover such a node was to restart the conductor it was running on, which was extremely disruptive. So I don't think that's the solution. I can tell you how we solved it at Rackspace, which has a different set of trade-offs. We used static addressing for everything. What we did was hard-code Ironic to point at a dedicated PXE server, or in your case you could use N PXE servers, it doesn't matter. And in the same process we used to get nodes into Ironic, we also generated dhcpd configs with the MAC addresses and the IP addresses specified. When you're able to do that, it lets you do fun things, like booting a given node into a different ramdisk and things like that, all out of band. The downside for us was that we essentially had three private networks, a provisioning one, a rescue one, and a cleaning one, all statically addressed ahead of time. Now, RFC 1918 addresses are cheap, so that may not be a big deal, but it certainly was not elegant. It worked, but it was clunky. So those are sort of two directions you could use to attack this. Personally, I've always thought the best way to approach it, if we did it in Ironic properly... Yeah, that's exactly what I was about to say. Julia stole it right out of my mouth with her chat message: the conductor being able to tell all the other conductors, set up PXE for this node, right? That might help you a little, and that sort of support would have also solved the use case I had downstream at a previous job, which we patched around. Can you tell me what an RWX PVC is, please, Samuel? It's a persistent volume claim in Kubernetes. It's basically a piece of file storage, and RWX just means it's read-write-many. So it's basically a flat NFS share between all the containers, which use the same directory.
So it's like a share, but it's abstracted through Kubernetes and whatever storage driver you use, and it's attached to the containers using that method. Inside the container it basically looks like a shared directory between all the pods you attach it to, as long as your storage provider or storage driver is able to do that. I wonder if the fact that it's abstracted means you're less likely to hit NFS silliness, because NFS... I never worked in an environment that had an NFS setup that did not just occasionally fail in a really gross way that usually locked the underlying process. I wonder if the fact that Kubernetes is doing some coordination here has a meaningful impact on how stable it is. It's also, I think, the vendor-specific driver. We don't use a plain NFS driver; we use a vendor-specific driver, in our case for a NetApp, and that one is very well written. This setup has been running for two and a half years now, and so far this shared file system, abstracted between all these containers, works really well. But I second that the moment it breaks, we lose. So far it didn't break, though. It doesn't surprise me that it works much better with an actual bespoke NAS. In the environment I was talking about, literally one of the conductors was the server and all the rest were clients, so not some sort of highly available NFS. Plus, I'm not sure we had anyone who was an NFS expert in there, and I think we were all pretty grateful at the time that none of us had to be NFS experts. That's interesting. That's very interesting. I'm trying to think of other ways to solve this. I really liked the idea of the conductors being able to tell the other conductors to set up PXE, or even just an iPXE file which would chain-load to the correct conductor, or something like that. There's no reason we should be using external coordination for something like this when Ironic specifically has more information about how to do it, and the actual action that needs to be taken is much, much smaller. In fact... The idea was a custom iPXE file. We had a spec by Pavlo about generating these iPXE scripts on the fly instead of pre-caching them. And yeah, that's interesting. It requires some coding, but it's pretty interesting. I thought of that original spec for dynamic generation as well. My worry, honestly, with doing that is largely going to be the operational impact of getting people to do it, because we were talking about it then as an endpoint on the API, which means people have to do API filtering if they want public access. So you really have to know Ironic, and how to configure it, to have a secure Ironic deployment, which could be problematic. And the other problem is that not everyone uses iPXE. Some people use GRUB; some people need GRUB and the secure boot functionality. That's another thing to consider: maybe in some cases, depending on the driver, we'd have to go ahead and set up a copy of the config as appropriately as possible so that everything will boot. And that's almost a boot-interface-dependent piece of understanding. Yeah, that is. Can you chain-load into GRUB? I guess that doesn't matter, because if someone is choosing to use GRUB, we can't necessarily assume that their hardware would support some other option for, essentially, an iPXE 301 redirect.
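A minimal sketch of that iPXE redirect idea (entirely hypothetical; nothing like this ships in Ironic today): any PXE server could hand out a tiny per-node script that chains to the conductor that actually owns the node:

```
#!ipxe
# hypothetical per-node redirect: whichever server answers the first
# boot request chains to the PXE config on the node's owning conductor
chain http://conductor-3.example.com:8089/pxe/${mac}/boot.ipxe
```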
Yeah, so the challenge with GRUB is that you cannot chain-load and fall back. You can chain, though. But generally, when people use GRUB, they need the signed shim binary in the process so that they don't lose the security state on the TPM. We could dive deep into the topic of secure-booting machines, but it's something some people need to do, and in the case of ARM hardware your only real easy, quick, direct option is GRUB. iPXE is not officially distributed for ARM in most distributions by default. I'll fix it in Gentoo if you fix it in Red Hat. I may well convince someone to fix it in Red Hat, actually. Oh crap. Good luck. These are some of the things we've had discussions about for a very long time. So, I'm looking at our package list for the OpenShift Ironic image, and we definitely have an iPXE boot package for aarch64 installed. So maybe you don't need to convince people, because apparently there's a thing. Apparently we already pull it into the images. I had a chat with the folks who maintain that about two or three years ago, like, you know, this is a use case, no, you're gonna need it. Yeah, so I think it's a thing in RHEL 9, I think, or some very new version of RHEL. And Jay has determined he's off the hook. Yay, the world has progressed. It's easy to make those sorts of promises with Gentoo, because if Gentoo doesn't have a package for something, the package probably doesn't exist. So it's like, yeah, I'll package it, I've just gotta find the package that already exists. But that's interesting. And I would be interested to learn more about how people running Ironic on ARM, to provision ARM, are succeeding, because that's interesting to me personally, but we probably do need to move on. We should probably hold a session on that. And I know I have the business card of the people to email. All right, so it sounds like, as we're coming to a close, Julia just volunteered to do our Q2 Bare Metal SIG presentation. That was nice. Thanks. Great, I haven't decided the date, but we already have topics. That's awesome. I guess if anybody has some wrapping-up questions or comments, on scaling or any other topic, we have a few minutes until the half hour, exactly. So, and some people might feel this is a little controversial, but there are a couple of different things going on. From my point of view, one is that we do want to try and expose better metrics out of the conductor, and we're kind of working on that, so that you can actually understand how many calls are occurring inside the conductors, through Prometheus. That's work in progress; I'm hoping that sometime within the next year it's in a release. Someone was also working on some functionality to more safely release the workload the conductors are working on, so that we don't have to kill jobs or abort in-process deployments, which I think is really cool. And there was another thing that occurred to me that I should mention, and now I've blanked on it and will most likely never remember it. Well, while Julia is trying to find her lost thought, I'll say generally: it should be pretty obvious, if you've been here the whole time, that we love having these kinds of discussions. We're all passionate about this stuff; we've all been working on it for years and years. So if you have these sorts of questions, don't save them for a Bare Metal SIG, come hang out in IRC, come ask on the channel.
I'll even plug my YouTube channel, Jay Of Doom. Twice a week I do a one-hour office-hours session, and if you have questions like this, you can bring them to me. If I don't know the answer, we'll figure it out together. I enjoy the community, and for those of you who've been here and been interacting, we really do appreciate it, because it's a lot nicer than releasing software into a black hole of silence. So thank you for being around. Please stick around, hang out with the community and such. I mean, a lot of you already do, but maybe there are one or two who don't, and if so, it would be nice to rope you in and have you around. Yeah, another request before we close. This is our first instance of a meetup in this format. Please let me know, or Jay or Arne if you're afraid of me, how well it went and what suggestions you have. Whether it was useful for you or not, whether it was more or less useful than the previous meetings, whether you want us to change something, because we're happy to hear your feedback. And I will be very unhappy not to hear any feedback, because I put a bit of effort into organizing this. So yeah, let me know how much you hated it or loved it, or what we should change next time. Cool, thank you. Go ahead. I was going to wrap up, so go ahead. Oh, I was just gonna say, and I don't remember what I was thinking. It has escaped me. All right, so I'm willing to hang out and keep taking questions with folks, but I think we'll take Samuel's and then maybe cut the recording off, and then if people wanna hang out, my Zoom room is open for as long as we want, but let people be free to do their day job if they've gotta do that. So, Samuel was asking: is there specific documentation on what an operator needs to adjust if they wanna run a multi-arch deployment? Yeah, I think that is documented. It's something like cpu_arch in the node properties. It is, and you have to set some configuration parameters. I actually responded to the same question on the openstack-discuss mailing list, I think last week. Well, I will find that and send the link as well. That sounds good, okay. I've also given some thought to what I need to do, like cross-compiling iPXE and all that stuff, but before I start, because I don't have the servers yet, if there is someone who already has that information, I could just cross-check whether I've missed some hilariously obvious step, like, oh, if you wanna deploy an image later on, you need to have that image as aarch64, whatever. That was just the idea. But if you can find that link, that would be really great, because I think my OpenStack mailing list subscription is currently not working with my company account. Yeah, I already dropped it in chat. And please do ask those questions ahead of time, because, I'll tell you, I was the docs liaison for a little bit. I tried really hard to make the Ironic docs better, but it's such a complex project that it's incredibly hard to make the docs well-organized and easy to find stuff in. And I think the price we have to pay for being bad at organizing docs is that sometimes we are the index for those docs. So that's okay: ask those questions, we'll get answers for them. We're not upset that you asked a question someone asked a week ago; it's kind of our fault that the docs are not well-organized. So that's the price we've got to pay.
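For reference, a rough sketch of the multi-arch pieces touched on above (option names and values should be double-checked against the docs for your release; the boot file names are examples, not defaults):

```
# record the architecture in the node properties
openstack baremetal node set <node> --property cpu_arch=aarch64
```

```ini
[pxe]
# per-architecture boot binaries, assuming such a mapping option exists
# in your release (a key:value list in oslo.config dict syntax)
pxe_bootfile_name_by_arch = x86_64:undionly.kpxe,aarch64:ipxe-aa64.efi
```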
We're better at automating hardware than at organizing docs, so. I completely agree. And we still follow the overall community docs guidelines, which makes things a little complicated when you're focusing on one area and talking about another. I think some of the feedback I've received is that those who know just set the parameters based on the docs; usually, unless they build their own iPXE, they never have an iPXE loader they can use that just works. That's the feedback I've been getting. But if you have questions, we can rope in some folks who actually work for the chip designers and also use Ironic, which can be helpful. That sounds good, thank you very much. I have just a second, quicker question, about the Ironic Prometheus Exporter. I currently know the one which just exposes the metrics for the machines, and now what's being added is the internal metrics of the Ironic components. Will there be a mode to run it just for the internal components, or is it always the full package? So the intent is the full package, but the patch is still a work in progress with the services. We could probably make it optional, or make it such that you can choose. One thing that I think we haven't, well, that I haven't spent too much time on is thinking about the API impact if you wanna get those metrics as well. The metrics are labeled, so they will get picked up; I'm just not sure there's an easy way to do that by default. So that's the exporter side, and we wanna export it all in one run, so it can get complicated as well. But we can take that offline and discuss it further. Okay, thanks. That sounds good. I'm gonna stop our recording now.