So hi again, I'm going to talk to you about the IPA builder. My name is Dmitry Tantsur, I work for Red Hat, and I hope most of you actually know me. The slides, if you want to follow along yourself, are on my website, here is the link, and the link is also in the Bare Metal SIG etherpad if you want to check it later. So let's start: the ramdisk. I hope all of you know why we use a ramdisk: we need to get into the machine to conduct some operations inside of it, like partitioning, writing images and things like that. Most of you have probably not followed the whole history of our ramdisks, so let me tell it just for the fun of it. It started as a bash script that was injected into the dracut of an operating system, and it only worked by exposing the disks over iSCSI and doing some simple HTTP calls. So it wasn't even ironic-python-agent. I can understand that many of you can hardly imagine ironic without ironic-python-agent, but there was a time before that, and we had a pretty horrible-looking bash script which kept growing and growing. And then folks from Rackspace invented ironic-python-agent and proposed it to the community. At first it ran on CoreOS, which was a different CoreOS from what you know right now: it was before the acquisition by Red Hat, and it was pretty different, even based on a different distribution. There was a tarball built from Debian, which wasn't related to CoreOS. Before that, it was even a container based on Debian, containing ironic-python-agent, that was added to CoreOS as a payload. And yeah, it somehow worked, but it was pretty resource-heavy. And then CoreOS changed and we had to change ourselves. The third iteration of our ramdisk was diskimage-builder: just build a normal operating system image but pack it into an initramfs. So instead of building a qcow, just zip it up, cpio plus gzip or something like that. And the latest iteration was a small image based on Tiny Core Linux, which is a minimal Linux distribution.
The current state of things is like this: the first two no longer exist. Actually, the bash script can still be found in the diskimage-builder sources if you are into history, and the CoreOS one is in the IPA sources if you scroll back to Newton or Mitaka, something like that. The main production way to build ironic ramdisks is now through diskimage-builder, and the Tiny Core Linux one also exists, for the purposes of our CI. The IPA builder, what is that? It's a project that appeared a few years ago by merging two existing parts. First, we had some scripts to build TinyIPA, which is a Tiny Core Linux based IPA, and those used to live directly in the IPA source tree. And we had an element (I'll explain what an element is) to build a normal IPA image, and by normal I mean from a normal operating system, and that lived in diskimage-builder itself, in its source tree. So we took these two and we created a new repo to consolidate all the tooling around building ironic-python-agent ramdisks. First, a few words about TinyIPA, really few, because I think most of you should not be using it. I know people are trying. TinyIPA was written for resource-constrained environments like CI, or your DevStack VM, or a Bifrost VM. It's built from the ironic-python-agent-builder tree by running two simple commands. It's based on Tiny Core Linux. It doesn't have many things you would expect: I think its TLS support is limited, and more importantly, it doesn't have firmware for some hardware, and I don't mean exotic hardware here; I know some Dells have problems with TinyIPA. So be careful when trying it on bare metal. It may work, but we designed it for virtual and cloud environments. Which brings me to the actual image, and I want to concentrate on it for a bit, to give you an idea of how it works and how it's built. Or rather, I'll talk less about how it works and more about how it's built and how you can customize it.
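The two commands themselves aren't quoted in the transcript; as a plausible sketch, assuming the TinyIPA support shipped in the ironic-python-agent-builder package (verify the exact invocation against the project's documentation):

```bash
# Hypothetical sketch: building TinyIPA via ironic-python-agent-builder.
pip install ironic-python-agent-builder
ironic-python-agent-builder tinyipa    # produces a TinyIPA kernel and initramfs
```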
As I said, the normal image is usually built from some real operating system. In the simplest case, you can invoke this command: the ironic-python-agent-builder script is shipped with the ironic-python-agent-builder repository, and you provide it with an output name and the operating system you are going to use. It sets up ironic-python-agent to be started as a systemd service, depending on networking being up. It installs all the required dependencies. It supports ironic-python-agent installed from source and from packages, like RDO, and, like any diskimage-builder element (and it is an element itself), it can be customized with further elements. So what's it all about? diskimage-builder works by taking a cloud image, usually, or an empty directory; it can use things like debootstrap and so on, but in the normal case it takes a cloud image, unpacks it, and starts running things in a chroot and outside of it. Elements have several stages; I've listed seven here, there are more than that, but these are the most important ones, in the order they are executed, one after the other. We start with extra-data, which is usually used to populate some environment variables, then pre-install, install, post-install, and then pre-finalise, finalise and cleanup. As you see, I marked some with a star: that means they run inside the chroot, and there are a few, extra-data, pre-finalise and cleanup, that run outside the chroot, so they can use something from the host operating system you're building on. For each stage, each element has one or more scripts, usually bash scripts, and an element can also declare dependencies on other elements. diskimage-builder runs stage after stage, as I mentioned, and the scripts for each stage from all elements are copied into one location, sorted alphabetically, and executed sequentially. So it works not element by element but stage by stage, and scripts from all elements are mixed together. In diskimage-builder, everything is an element.
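The execution model described above (pool all elements' scripts per stage, sort alphabetically, run in order) can be sketched with a toy simulation; the element and script names here are hypothetical, and this is an illustration of the ordering rule, not diskimage-builder code:

```python
# Toy model of diskimage-builder's ordering: scripts from ALL elements are
# pooled per stage, sorted by name, and run in that order, stage by stage,
# not element by element.

STAGES = ["extra-data", "pre-install", "install", "post-install",
          "finalise", "cleanup"]

# Hypothetical elements, each mapping stage name -> script names.
elements = {
    "ironic-python-agent-ramdisk": {
        "install": ["60-ironic-python-agent"],
    },
    "my-custom-element": {
        "install": ["05-early-tweak", "80-install-hardware-manager"],
        "extra-data": ["10-export-vars"],
    },
}

def execution_order(elements, stages=STAGES):
    """Return the global order in which the scripts would run."""
    order = []
    for stage in stages:
        pooled = []
        for element, hooks in elements.items():
            for script in hooks.get(stage, []):
                pooled.append((script, element))
        # The alphabetical sort mixes scripts from all elements together.
        for script, element in sorted(pooled):
            order.append(f"{stage}/{script} ({element})")
    return order

for step in execution_order(elements):
    print(step)
```

Note how `05-early-tweak` from the custom element runs before the base element's `60-ironic-python-agent` in the same stage: the numeric prefixes, not element boundaries, decide the order.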
Even operating systems are actually elements in diskimage-builder. And I want to show you really quickly how you can write your own elements to customize, specifically, ironic-python-agent-builder. I have this blog post, it's pretty long, it's about deploy steps, but part of it is dedicated to building a ramdisk with a custom deploy step. Essentially, an element is a directory with files. It has a file that declares the dependencies of the element, and each stage is a subdirectory that contains scripts. It can contain more, but in the simplest case it's scripts. Some elements have configuration files. So if you look here (by the way, let me know if the font is too small, I can increase it further; looks okay to me? Looks okay), there are two dependencies, and these are also elements: we are writing a custom element that extends ironic-python-agent-ramdisk, so it should depend on ironic-python-agent-ramdisk, and it uses the source-repositories functionality, which is an element that comes with diskimage-builder and can clone git repositories. It looks for a configuration file inside your element which contains this magical line, and you can find the explanation in the documentation of each element in diskimage-builder. It's actually not so bad if you look at it: it contains documentation for each element. So, for example, if I look at source-repositories, yes, there is something about it here, it's pretty nicely formatted; you can check everything in the diskimage-builder docs. Then we have our script that just pip-installs the cloned repository, and that's it. When building your custom image with ironic-python-agent-builder, as you see here, I'm using a custom location for the git repository, which is not required. More importantly, I'm providing an elements path with my custom elements, and I'm passing the custom element with the -e flag.
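The element just described can be pictured like this; the element name and script contents are hypothetical, and the source-repository line format is recalled from the diskimage-builder source-repositories element docs, so verify it there before use:

```
my-ipa-extras/                          <- hypothetical custom element
├── element-deps                        <- dependencies, one per line:
│     ironic-python-agent-ramdisk
│     source-repositories
├── source-repository-my-ipa-extras     <- tells source-repositories what to clone:
│     my-ipa-extras git /tmp/my-ipa-extras https://example.com/my-ipa-extras.git
└── install.d/
    └── 80-my-ipa-extras                <- bash script, runs inside the chroot:
          #!/bin/bash
          set -eux
          pip install /tmp/my-ipa-extras
```

A build could then look something like `ELEMENTS_PATH=~/elements ironic-python-agent-builder -o my-ipa -e my-ipa-extras debian`; the `-e` flag and positional distribution follow the talk's description, but double-check against the project's CLI documentation.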
This works similarly with diskimage-builder itself, because ironic-python-agent-builder is simply a wrapper around diskimage-builder, using the minimal operating system elements and, yeah, a few other things. The result is two files, a kernel and an initramfs, which you can use directly for your ironic-python-agent, sorry, for your ironic installation. And that's pretty much what I wanted to tell you. Do you have any questions for me?

Thanks a lot, Dmitry. Any questions for Dmitry?

Dmitry, sorry, I was a bit late, maybe you already covered it, but does this topic perhaps include describing strengths and weaknesses of the various operating systems that can host IPA?

Oh, well, I don't think there are necessarily strengths and weaknesses for supported operating systems. I mean, if you're trying to use Arch Linux you may have some trouble, because it's not what we normally test with. And the differences between mainstream operating systems are probably marginal nowadays. I know from my experience that CentOS Stream images are substantially larger than, for example, Debian images. And if you build and redistribute Ubuntu images you can have problems with their copyright rules, specifically if you redistribute. But functionality-wise, they should be equivalent.

So I ask this primarily from the perspective of being a hardware vendor and wanting to actually execute this on bare metal. So in terms of bare metal support, drivers and whatnot, that we could use both for our developer integration testing and in our CI. I'm just wondering if there's any guidance at all, or an overview of the strengths, weaknesses and different aspects of the available operating systems, provided in our upstream documentation, or maybe on a wiki or some such place?

No, not really. In the upstream community we're trying to be distribution-agnostic.
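As a hedged sketch of how those two artifacts are typically consumed in an ironic installation (the image names and UUID placeholders are hypothetical; the Glance formats and `driver_info` fields follow standard ironic usage, so check your deployment's docs):

```bash
# Upload the ramdisk artifacts to Glance (standalone ironic can also use
# plain HTTP URLs or local file paths instead).
openstack image create --disk-format aki --container-format aki \
    --file ipa.kernel deploy-kernel
openstack image create --disk-format ari --container-format ari \
    --file ipa.initramfs deploy-ramdisk

# Point a node's deploy interface at them.
openstack baremetal node set <node-uuid> \
    --driver-info deploy_kernel=<kernel-image-uuid> \
    --driver-info deploy_ramdisk=<ramdisk-image-uuid>
```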
This extends to the point that we support not all distributions, but we try to support all major distributions. And your question essentially boils down to which distribution supports your hardware better, because in the end the kernel modules, the firmware, the systemd stuff, we inherit from the base distribution; it's not ironic-specific in any sense.

Right, and I understand that as a community we try to be OS-agnostic, and I think that's where the challenge occurs for us as we try to find something that works well. Any guidance that could be provided to assist not just us but other hardware vendors that try to stand up third-party CIs and do their work around developing drivers could be very helpful, presented in a way, I guess, where we don't necessarily make a strong recommendation of one or another, but just provide whatever is known about them: strengths, weaknesses, different things to consider. I think that alone would help us decode all the options that are out there and which one might be the best fit, without our having to do an R&D project to find the one that's best for our needs, and hence for the community's needs, right? Because I think the community benefits when we're more productive.

I understand your interest, but I also don't think we should take any position on what is better and what is not, especially since we have Linux distribution representatives in the community, and that could cause interesting issues.

Related to this question, maybe: do you have a feeling for what the main issues are that users have with building images? I mean, is there feedback at all on the most common issues people have when they build their own IPA images?

Is it for me or for Riccardo? No, it's for you. Okay.
I think the most common problem is really not knowing what to do. Especially if they need some customizations (and I know a lot of ironic operators use extensive customizations, or at least a hardware manager), people start inventing ways of customizing ironic-python-agent which are not necessarily optimal. For example, they start unpacking the resulting gzipped initramfs, changing files, and packing it back, which is not exactly a reliable way to customize it. As I showed today, there are much nicer ways of doing that, like writing your own custom elements, but this is not obvious at all. And that probably is the biggest problem.

Right. But I guess the main element that users like to add is their own hardware manager, right?

Right. Then there's also the question of minimizing the size of the image; that's a never-ending problem.

Right. No, this was one of my next questions, what the issues are. For instance, I'm just looking at what we add: we of course have our own hardware manager, but we also have some extra elements, which I mentioned before, where we actually download things on the fly in the image, these kinds of things. But I guess the hardware manager is probably the main thing people add. On the size though, I mean, this is a never-ending story, but what is going to happen next with the size issue? People have hit this already, and we have removed some packages, but I guess there are limits to what we can do, right?

Yeah, I'm afraid the operating systems are growing quicker than our expectations of the ramdisk size. And with the migration to CentOS Stream the image gained, I think, 100 megabytes, and they had to revert that because it wasn't acceptable for the CI. I know that the Debian image is smaller; I'm not sure why, honestly. In ironic-python-agent-builder we take some measures to minimize the size: we remove documentation, we remove certain things we know we don't need.
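For illustration, the fragile unpack-and-repack approach mentioned above looks roughly like this; this is a sketch based on the common gzipped-cpio initramfs layout, not something shown in the talk, and a custom element is the recommended alternative:

```bash
# Unpack a gzipped cpio initramfs, change files, pack it back.
# Fragile: easy to break file ownership, the compression format,
# or multi-segment (concatenated) initramfs images.
mkdir ramdisk && cd ramdisk
zcat ../ipa.initramfs | cpio -idm
# ... edit files in place here ...
find . | cpio -o -H newc | gzip -9 > ../ipa-modified.initramfs
```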
I'm really sad we cannot remove git, because git is pretty heavy; is it 20 megabytes or 30 megabytes? Downstream I'm building a minimal container with ironic-python-agent, and I had to force-remove git from it, because otherwise it wasn't minimal.

Do we actually know what the maximum size can be? Is there some kind of hard limit where we know, okay, the kernel-addressable space is like this, and this is the maximum you can have? I think that would be interesting to know, because then we could check for this as well.

So there are some limits on that, and we've seen a person hitting them, but then they used a different grub bootloader or something like that, so the limits can be worked around; it was an interesting story. In practice, I know that TripleO used to have pretty large ramdisks, maybe 700 or 800 megabytes, that order of magnitude, and that still worked on all reasonable hardware.

I checked the other day, and the maximum I had was just 600-something, because I was debugging something: I had a 400-megabyte one and someone was saying, yeah, well, your ramdisk is just too large, but I had one that was 600 and still working, so that was not the issue. But yeah, I've not gone much further than this.

Yeah, I've spent some time on that, but it has to be a repeated exercise, and I don't always have time for it.

All right, changing topic now. I very much liked the history, because of course I was not aware of it at all. On one of your very first slides, the very first iteration, you said there was a bash script injected into dracut; can you expand a little bit on how that worked? Because it's a bash script that's in the initrd, or...?

Yeah. So you can customize dracut (dracut is an initrd builder for many distributions) with your own bash scripts.
So it wasn't like a normal systemd service. What we do in an initramfs now is a bit weird: we use an initramfs not only as literally an initramfs, we actually have systemd services in there and such, which is the way distributions work nowadays. But dracut runs earlier, and it's a bunch of bash. Look, I may even try to find it for you. I think they still have this element, it was called something simple, but maybe they removed it already. Yeah, maybe they removed it; unfortunately, I also don't remember what it was called. But yeah, it was pretty much a large bash script that was run in the boot sequence.

Did this use dracut's network module?

Yeah... it wasn't exactly that, never mind, I was just wondering. But this, of course, offered limited functionality, I guess, right?

Very limited. Just to give you an impression: that thing did not even have the ability to run SSH, so I couldn't SSH in to debug it. It did not have the users bootstrapped correctly; even the root user wasn't really bootstrapped, so logging in to that ramdisk could not happen.

I hadn't thought about this; I guess debugging this was quite a nightmare.

Debugging that was indeed a nightmare. Whenever I have to debug IPA nowadays, it basically boots up, you use SSH to log in and just poke around, so it's quite easy.

Right. Nowadays it's easy, and you have normal systemd services that can have dependencies and whatnot. Before that, it was a bit sad.

Okay. Is the IPA builder missing some functionality? Is there some big thing that is still missing, or is it actually complete and does everything it needs to do, anything you want?

It's complete. It's actually a wrapper around diskimage-builder, so whatever diskimage-builder can do, you can do with it. Or you can just skip it and use diskimage-builder directly with the elements we provide.
So I found that bash script, if you're interested: I checked out an early tag of diskimage-builder, and there it is. So, see: bootloader installation, vendor passthru support, ironic URLs, some primitive root device detection, iSCSI deploy, which we removed recently. So that was the deploy, that was the ramdisk, that's the entirety of it.

Riccardo is just saying, let's go back to the bash script. I was thinking the same thing: it basically has everything you need.

Yeah, it's the grandfather of ironic-python-agent. But it has really basically everything. If you go to the very bottom, I mean, it downloads the image...

No, it doesn't download the image; it used iSCSI deploy. It starts an iSCSI target, then it tells ironic through vendor passthru, because we didn't have the ramdisk API at that point, so it calls vendor passthru to say we are ready, waits there, and then stops the iSCSI target and installs the bootloader. So there's not much here. Cleaning was not a thing back then, if you're curious; cleaning did not exist, so there are no clean steps, of course.

But I recognize some parts, like the part right before installing grub, the bootloader: it looks very similar to what we still have in our grub install, like mounting /dev and /proc.

Yeah, I mean, it's all inherited from here; it's part of the code, and it actually looks very similar. Some code has made it through, nice.

Are there any more questions for Dmitry? Am I still online, do you still hear me? Yeah, yeah, we can hear you. Are there more questions? Just speak up if you have any.

So the element is called deploy-ironic, if you don't want to do the archeology yourself.

So I guess, Dmitry, you mentioned size being a challenge. Are there any other challenges that you see with IPAs that are built?

Not really. I mean, rebuilding it is a challenge, debugging it can be a challenge, and size can be a challenge.

And I guess also device support, right?
Just sort of out of the box, what kind of support there is for various devices. Is that fair?

We expect that we support whatever the underlying operating system supports.

This is something that I also brought up the other day: if you have aging hardware, and the operating system that you use for IPA drops drivers, like CentOS 8 does for some disk drivers... I mean, it's not related directly to the IPA builder, of course, but this may be challenging. You see what I mean? If an IPA image is built on the newest release of the operating system, but the operating system no longer has support for the drivers you need, then you have to go back to some older image, which may not work in your environment anymore.

Yeah, and we often run into a similar but different issue, where we get a pruned-down operating system that has very minimal support for devices, and then we find that what we need for a device that's common in today's and newer systems is not available. It just becomes a black hole trying to get something that'll actually work against our actual bare metal. And that's my concern; that's the reason why I raised it. It may not have anything to do directly with diskimage-builder or the IPA builder, but with the IPA image in general, the result of that activity is often not usable, and that's when the fun begins. So any guidance to help us make reasonable choices, to get us on the right trail, would be helpful. And I think this gets exacerbated by UEFI and virtual media and other such things. I don't know if you've found anything that becomes challenging in those environments, UEFI and virtual media?

Yeah, everything is possible. We have some issues, but not with ironic-python-agent right now: we cannot always boot with UEFI on some machines. But again, nothing specific to ironic-python-agent or the way we build it.
Except that you have to have some spare RAM, because the image is quite large.

Yeah, and I think generally, though, at least in the systems that we integration-test against, there's usually no shortage of memory. We're not really conservative; we buy what we think we need, and we have the benefit of having plenty of hardware. But we do find that sometimes it takes a lot of memory, especially with UEFI, and I think virtual media; we've seen that in the past.

Yeah, virtual media can be interesting. For example, I know there is some not-really-popular hardware that has a 150-megabyte limit on virtual media; IPA won't fit there.

Right, and maybe we've encountered that in the past. Hard to say, it's been a while since I've seen it myself, but I have seen it.

Yeah, and with virtual media there's another problem, again not IPA-specific but specific to any virtual media: hardware differs in how it treats virtual media. This 150-megabyte limit comes from the fact that they don't load it completely first. Some hardware loads it in batches; some hardware loads it on every access, with a pretty small buffer size. And yeah, that can be a problem.

So I guess what I'm getting at is: do we have a way of sharing these learnings among members of the community? These things that we've learned that are maybe not directly related to the tool, but certainly to the thing that's produced by the tool and its usefulness. That's what I'm interested in, Arne, from the perspective of the special interest group: sharing our learnings so that we can be more productive and get to know what we're...

Yeah, exactly. This is why I was asking if we know what the limit is. If you have built an image that is, I don't know, three gigabytes in size, you know that it won't work because it's too big, but I'm not sure I know what the exact limit is.
I don't know if this is hardware-dependent, for instance, or firmware-version-dependent; I mean, I don't even know what it depends on.

Right, and we may not know; it may just be guesses based on what we do know, right? So: I've tried this, this is my configuration and firmware, and I found that this size didn't work; not sure what the cause is, but here's what I've seen. Something like this, something like a knowledge base, may be sensible to have.

We have something like that for our deployment, because just two years ago we switched most things to UEFI, and of course this created some issues. So we have a list of errors that are sometimes cryptic, to actually know what they mean. When your node displays some error message, "cannot move forward because of something", what does that actually map to? Sometimes something like this would probably be sensible to have: if you run into something like "cannot allocate memory" at a very early stage of the boot, what does that actually mean? Does it have to do with the ramdisk, yes or no? I think something like this would be good to have.

Yeah, those are the kinds of things I'm referring to. Any information or guidance would be helpful, based on other community members' learnings.

I don't think we have such a knowledge base or a place to share it. We have troubleshooting guides in our admin docs; yeah, maybe it's something we should add there.

I mean, I can put some of the stuff that we have in there. Of course, for the example that I just gave, it's probably very hardware-dependent or firmware-version-dependent, because of the error message, but at least it gives you some idea of what to look at: okay, is there an issue with the image because it's broken, or is the image too big, or is there something wrong with my TFTP server, these kinds of things.
So we have, I don't know, 10 or 12 different error messages that we decoded over the past couple of months to actually understand what they really mean and what to fix. It's ironic that we have to deal with all these bare metal challenges. Yeah, exactly; it's very hard, do help us.