This spaghetti thing sounds delicious. Yeah, funny that I'm eating spaghetti for lunch. I had spaghetti for lunch as well. I also had some pasta for lunch. Oh. I mean, for me it's probably normal, but. Yeah, totally. Thank you, it was good pasta for lunch. Yeah, it would be a problem for Ricardo, I would say. Oh. Everyone had pasta for lunch. Thank you. I had a day off for breakfast. That's fair enough. Looks like the streaming's been started. Thank you, Allison. Yeah, no problem. Who should I make the host? I'm probably going to drop off, but I'll be around if y'all need me. Myself or Dmitry, either one. Dmitry, Dmitry, Dmitry. Oh. That's cool. Hi. OK, we voted: Dmitry is the host. All right, well, let me know if y'all need anything, but otherwise have a good session. Thank you. Thank you. OK, round two. Yeah, I see that people are still joining. Yeah. Maybe we give it two minutes, and no discussing pasta. Definitely. Since you can't really join the room until the exact time anyway. So where's the etherpad? I can't paste the link. I've almost got it. I've got it. Sorry, I just read the first warm-up question. Even if we're not starting yet, it's kind of interesting: if you were to write Ironic from scratch today, how would you write images onto nodes? So what's your answer? Well, from experience, I'd say RAM disks. So not writing images directly onto nodes, but just using images loaded into RAM. This is what we used to do, actually. So the ramdisk deploy, you mean? What, sorry? The ramdisk deploy interface, pretty much. If I could start completely over, I'd probably focus on kexec and RAM disks. I guess the problem is that at some point someone is going to come along and say: I need this disk image, I need this filesystem created on disk, I need this special configuration done, and demand that we do it, because VMs have images.
So we do see some use of the ramdisk deploy interface, even though building RAM disks is kind of a magical art that people don't really grasp or have common knowledge of, unless they've actually done it more than once. Yeah, we have a use case for the ramdisk deploy interface now, because for MetalKube we're trying to use a separate ISO for deploying, essentially. So just fire up this ISO, it's magic, magic, magic, then have a vendor passthru call to disconnect the ISO and leave the node there as active. That should work. It should work. We don't have this vendor passthru for disconnecting ISOs yet. It shouldn't be a blocker, really. I mean, once we have the first one, if we do ramdisk boot, maybe we should have a cleanup periodic. What do you mean? Maybe we should have a cleanup periodic that runs every so often, and if the deploy was X amount of time ago, go ahead and detach it. Some people may actually use this ISO as the one and only thing. So constantly accessing the ISO, you mean? Yeah, I mean, just running some software from it long-term. I mean, the initial use case is HPC people: connect the ISO, start some simulation. Yeah, I have that in mind, unfortunately. And unfortunately, no one knows. OK, we've got a couple more people that have joined. And we've already kind of started rambling into the topic, which is actually kind of awesome. And ended up in a completely different direction, because I actually brought kexec back up. And yeah. So I guess the next question is: what would you omit completely today? I mean, I think iSCSI as a deploy interface was already answered. I'll second that. Honestly, I don't know. There's no Jim here, right? I can say that maybe I would omit having an agent completely and just use the Ansible deploy. If I were to start Ironic today, I would probably just do the Ansible deploy and the ramdisk deploy. Kickstart, maybe.
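The cleanup periodic floated above could be sketched roughly like this. Everything here is hypothetical — `DETACH_AFTER_SECONDS`, the `Node` shape, and the `keep_iso` opt-out (for the HPC "boot from the ISO forever" case mentioned) are illustration, not real Ironic code:

```python
import time

DETACH_AFTER_SECONDS = 3600  # assumed operator-tunable threshold


class Node:
    """Minimal stand-in for an Ironic node record (illustrative only)."""

    def __init__(self, uuid, deployed_at, iso_attached, keep_iso=False):
        self.uuid = uuid
        self.deployed_at = deployed_at    # epoch seconds when deploy finished
        self.iso_attached = iso_attached  # virtual media ISO still connected?
        self.keep_iso = keep_iso          # opt-out for long-running ISO use


def nodes_to_detach(nodes, now=None):
    """Select nodes whose virtual-media ISO should now be disconnected."""
    now = now if now is not None else time.time()
    return [
        n for n in nodes
        if n.iso_attached
        and not n.keep_iso
        and now - n.deployed_at >= DETACH_AFTER_SECONDS
    ]
```

A real implementation would hang off the conductor's periodic task machinery; the point is just the filter: attached, not opted out, and deployed long enough ago.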
I've kind of been surprised how few people want to use the Ansible deploy and how few people want to actually leverage it. Because it's not a perfectly canned solution for all things, for all possible cases that can be encountered. You have to know your scenario, basically. And it's just been really surprising. Right. It's a good compromise among all the new options in terms of deployment. I mean, from my point of view, I expect that when you do a deployment, someone should know their infrastructure, or what they want to achieve, in advance. And so using Ansible, and having some preconfiguration and knowledge about that, they'd feel kind of comfortable using a modern tool with a somewhat old way of doing a deployment. I think it's kind of that, because part of our user base is virtualization, compute-based; they're so abstracted they don't really know that much about the hardware. So they can't really assert a bunch of different things. So we do a really good job with the agent of trying to do the right thing and trying to go down the path required with what we have. And a lot of times the images are not perfect. In some cases they may be broken completely. And we kind of make it all work, which is kind of awesome at the same time. So I don't think you can take arbitrary images like we do with virtualization, throw them at the ramdisk interface, and just expect it to work. You'd have to know, you'd have to have the trust to know: good, this image works on this hardware. Which is a much higher barrier to entry, I guess. So operators, do we have operators this morning? Maybe one? Not many. It seems that we have a smaller session than on Monday. Oh, well. Yeah, I guess my point against the agent, and it's a pretty weak point, actually two points, is memory consumption and the problematics of building it, all those mechanics around building, rebuilding and such.
But yeah, I'm not suggesting to deprecate it. I'm just speculating, as you asked. I think it becomes a thing of just pushing the problem, or the area of overhead, around the plate, in a sense, at that point. I mean, each approach has its valid use cases. It's just that we can't support everything with the agent, and Ansible can't support everything the agent does, at least out of the box. And I guess, in a sense, the agent is kind of a good all-in-one, because we have inspection built in. So, image types: we had one vote for partition images, two for whole disk images, no votes for anything else. And there was a question about growing images. And then also, everyone, or most everyone, seems to still use legacy boot, and is starting to move to UEFI. Yeah, fine. You know, the cloud images are pretty much whole disk images that can be used with Ironic. I personally use the CentOS image just the way I download it from cloud.centos.org. Yeah, and most of the cloud images do just work. And they usually do boot, usually. I think I know CentOS 6 didn't UEFI boot. But more recent images, at least, have always been dual booting. It generally just works. And yeah, I know we've had some operators express interest in a tgz-based image. And I guess tar now does correctly identify and preserve character devices, so it doesn't actually read the contents off the device. So it actually seems feasible. Does anyone know what I'm talking about? You need to ask. Yeah, I guess nobody cares enough to just push this work through. Yeah. Mirantis used to, and then Mirantis happened. I really liked their idea about torrent-based deployment. Quite sad they didn't have enough time to push that through. That was interesting. Yeah, that's true. Maybe I should add: we've got users that actually quite like the tar idea, who would like to do the tar thing, but with rsync in particular, to do the delta.
Is that satisfiable with an Ansible playbook? I've not tried. It should be. Yeah, I mean, it plausibly should be, right? Well, the idea is that you do it with minimal data going across the network. So you don't want to just download the tarball and then do rsync. You want to rsync from, say, an NFS mount if you could mount one, and rsync from that. But again, that should be doable in the same way. Yeah, that should absolutely be doable with an Ansible playbook. And that kind of scenario is a completely different thing. So it's the perfect use case for the Ansible deploy interface. I don't know if you were using stuff like, what's it called, ZFS. You could probably do snapshots live, couldn't you, that way? ZFS or Btrfs, they both support snapshots. Yeah, you could do the snapshot and extract. Yeah, I always forget about Btrfs. I guess that would work. I mean, you could even snapshot other filesystems that aren't knowledgeable of it, if there's LVM in use. Yeah, that's true. You can go underneath, can't you? Yeah, I had a backup system once designed around snapshotting a big massive database volume, bringing the database back online, and copying the database files out from under it off the snapshot. The fun of 1.6-terabyte full-text databases. I'm glad you tested it for us. That's good. It does work pretty well. You just need enough disk space in the volume pool to account for the delta. I think in three years of operation, that backup job only died twice, because of a RAID rebuild or because we did large updates. Oh, so you ended up with? Yeah. And that just caused the database backup to fail; LVM did the right thing and made our snapshot worthless and pointless and dead. That's pretty cool. Yeah, I hadn't thought of that option until we started talking about this stuff. But that sounds kind of promising. So we talked about images and related things. Bootloaders, any more discussion there? What are the limitations?
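The snapshot-under-a-live-system backup described above boils down to a fixed command sequence. A minimal sketch, building the plan as data rather than executing it (the volume names, sizes, and paths are made-up placeholders; the `lvcreate`/`lvremove` long options are standard LVM flags):

```python
def lvm_backup_plan(vg="vg0", lv="dbvol", snap="dbsnap", size="50G",
                    mountpoint="/mnt/dbsnap", dest="/backup/db/"):
    """Build the shell command sequence for a snapshot-based backup:
    snapshot the live volume, mount it read-only, copy the frozen view
    out with rsync (only deltas cross the wire on reruns), tear down."""
    dev = f"/dev/{vg}/{snap}"
    return [
        ["lvcreate", "--snapshot", "--name", snap, "--size", size,
         f"/dev/{vg}/{lv}"],
        ["mount", "-o", "ro", dev, mountpoint],
        ["rsync", "-a", mountpoint + "/", dest],
        ["umount", mountpoint],
        ["lvremove", "--yes", dev],
    ]
```

As noted in the discussion, `--size` is the copy-on-write area: if the delta during the copy outgrows it, LVM invalidates the snapshot and the backup run fails, which is exactly the failure mode described.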
Yuki, is anyone trying to turn secure boot on? Maybe we need to add that as a feature in Ironic? I'm actually trying to do that. I mean, I think it is a feature in Ironic. Or I mean, turning it on from the Ironic side. Yeah. iLO and iRMC, I think, can do that. The secure boot capability. We should implement it for Redfish. It should be doable to change the keys through Redfish as well. Not on all BMCs, it looks like. Obviously, it's Redfish. Yeah, it might be doable. That's actually probably something we should try and discuss next week. Because, since people are suddenly rushing towards UEFI, I feel like they're suddenly going to go, I need secure boot by default. And we're all going to go, oh, no. That's what happened to me downstream. No good. Look in the source code. Yeah, the iRMC code checks whether secure boot is requested. Yeah, they have code to set secure boot mode. In iLO, they also have some code to update secure boot mode. So we have prior work on that in Ironic. I think we should do that for Redfish. How far along is that? I guess the Redfish side should probably have generic methods that wrap that stuff at the interface level. We may even have them. Yeah, double check. Yeah, I mean, just kind of drive that consistency, and we can go from there. Is this about turning it on? Because I'm sure Pierre had secure boot working with Bifrost, but it was kind of turned on outside, I guess. Yeah, in my mind it'd be about turning on secure boot so that it is running on these machines, or able to be toggled on these machines, based on the control plane. OK, got you. Something like that. And it might be, oh, your flavor defines it. OK. I mean, I think that's the continuation of secure boot from the original pattern. So I'm going to make a note. Does anyone have any thoughts on whether I should go ahead and request more PTG time for next week? Probably makes sense. I already have a full schedule, and things keep popping up.
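For the Redfish side of the secure boot discussion above: the DMTF Redfish schema does define a `SecureBoot` resource on a system with a writable `SecureBootEnable` property, so toggling it is a single PATCH. A sketch that only builds the request (the system ID is a placeholder, and, as noted in the session, whether a given BMC actually honors the PATCH varies):

```python
import json


def secure_boot_patch(system_id="1", enable=True):
    """Build the Redfish request that toggles secure boot on a system.

    Returns a plain dict describing the call; a real client (e.g. via
    requests or sushy) would send it with authentication and ETag
    handling, which are omitted here.
    """
    return {
        "method": "PATCH",
        "url": f"/redfish/v1/Systems/{system_id}/SecureBoot",
        "body": json.dumps({"SecureBootEnable": enable}),
    }
```

A change like this typically only takes effect on the next reboot, which is one reason driver-level wrapping (power-cycle coordination) is worth having.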
Yeah, so let's see. We talked about partitioning; people want highly customizable partitioning as well. Architectures, we did talk about that. And I also talked about ARM in another session this week. There seems to be a lot of interest in getting unit tests running on ARM hardware in addition to x86 hardware. Fundamentally because the end results, and some of the faults that you end up with, are different on ARM, and because we've hard-coded x86 patterns and x86 binaries and tools in the source code. And it's not just us; all the projects have done this. So we should probably expect to see some of that. And I don't think it'd be hard for us to have an integration job on ARM at some point, if the pool of machines at Linaro is reliable and available. Unit tests are a good start. I think there was already a template in the project to just add them. Yeah. I should mention that in our case, the only thing I knew was blocking us was that we need an iPXE binary, and no one builds it for ARM. We can always add code to do the cross-compile on demand for ARM; it's doable. It might just be someone cursing CI for a week, though. Sorry. Just a wild idea, because you like wild ideas: did we plan on deprecating MBR support? No. No. It was a short discussion. Can we change the default? Yeah. Yeah, let's change the default. Let's change the default. Yeah, let's change the default. I thought we changed the default. Oh, no. Are these additional items that are showing up in our partitioning discussion, LVM, Btrfs, and ZFS, all coming up later? Oh, well, yes. In case you don't know, Fedora has switched to Btrfs by default as a partitioning schema, so not just as a filesystem, but as a way to partition the hard disk, because it has its own volumes, believe it or not. So this idea is crazy, but it's not as crazy as it may sound initially. I always want to type it out as ButterFS. So that's really interesting.
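On the change-the-default vote above (MBR versus GPT for partition images), the decision logic is small enough to sketch. This is purely illustrative, not Ironic's actual code; the only hard constraints encoded are that UEFI boot expects GPT and that MBR can't address disks of 2 TiB or more with 512-byte sectors:

```python
TWO_TIB = 2 * 1024 ** 4  # MBR addressing limit with 512-byte sectors


def pick_label(boot_mode, disk_bytes, default="gpt"):
    """Choose a partition table type for a deploy.

    GPT is forced whenever the hardware leaves no choice (UEFI boot, or
    a disk too large for MBR); otherwise the configurable default wins,
    with "gpt" as the proposed new default and "msdos" (MBR) only on
    explicit operator request.
    """
    if boot_mode == "uefi" or disk_bytes >= TWO_TIB:
        return "gpt"
    return default
```

The nice property of flipping the default this way is that legacy-boot operators who genuinely need MBR keep a one-line override, while everyone else converges on GPT.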
The one concern I'd have is that if we move away from supporting things that, well, Windows would support, like GPT, we'd potentially make ourselves incompatible. We are very Linux-focused, though. So maybe I don't see a problem if we were to carry code to handle Btrfs partitioning as well; I just don't think it could ever be the default. Yeah, I don't think so either. And honestly, with no actual operator interest, it's probably premature. It's probably something to consider. Maybe actually doing LVM by default would be more interesting. Well, at least allowing doing the same thing but with LVM, nothing fancy: a partition image, with partitioning, but with LVM. That may bring some value? Maybe. Although I think one of the problems we're going to run into is the ramdisk just not having the appropriate tooling for LVM. And I think our downstream QA found such an issue yesterday with whole disk images coming out of DIB. So, yeah. Diskimage-builder is honestly another discussion topic, because it's a cool project, but it's pretty much on life support, in my opinion. I think it is. We should probably discuss why. Let's add an item, diskimage-builder, to our etherpad. Because there's Ian and this guy whose name I never remember, who is pretty new to it. And just the two of them. And then infra people who approve patches on demand. Yeah, and people like us who come asking, why did you do this? When we finally have the cycles to focus in on the problem and find it. Yeah, I think it's three fixes a cycle in DIB or something. The one that they identified, I'm pretty much like, it's not an Ironic bug. It's not a bug in our provisioning process. It's the fact that the thing doesn't know what to do with itself. That's not anything we could fix. Right. It's like software RAID: we expect the operating system to understand software RAID. Yeah, true. Very true. OK, so we briefly talked about the architectures before we move on.
There was mention of Power, or ppc64le. I did offer to email Hugh Blemings of the OpenPOWER Foundation to see if there are any thoughts or ideas about whether some sort of Power CI could be built that is independent of PowerKVM. But PowerKVM still runs. It's just that the reports to projects are fewer, because, for Ironic, it broke and they haven't had time to fix it. So, RAID: we talked about RAID briefly, there's interest in RAID. We started talking about the ramdisk deploy again. Someone asked about tmpfs usage. And then we got to this interesting one: are there interests in these ideas? Or am I moving too fast through the etherpad? I think it's OK. I hope people will just speak up as they want. There are lots of plus-ones on RAID. And this is more just a question; it's something I was banging my head against. I think specifically this was iLO hardware RAID, to be quite specific. It's to do with secure erase: the disks support secure erase, but the RAID volume isn't secure erasable, which is kind of a pain. Because you basically have to break apart the RAID, secure erase the disks, and then put them back together again, in the hope that's quicker than writing zeros everywhere. I'm just wondering if it's just me, or I pressed the wrong button, or I haven't bought the right license. No, if I remember well, it was exactly like that. You need to break the RAID, then clean the disks and rebuild the RAID. You cannot just, yeah. Yeah, I got the impression that if I paid enough money to the right people and got a license, they might do something magical. I can help. No problem. OK, I'm just glad it's not just me. So, you know, that's nice. So they did add, I think, a feature, a clean step, to help do that for iLO 5 specifically, with up-to-date firmware. Are you talking about the total erase? No, there was a different one I proposed. OK. Total erase is: I'm decommissioning the hardware, I'm never going to see it again. Yeah.
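The break-erase-rebuild workaround described above maps naturally onto an ordered list of manual cleaning steps. A sketch of that ordering — I believe `raid.delete_configuration`, `deploy.erase_devices`, and `raid.create_configuration` are real Ironic clean step names, but treat the exact names and JSON shape here as an assumption to verify against the cleaning docs:

```python
def secure_erase_workaround_steps(target_raid_config):
    """Order the manual cleaning steps for the workaround above:
    tear down the RAID array, secure erase the now-bare disks, then
    rebuild the array to the desired layout."""
    return [
        {"interface": "raid", "step": "delete_configuration"},
        {"interface": "deploy", "step": "erase_devices"},
        {"interface": "raid", "step": "create_configuration",
         "args": {"target_raid_config": target_raid_config}},
    ]
```

Whether this beats writing zeros everywhere depends on the controller, per the discussion, but at least the sequence is automatable rather than a manual BMC dance.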
I mean, this was quite modern hardware, and it was still the same, unless you fed it an expensive license, I think, from my memory of it. And it could still be that there's a license required. Yeah. OK. The fun of computers. Yeah, it's good. Software RAID for the win. Anyway, we briefly talked about partitioning. We just talked about secure erase. Some interest in the kickstart deploy; that's currently a spec. From the prior session, there was a question about LVM. And we just talked about LVM. Yeah, that was also me. Oh, OK. It was the same idea I voiced a few minutes ago. I mean, is it something you feel so strongly about that we should discuss it? Or is it just an idea you're throwing out there? Well, I don't feel particularly strongly about it. But I think if somebody ever has some time, it may be a good checkbox to check. Like, do you support LVM? OK, we have the basics. We can talk to you about how to make more of it. Well, I don't volunteer to write it, to be honest. Well, someone just raised a point: it would help snapshot plans. And if we had long-running agents inside the OS, then yes, it would absolutely help with snapshotting. Which could be, as someone's typing, or no, it was a different line, someone's typing that kickstart would be awesome. But I think snapshotting the running OS would be awesome as well. People would just have to understand that they wouldn't get all of their filesystem. Or we'd have to make it tunable. For what it's worth, there's almost an equivalent in Nova with the QEMU guest agent, where it kind of says: if the agent's present, turn some features on and off in a different way. I'm not a massive fan of persistent in-host agents, but I am a massive fan of in-place snapshots. So maybe I'll have to suck one of them up. And we can still boot into a RAM disk and make a snapshot, or mount the filesystem and make a snapshot. True. Yeah, that makes it optional, doesn't it, I guess?
Apart from that weird API problem. The conundrum then with doing the snapshots: do we delete the snapshot, or do we keep it? What happens to the snapshot, basically? Do we stay in the RAM disk and extract the data, or do we go into a running state and extract the data? Or do we just do it within the OS and extract the data? Or not? Yeah, I guess there are two things there, because I really like the idea of crash-consistent snapshots going into Glance, just because it's so consistent with VMs, and it's kind of what people are used to. But it seems wrong not to have the feature where it's a proper shutdown and you image the machine into Glance, right? I'm talking about the non-standalone case, of course. I'm typing, basically, what you just said. Why does it also seem wrong not to shut the machine down? Is it just filesystem consistency? Yeah, it's that crash-consistent thing. Because as soon as you start going down this line, you then start talking about the quiesce stuff: you can try and quiesce the filesystem in concert with taking the snapshot, right? Which is theoretically possible here. Yeah, I mean, it is, but if we try to do it in-host, then we run into a lot of risks. And I know there are database platforms that, sorry, the corgi overlords are demanding attention and I need to deal with it. I know they're... Bring them here. What was that? Bring them here. This is your king, the corgi overlord. Hey, corgi overlord. Hey. Okay, so he's going out now. Where were we? Shutting machines down or not? Yeah, people are still typing on Anaconda stuff. Yeah, we should get back to it afterwards. Okay, I know there are some database platforms that use shared memory mapping and don't clean and purge it out unless you totally shut everything down. So you can't actually snapshot the system and expect it to work. Yeah, and I think that's a very valid concern.
I think if we did snapshot without the ability to do a, well, I suppose you can just power the machine off, can't you? If the machine's in the powered-off state and you do a snapshot, you just do it the other way around. Well, we'd have to gracefully shut the machine down. Well, I mean, if you do the snapshot when it is in a powered-off state, it has presumably been gracefully shut down. Yeah, you could bring it back up and then start extracting the data once it's rebooted. I mean, we've maybe given ourselves another excuse for kexec. Maybe. But at the same time, if we can meet back in Dublin at some point and just start drawing on the glass walls, we might fill one of those glass walls with the workflows that could be. Well, here's a random idea that might be an easy first step for snapshot: if you just require that the node is in the powered-off state for snapshot to work, and just send a 409 Conflict for everything else, that would work. Yeah, that actually would work really well. That would require that the operator has done something, or at least has considered the fact that they have to turn it off. Yeah, I mean, but that's better than not having any support. Because for the particular use cases I'm thinking about, well, what are they? It's really just creating an image for a cluster, right? So I just take an arbitrary node, power it off, and call snapshot. And I just call success on that. Yeah. I don't know why I haven't thought of that before. That could be useful. Could be. So we briefly talked about Anaconda again. I think, did we? Maybe, maybe. No, yes. Yeah, there was a short exchange here. I'm not sure who is green. I'm green, I'm new. Yeah. Yeah, not a user of Ironic, just more or less trying to figure out the deltas to adopt Ironic. So your concern is that kickstart scripts will be different from whatever you're using right now, right?
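Backing up to the snapshot idea above for a moment: the power-off-or-409 guard is small enough to sketch. This is a hypothetical API-layer check, not real Ironic code; the `"power off"`/`"active"` state strings follow Ironic's conventions, but the function and exception names are made up:

```python
class Conflict(Exception):
    """Maps to HTTP 409 at the API layer."""
    status_code = 409


def start_snapshot(node):
    """First-pass snapshot guard: only a powered-off, active node may be
    snapshotted; everything else is rejected with 409 Conflict, leaving
    graceful shutdown as the operator's responsibility."""
    if node.get("provision_state") != "active":
        raise Conflict("only active nodes can be snapshotted")
    if node.get("power_state") != "power off":
        raise Conflict("node must be powered off before snapshot")
    return {"node": node["uuid"], "state": "snapshotting"}
```

The appeal, per the discussion, is that this sidesteps the whole crash-consistency and quiesce debate for a first iteration: if the box was shut down cleanly, the on-disk state is consistent by construction.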
Yeah, so the trick comes in that if you have legacy kickstart scripts, there's an immense hesitance to modify them, because they're kind of tied to an OS plus a machine configuration. So yeah, technical debt, in other words. So, if I remember how that works specifically, that could be a good case for just doing something around the ramdisk deploy. The conundrum, I guess, is that there's unlikely to ever be a really clean way to ask Nova to give you a bare metal machine that was fired up with this kickstart file you just provided, unless we figure out the delineation, how to identify the differences and when to do those things. That being said, it's probably good to go make such comments on the spec, as in: hey, this is an aspect that needs to be considered from our operational experience, because it's something we have to deal with and think of. So, Richard, I didn't quite get your comment about not being able to request it from Nova. I think the spec proposes that the kickstart file can be passed through image metadata, no? What? Sorry. What I'm saying is, if we have another case where we need slightly different kickstart behavior, then we probably won't be able to support it along the same exact path using Nova, or we'd have to somehow identify that we need different logic in Ironic to handle it slightly differently. You mean per node? Yeah, or per kickstart file, by the sound of it. A broken kickstart file, for example, will time out, because it will never inform Ironic that it's successful? I think the current proposal does end up with a timeout. We can, of course, have a flag to just assume success, but that's a separate thing. I thought that the Anaconda stuff, if it knew that there was an error, would indicate that to Ironic before the timeout. Well, if there's an error, yes. I could be wrong. But if it's still spinning away there and Ironic doesn't know what's going on, yes, it'll time out.
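The timeout-versus-assume-success behavior discussed above can be sketched as a single decision function. All names here are hypothetical, including the `assume_success` flag, which the discussion only floats as an option:

```python
def deploy_outcome(power_on_ok, callback_received, elapsed, timeout,
                   assume_success=False):
    """Decide the fate of a kickstart-style deploy.

    Default behavior: wait for the installer's success callback and fail
    on timeout. With the hypothetical assume_success flag (a "blind
    deploy"), a clean power-on is enough and external polling is trusted
    to check the machine's health later.
    """
    if not power_on_ok:
        return "deploy failed"       # couldn't even start the install
    if assume_success:
        return "active"              # blind deploy: started == succeeded
    if callback_received:
        return "active"              # installer reported success
    if elapsed >= timeout:
        return "deploy failed"       # broken kickstart never calls back
    return "deploying"               # still waiting within the window
```

This mirrors the ramdisk deploy comparison made in the session: with no feedback channel, "it booted" is the only signal available, so the blind mode just makes that explicit.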
Another way is to just assume success. Is there any reason why Ironic can't just say: this node is now under a kickstart, and we simply don't have any success feedback at all? So it's interesting you mention that; that's the exact behavior of the ramdisk deploy interface. We boot, and we just assume it worked, because we have no way of knowing. I guess this mode has a right to exist even in an Anaconda deploy. I'm not sure we should do it by default. I'm pretty open to having this as an option: just assume success with a kickstart deploy, as long as you managed to start it, and you're done. Yeah, as long as we didn't raise an exception on turning the power on, we're probably okay. Yeah, at least have an option where you can say, basically, this is a blind deploy; we're assuming that external polling is going to check the health of the machine later. Yeah, those are useful comments to make on the spec, if you could do that. Post them on there? I think we want it to be more robust by default, but this is cool to have as an option. Yeah, well, that at least gives you the option to use unmodified kickstart files, if you're confident that you can do the tooling outside. Anyway, what else? Do you want to talk about that for a few minutes? So, as to snapshotting, I can throw out a crazy idea that I mentioned on the spec, and I think I frightened people too much with it. So what the spec currently describes, booting IPA, and I'm not taking any in-instance agents into account right now, booting IPA, snapshotting, sending the data back, the whole mechanics around that sounds a lot like rescue. Even more, it sounds like the update steps, or active steps, which we are planning on. So I was like, should snapshotting be implemented through update steps? And there was silence after that, which may mean I'm crazy. Here's the conundrum.
It probably could be articulated that way. But because we'd be throwing a nebulous future feature, a feature that is not clearly defined and not already in development, in his way, he'd be blocked and effectively couldn't contribute. The code's already written. Yeah, that's the problem. On the other hand, we're going to have another duplication of the same code. I think we'll have a duplication regardless; it's just that we have to figure out where the happy path is. And without getting people to upload code and participate in the community actively, with the things we're working on downstream, we end up in a situation where, if we don't actually make the progress they need, they don't feel as if they're part of the community. That makes sense. It does, yeah. It's definitely a conundrum, because as engineers that care about a project, we're focused on its success, on taking the right path, not building too much technical debt, and not shooting ourselves in the foot. And it's a big ask when someone comes along with what amounts to a code dump. It's not fun. Right. I feel for him also. I remember my last refactoring of the steps code; adding another dimension to it sounds scary to me. Yeah. It may also be that he's got something elegant. There's no way of knowing until we see the code. Right. So I didn't block the spec. I think I just left kind of a comment. Yeah. I thought about this. Well, we're running out of time, but I want to talk about diskimage-builder briefly. Let's do it. So, I agree that we're facing a situation where diskimage-builder is an OpenStack-centric tool that is barely being maintained, because it's built around a specific use case, or has evolved toward a specific use case. I don't know how we fix this. Multiple times we had to actually intervene to maintain some parts of diskimage-builder ourselves, and maybe it could be, I don't know, something that, I know it takes resources, but it's something that we could think about. Yeah.
And it's not even about writing a lot of code, because the project works. No, no, exactly. It's just small corrections, I mean. Yeah. Maybe we should help them define, for example, the CI testing matrix better, so that it matches what people are actually trying to do, rather than what they presume people are trying to do. It seems like what we should do, then, is schedule a call or meeting or something with them, and actually try to talk through some of the issues, and maybe see if we can find a happy path forward to improve testing and improve that feedback loop. Right. And how we can help, because we're relying on it quite substantially. Yeah, true. I suspect they'd appreciate that. Besides us, in terms of the Ironic project, who else is using diskimage-builder directly in the OpenStack community? Infra, for example. Yeah. Okay. Nodepool. Okay. Okay. I honestly don't know how other people build disk images. The only comparable tool I've ever seen was a web interface from SUSE that told you you could download your image in 30 minutes. Okay. We can use guestfish, as people sometimes propose, if you have a good base image, I mean. Yeah. It's not always the best, but in a crunch, it works. Strangely enough, I was looking up how the CentOS disk images, the CentOS cloud images, get generated. They get generated via kickstart. Right. Oh, I think Arne mentioned yesterday that they're creating images by installing an operating system in a VM and just taking that VM's disk. That is exactly what they do. It's pretty weird. Yeah. And I think Arne actually mentioned the same thing occurs at CERN. Yeah. It's actually common practice. It happened as well at my former company. Yeah. I've done it as well. I mean, I've built images just that way many, many times. I'm using guestfish for my purposes. There are positives and negatives to it. Anyway, everyone, thank you. And I guess we'll talk again soon. Hopefully next week. Thanks. Thanks.
Thank you. Thanks.