Thanks, everyone, for joining. Welcome to the Bare Metal SIG meeting today. Without further ado, we have the Ironic PTL, Iury, talking us through what happened for Ironic in Yoga and what is planned for Zed. Iury, the floor is yours.

Thank you. So basically this is just a highlight of some of the cool things that we did in the Yoga cycle, and the plans that we will attempt to do in the Zed cycle for Ironic. For those who don't know me, I'm Iury Gregory, a senior software engineer at Red Hat and the current Ironic PTL. You can pass this slide.

So the agenda: just the Yoga cycle, what we were able to achieve there, and the plans that we have for the Zed cycle, and then I will open for questions. But if you have any questions while I'm presenting, feel free to just go ahead.

So, for the Yoga cycle, in the Ironic project we were able to add verify steps. This is a new kind of step that lets you run verification actions in the driver for the node during the transition from enroll to manageable, prior to running introspection on it.

The Redfish hardware type is now enabled by default; we had been holding that back for quite some time, I would say. And one of the cool things that we were able to do during the Yoga cycle is that the default boot mode for nodes is now UEFI instead of legacy BIOS. So if you are an operator using legacy BIOS, when you upgrade to Yoga you have to specify this, either at the configuration level for the conductor, in the default boot mode option, or for each node that you have separately.

The fast-track feature can now be set at the node level.
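For operators staying on legacy BIOS after the Yoga upgrade, the override the speaker mentions looks roughly like this — a sketch only; check the Ironic configuration reference for your release before relying on the exact option name:

```ini
# ironic.conf on the conductor: keep legacy BIOS as the default
# boot mode instead of the new UEFI default introduced in Yoga
[deploy]
default_boot_mode = bios
```

The per-node alternative is a `boot_mode:bios` entry in the node's `properties/capabilities` (for example via `openstack baremetal node set <node> --property capabilities='boot_mode:bios'`).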
This is something really interesting that we were able to achieve during the cycle. Also, we now validate the values of the default enabled interfaces based on the enabled hardware types that you have configured. So if you set an enabled interface option to an empty value, Ironic will figure out from the enabled hardware types what values each interface can have when configuring your deployment.

We also added a new parameter to be able to distinguish between partition and whole-disk images. This is the image_type parameter, and it is available in the instance_info of the node.

Next slide, part two. For the Ironic project we have a deprecation for network booting. Basically, booting the final instance via network is deprecated, except for the case when you are using boot from volume or the ramdisk deploy interface.

For iDRAC, we added support for the RAID, BIOS and management clean steps to run without IPA, for when the ramdisk is disabled during cleaning. For the Redfish and iDRAC-Redfish management interfaces, the firmware update clean step now has support for using Swift, HTTP servers, and the conductor's file system when providing the staged files.

Basically, that is the full highlight for the Ironic project itself. We also did some bug fixes, mainly for RAID and the Anaconda deploy interface; things are now working for it.

Next, in ironic-inspector, the only highlight we have is that we now support a filter by the state of the introspection for the node.
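As a rough illustration of that firmware update clean step, a manual cleaning request could look like the following — this is a sketch only, and the argument names (`firmware_images`, `wait`) should be checked against the Redfish driver documentation for your release:

```json
{
  "target": "clean",
  "clean_steps": [
    {
      "interface": "management",
      "step": "update_firmware",
      "args": {
        "firmware_images": [
          {"url": "http://example.com/firmware/bmc-2.40.img", "wait": 300}
        ]
      }
    }
  ]
}
```

With the Yoga work described above, the `url` can point at an object in Swift, a plain HTTP server, or a file staged on the conductor's file system.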
So now, if the operator wants to figure out, for each node, the state of the introspection — whether it's starting, whether it's finishing — they can filter by the state when listing introspections, using the state filter.

Then, on ironic-python-agent, we had a few interesting features. We added options to have multiple named files for the burn-in logging, thanks for that. Also, a new option for the disk burn-in was added, so that the burn-in of the disk is followed by a SMART test. For network burn-in, nodes can now be paired dynamically via a distributed coordination backend, as an alternative to the static configuration that we had in the beginning. Arne, if you want to give some highlights, since you worked on that, feel free.

Right. So for this one, I was thinking of waiting until the end, but for this one specifically — the network burn-in and the pairing via a distributed coordination backend — we have just done this. So far I had only tested this with 20 nodes. So basically 20 nodes go to this backend and say, okay, I need a partner to do the network burn-in. But we recently had a new delivery of around 180 nodes; 150 of them were burned in, in parallel, with this dynamic pairing, and it worked. So at our scale, it seems to be working fine.
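The pairing scheme described here can be sketched roughly as follows. In the real deployment the "room" is a group in a ZooKeeper coordination backend; here a toy in-memory queue stands in for it, and all names are illustrative rather than IPA's actual API:

```python
# Toy sketch of dynamic burn-in pairing: nodes join a shared "room";
# as soon as two are present, they are matched as a burn-in pair.
from collections import deque

class PairingRoom:
    """In-memory stand-in for a ZooKeeper group used for pairing."""
    def __init__(self):
        self._waiting = deque()   # nodes waiting for a partner
        self.pairs = []           # completed pairings

    def join(self, node):
        if self._waiting:
            partner = self._waiting.popleft()
            # One node writes, the other receives; the real burn-in
            # swaps roles for a second pass once the first finishes.
            self.pairs.append((partner, node))
        else:
            self._waiting.append(node)

room = PairingRoom()
for n in ["node-%02d" % i for i in range(6)]:  # e.g. six nodes booting up
    room.join(n)

print(room.pairs)
# Every node that shows up finds a partner; a broken node that never
# joins simply leaves at most one peer waiting, and nothing else stalls.
```

This is exactly the property mentioned below: with a static mapping, one dead node strands its configured partner, whereas with the dynamic room only the broken nodes are left behind.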
So all these 150 nodes were basically booted up to do a network burn-in, and they all went to this room — I think it's a room, or a group, in the ZooKeeper backend — and found a partner, and all of them finished their network burn-in successfully. So basically they find a partner, one is writing, the other one receiving; once they finish, they swap roles and do the same thing again. And I was a little bit surprised when the hardware colleagues came back to me and mentioned some things with, you know, the whole auto-discovery and auto-registration, but they didn't mention anything about the network burn-in — because that, to me, looked like the most fragile part. So I asked them, and they said it worked like a charm. So it seems to be good, and that's quite nice.

The reason we moved from static to dynamic is that, in case you have a static file and you basically configure these pairs statically, or manually, if there's something wrong with one node or a handful of nodes, the partners won't be able to burn in unless you reconfigure the static mapping between them. With this dynamic approach, basically everyone who can will do the pairing and burn-in, and only the ones that are broken will be left behind. So this is the reason why we have this, and it seems to be working. Okay, so we deployed this successfully.

Let me ask a quick question. So did you say that you're using ZooKeeper on the backend?

Indeed.

Okay. And so I guess you saw quite an increase in network traffic when you were able to do all this in parallel? Maybe it was the same, because with the static approach you got the same network traffic, if all the nodes were capable of participating, right?

Okay — I don't have any graphs showing the network traffic, but when I did this on the nodes, yeah, of course, I mean, this is basically saturating the network between the nodes, right?
I mean, they're going at full speed. Depending on how many nodes are involved, that is one or multiple switches. So if it's the same switch, it would probably be able to go at full speed; if you have multiple switches, and traffic needs to go one level up, I don't think the switches have a blocking factor that would allow everything to go at full speed. But I don't have the numbers.

Yeah, I wanted to check the network side, but yeah, it basically saturates the network on that switch.

Yeah — so when I did it one-on-one, you could see that it goes to the full network speed, basically.

Thank you. So this is all done with fio, right? Thanks.

Awesome. Thanks, Arne. And last but not least, we were able to add an express node cleaning capability to IPA, mostly to be used in environments with a hybrid storage configuration, NVMe plus HDD. The idea is basically that we will first try to perform a secure data erase on the NVMe devices, if they support it, and the other devices on the node that can't perform the secure data erase will fall back to erasing device metadata only.

Also, during the Yoga cycle we landed some bug fixes; the most interesting one is that, finally, for software RAID, we were able to get rid of running grub2-install and therefore to be using efibootmgr.

That's it from this part.

There's one more thing, if I may add, about the option for the named output file. People may wonder why we have this, because it's inside the IPA — so why do I need to have a named file on the IPA where the logging goes?
So this is something that we added because, in our image, we also have Fluentd, which will basically pick that file up and then send it to some central logging. And that's easier if you are actually able to have the output of the various burn-in steps — disk, network, or CPU — in specific files, so that you can ship a Fluentd configuration with the image that actually knows where to find these files and send them to a central instance — in our case Elasticsearch — where you can then visualize things. So that is what these options are for.

Yeah, makes sense. Awesome. So I'll move on with the slides. Yes.

So, for the Zed cycle, these are some of the plans that we have. We are still figuring out a few things, and some are under discussion, so I didn't add everything here.

One of the most interesting topics that we have is Ironic safeguards. It's focused on the cleaning operations that we have in Ironic. The idea is that we will be able to, possibly, limit the number of concurrent cleaning operations that we have in a deployment, and also maybe specify a list of disks that should or shouldn't be cleaned in the deployment, for each node. This would probably be at the node level: a configuration to specify, for the node, if you want or don't want it to be cleaned.

And the idea behind limiting the number of concurrent cleaning operations: you can imagine that if there is an attacker and they have the credentials, they can try to delete all the nodes that you have in your deployment. Having this configuration would be good for the operator, so they can limit the number of nodes that will be affected if someone is trying to delete all the nodes in the deployment, basically.

RBAC phase 2. This is a goal from the TC that we have been working on for a while.
Thanks, Julia, for all the efforts on that. We are in phase 2, as described in the document from the TC, and basically it's adding support for the service role, if I'm correct — if I'm not, please just let me know.

CI health: this was a topic during the PTG where we had some discussions. We will be trying to add more testing coverage to our CI, basically related to some of the drivers that we have, especially the Anaconda one, and trying to re-enable some of the jobs that we left as non-voting for a while, like the grenade multinode job, and also IPv6 testing, if that one is already fixed; we are working mainly on the grenade one. We will also try to reach some of the community goals. One is related to skip-level upgrades, from tick to tick release — so basically, if you have the Yoga version, you can upgrade directly to the A release that will be out next year. And we will also be trying to ensure that we are able to use the new Ubuntu 22.04 — I don't recall the name at the moment.

You're upgrading Zuul as well, or is that something that's done by the infra team on our behalf?

I'm not sure if I got that — upgrading Zuul, you mean?

The OpenDev Zuul is upgraded as new releases come out, and that's all managed by the infra team.

Okay, thank you. It just happens.

All right. Yeah — you know, are they going to five? I think that's the latest. We may already be on five.

Okay. Well, thank you. I mean, it's best to just directly ask the folks.

Thank you. Next topic: OpenConfig support for networking-baremetal. Basically, the idea is that we will be adding device configuration capabilities for networking-baremetal. This is a use case that is valid for access and edge switches, since most of the main vendors support it, and the ML2 mechanism driver for networking-baremetal accepts plugins to add support for that. The spec for that is ongoing and mainly driven by Harald.

Deploy and clean steps improvements during the PTG.
We had some discussions related to that. The idea is that we could have custom timeouts for each deploy or clean step, and also be able to have per-node overrides for them.

That's it for the Zed cycle — some of the plans that we have. If I forgot something that we are planning to do, please let me know. Next slide. So, open for questions, if anyone has any.

Thanks a lot, Iury. Does anyone have questions? Other points to raise?

One thing that I would be interested in is whether we know something about the adoption of Redfish, because we made this the default driver, and at least at CERN — what I checked earlier today — we have a couple of hundred nodes now that use Redfish as their hardware interface, in production. But I was wondering what the adoption is, or if you have a feeling for the adoption of Redfish in deployments. Because one of the things is that we have to, at some point, decide — similarly to the decision whether we move from BIOS to UEFI, which was a little bit easier, because we were basically forced to do it by vendors, since vendors are moving to UEFI. IPMI still works, so I don't know what the way forward would be. I mean, both work at the moment for us, so we can use IPMI and we can also use Redfish. There are some things that we need to iron out. Maybe it's also special because our users have access to the console, for instance — so they get access to the credentials — and all the tooling, like the Redfish Tacklebox, and there are some things that are open, in order to see the service event log or something. All of this is improving, but I'm just wondering what the adoption is.
I mean, what's the feeling for it?

So I can't speak to some of AMD's plans, right, and I can't speak to our downstream customers' plans. I do know that all of our test mules are running Redfish very heavily, and my understanding is that for our internal tools — and like BIOSes and BMCs, you know, out-of-band management — we don't tend to make those changes unless our customers, and by that I mean the large server vendors, have decided to go in that direction. We're not usually going off in a different direction than they are. So with that, I heavily suspect that you're going to see everybody trend towards Redfish fairly soon. Okay — just based on that. We're heavily using it, you know, in our development machines.

For those who don't know me, I run one of the server hardware bare metal testing clouds at AMD, and so everything I've got is a test mule — if you all ever see one of our mules, something has gone deeply wrong. But as a result, we write our own BMC code and our own BIOSes, and so we tend to stay pretty on top of what customers are using downstream. Because if we don't test with Redfish, and they're heavily using Redfish, and that has a problem integrating with our reference BIOS, then that's a problem, right — in order to find it. Sorry, Joe.
Yeah, go ahead.

I was going to say, from my experience looking at cases, I would say probably half the cases I look at these days are actually Redfish-based — and that's even though the default examples all just say IPMI. So I think, yes, people are trending that way. But the other aspect that we should probably keep in mind is that we are starting to see some vendors have weird breakages in IPMI, especially when it comes to UEFI, or hardware that doesn't support BIOS mode anymore. Weird things are happening.

Yeah, we are seeing some stuff from the ODMs: when we get ours, their default, you know, was not enabling it, until we asked them to turn it back on — so that we have that ability. Nobody else does.

Yeah, I imagine you're going to see that more and more as time goes on.

Because ours was mostly an all-IPMI deployment, I was mostly introducing Redfish as an alternative. So I was mostly doing this after the hardware had been accepted: basically, I verified everything with IPMI — all the tooling is based on IPMI — and then, once the nodes end up with me and Ironic, I change the hardware driver to Redfish instead. That's for all new deliveries since a while, and it seems to be working. There were some issues in the very beginning — remember this eTag thing — and there were some other things that were a little bit off, but now, since like three or four deliveries, it's actually working quite well. But at the same time, I have to sell this to the hardware colleagues: hey, this is now Redfish, get used to the new tools.
So, as I said, for UEFI it was a little bit easier, because we were forced to move. I was also saying we need some kind of — I'm waiting for some nice features, like the ones we discussed, for instance these one-time links for the HTML5 console. If you can get something like this, that's quite a nice thing: we can say, okay, that makes Redfish superior. Because if all you can say is, with this you can switch on, switch off, set the boot device — yeah, that's very little to sell, actually. But that being said, I'm trying to move everything we have to Redfish.

Yeah, and basically on my side, most of the cases that we are getting from customers are related to Redfish — more than 75%, I would say. I don't see many cases related to IPMI.

But this is the number of cases that you have; how does that relate to the number of deployments that actually have Redfish? If you have a large number of Redfish deployments, that's kind of good; but if you have a small number of Redfish deployments which create a lot of errors, then that's rather bad.

Yeah — when we consider, for example, the telco scenario, I would say everyone runs Redfish, basically. So I would use that to sell it.

Yeah, I think that's a good point, because of the cryptography issues that exist in IPMI: anyone running with strong security requirements cannot realistically be running IPMI, unless they've already accepted it and just go "it's a known evil and broken" — which is not great either.

So one of the interesting things that I've been seeing is, I'm not necessarily seeing customer cases on new IPMI hardware issues; I'm seeing it from our own internal labs, where people are getting new shipments in and stuff goes sideways. We're not really seeing it from the customer side. I wouldn't be surprised if a lot of folks are now running Redfish by default, not IPMI.

All right.
Okay, thanks. I've got a number of nodes I cannot reveal, but it is very large, and yeah, Redfish is our default, and it's only getting larger. And I haven't heard any reports of our next-generation test mules doing anything other — like, they'll support IPMI, but they're not making it a first-class citizen.

That's kind of echoing what I'm seeing, at least with vendors: they're now basically expecting you to just use Redfish. And I think part of the problem is that IPMI became synonymous with BMC, so people are mixing the terms, and it's only in the last couple of years that people have been separating those.

Okay. Any more questions or comments? Thanks again, Iury.

Yeah, next slide — there is some information also. So if you have any questions, or want to reach out to us, feel free to join the IRC channel #openstack-ironic on the OFTC network, or send mail to the openstack-discuss mailing list with [ironic] tagging the subject.