OK. Hello, everyone. It's nice to see you all here. My name is Marina, and I'm a member of the Cloud Infrastructure team at CERN. Together with Arne, who is here as well, we manage almost 9,000 physical nodes at CERN with Ironic. We started to use Ironic in 2017, and today I will tell you about the biggest challenges we have faced so far and explain how we handled them. I should warn you that it's not always the perfect solution for the problem; sometimes it's just a workaround, but I hope it will still be useful for you.

Just a few words about Ironic. Ironic is a service for managing bare metal nodes. It can be used standalone, or it can be integrated with OpenStack; in our case, it's integrated. It's fully open source. Basically, it provides the same interface and the same functionality for physical nodes as OpenStack does for virtual machines.

So let's start with the problems. The first problem was transparent adoption of nodes into Ironic. It's quite common that operators already have nodes in production and then decide to switch to Ironic. While Ironic does support transparent adoption, Nova and Placement do not. If you use standalone Ironic, this is easy for you; in our case, we had to come up with a solution. Basically, we tricked Nova into believing that these nodes were fully new and had just been enrolled into Ironic. After a node is enrolled into Ironic, it needs to go through a cleaning step before it becomes available. This was our issue, because the nodes were running, so we didn't actually want to clean them. So before cleaning, we set fake drivers on the nodes. These drivers provide the interface, but they don't do any actual cleaning. In the end, we still managed to go through the cleaning step successfully and make the nodes ready to use, and afterwards we switched from the fake drivers to the proper ones. (A small CLI sketch of this flow follows below.) On the graph, you can see the growth of our deployment. It started in 2017, with some small chunks of deliveries, and in 2020 we had a quite big adoption campaign. Now, as I mentioned, we have almost 9,000 nodes in production.

The second problem was mapping a specific physical node to a specific Ironic instance. We had this use case, for example, during the adoption which I explained previously, because we already had nodes running and needed to map them directly to specific Ironic instances. The first idea for solving this problem was to create a separate flavor for every node. For five nodes, let's say, that's doable, but for 100 nodes it gets quite cumbersome. The second idea was to move all nodes except one into maintenance mode, which leaves only one node available. The problem with this solution is that it takes 15 minutes for the resource tracker to update the available resources. The resource tracker is responsible for reporting which resources, in our case bare metal nodes, are available. So for 100 nodes it would be 100 times 15 minutes, which is quite time consuming. The final solution was to add nodes to a placement aggregate one by one. This is an instant action; there is no extra waiting time. We created a script which adds one node to the placement aggregate, instantiates the node, and repeats this for 100 nodes, let's say.
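As a rough illustration of the adoption trick described above, here is a minimal, hypothetical CLI sketch: enroll an already-running node with the no-op fake driver, let it pass through the (no-op) cleaning step, then switch to the real driver. The node name, the IPMI driver choice, and all addresses and credentials are placeholders, not CERN's actual setup.

```bash
# Hypothetical sketch of adopting a running node via fake drivers.
# All names, addresses and credentials are placeholders.

# Enroll the node with the no-op "fake-hardware" driver, so that the
# cleaning step reports success without touching the running machine.
openstack baremetal node create --name adopted-node-01 \
    --driver fake-hardware

NODE=adopted-node-01

# Move the node through manage -> provide; with the fake driver,
# "cleaning" completes without any actual actions on the host.
openstack baremetal node manage "$NODE"
openstack baremetal node provide "$NODE"   # node ends up "available"

# Later, switch to the proper driver (IPMI here, as an example) so
# Ironic can really manage power, boot, cleaning and so on.
openstack baremetal node set "$NODE" \
    --driver ipmi \
    --driver-info ipmi_address=10.0.0.1 \
    --driver-info ipmi_username=admin \
    --driver-info ipmi_password=secret
```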
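And a sketch of the one-node-at-a-time placement aggregate script just described. This assumes a placement aggregate that the scheduler is restricted to; how that restriction is configured is deployment-specific and not shown here. The aggregate UUID, flavor, image, network and input file are invented for the example. One useful detail: an Ironic node's resource provider in Placement is named after the node's UUID.

```bash
#!/usr/bin/env bash
# Hedged sketch: map each physical node to its intended instance by
# making it the scheduler's only free candidate, one node at a time.
set -euo pipefail

AGG_UUID="<placement-aggregate-uuid>"   # aggregate the scheduler targets
FLAVOR="p1.bare"                        # placeholder flavor
IMAGE="base-image"                      # placeholder image

# Input file: one "<node-uuid> <instance-name>" pair per line.
while read -r NODE_UUID INSTANCE_NAME; do
    # The node's resource provider shares the node's UUID as its name.
    RP_UUID=$(openstack resource provider list \
        --name "$NODE_UUID" -f value -c uuid)

    # Adding the provider to the aggregate is instant -- no waiting
    # for a resource tracker cycle, unlike toggling maintenance mode.
    openstack resource provider aggregate set \
        --aggregate "$AGG_UUID" "$RP_UUID"

    # Boot the instance; the only free candidate is this node.
    openstack server create --flavor "$FLAVOR" --image "$IMAGE" \
        --network provider-net --wait "$INSTANCE_NAME"

    # Once the node is consumed it can't be scheduled again, so it's
    # safe to leave it in the aggregate and move on to the next one.
done < node-instance-map.txt
```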
The third problem is user-facing resource overview. In contrast to virtual machines, where a user can specify the characteristics of the node they want, for physical nodes we have a direct mapping between hardware type and project. Our users say, "we will need this hardware in this amount," and once we have it, we assign it to the project. The problem is that there is no simple way for users to see how many nodes are still available. They can run `openstack baremetal node list` and see the nodes which are instantiated, but not the free ones.

So what we started to do: for every project, we now set a property called max_instances. When we assign nodes to a project, we record there how many nodes of each flavor were assigned. With some simple math, a user can then see how many nodes are still available. The second solution, which we started to use recently, is multi-tenancy, which was introduced in recent Ironic releases. With multi-tenancy, the node has an owner field, where we set the project ID indicating the project the node belongs to. Members of this project now have much more freedom with their nodes: not only can they see nodes in all states, they can also clean the nodes, set up RAID, and so on.
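A short CLI sketch of both approaches; the project name, property key and values are illustrative, and the policy configuration that lets non-admin project members act on owned nodes is assumed to be in place.

```bash
# Illustrative names and values throughout.

# 1) Bookkeeping: record the assigned count per flavor as a property
#    on the project, so users can do the math themselves.
openstack project set --property max_instances_p1=20 physics-project

# 2) Multi-tenancy: mark the project as the node's owner.
PROJECT_ID=$(openstack project show physics-project -f value -c id)
openstack baremetal node set --owner "$PROJECT_ID" <node-uuid>

# Project members can then list "their" nodes, in all states:
openstack baremetal node list --owner "$PROJECT_ID"
```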
The next problem is missing inventory data. After a new node is enrolled into Ironic, it goes through introspection. Introspection collects data about the node and sends it to the data storage. We have had moments when the data was either missing or wrong. This happened for various reasons; among them, the S3 data storage was introduced only recently, so nodes from before don't have data in it. We also hit a chunk of nodes with wrong serial numbers. And there was an lshw bug: lshw is a tool for listing information about the hardware, and it sometimes reported nodes with zero memory. The problem with missing data is that we can't run the usual introspection on a running node, because that requires booting the IPA image on the node, and the node is already running some service. So we had to use active introspection: we log into the node, install all the packages needed for introspection, and run active introspection, which collects all the data and sends it to the data storage. We have created our own container for this, so now we don't have to install every package every time.

The fifth problem is the reduction of the hardware manager cycle. Hardware managers are how Ironic supports different hardware types in one agent. If you want to override any of the hardware management actions they perform, you can create your own customized hardware manager; for example, we have our own concerns and some extra cleaning steps. By default, the hardware manager is hard-coded into the IPA image: when the IPA builder creates the image, it puts the hardware manager inside. So if you want to change something in the hardware manager, to fix a typo, let's say, you need to fully rebuild the image, which takes around 10 minutes, and only after that will the node boot with the new image. To avoid these extra 10 minutes, we created our own patch: instead of hard-coding the hardware manager into the image, it now pulls the hardware manager every time the node boots. So all we need to do is reboot the node, and we have the updated hardware manager. That saves us 10 minutes every time.

The last problem is a scaling issue. With the growth of our deployment, we started to hit different scaling issues, and the biggest one was the time the resource tracker was taking to loop through all resources. As I mentioned, the resource tracker is responsible for updating the information about which resources, in our case bare metal nodes, are available; this information is then sent to the Placement service. When we had a deployment of 5,000 nodes, it was taking three hours for the resource tracker to run, and this was a big issue because the operation was blocking. It was happening because we had all our nodes controlled by three Ironic conductors with one nova-compute on top, and the resource tracker runs in nova-compute. So what we did: we split our deployment into 20 conductor groups. Now we have roughly 500 Ironic nodes in every group, controlled by one Ironic conductor with one nova-compute on top. So instead of one nova-compute, we now have 20, and instead of waiting for three hours, we wait only 15 minutes.

Moreover, we have one special group, we call it the leading group, and this is where we add new nodes. After a node is enrolled into this group, we need to run introspection, and we run burn-in stress tests; to avoid host overload, this group has five Ironic conductors instead of one. We also have the fast-track feature enabled there, which lets the node stay powered on, from the moment it's enrolled into Ironic, through all the steps, stress tests and benchmarking, without rebooting. So that also saves us some time.
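A hedged sketch of how a conductor group setup like this is wired together. The group name is invented, and the options shown are the standard Ironic and Nova configuration knobs for conductor groups and fast-track from the releases around this talk, not necessarily CERN's exact configuration.

```bash
# Group name "batch-07" is illustrative.

# Tag nodes with the conductor group that should manage them:
openstack baremetal node set --conductor-group batch-07 <node-uuid>

# Each ironic-conductor then serves only its group (ironic.conf):
#   [conductor]
#   conductor_group = batch-07
#
# ...and the matching nova-compute tracks only that group (nova.conf):
#   [ironic]
#   partition_key = batch-07
#
# Fast-track, for the group where new nodes land (ironic.conf), keeps
# IPA running between enrollment, introspection, burn-in and deploy:
#   [deploy]
#   fast_track = true
```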
And the last bonus: we have created a collection of the most common errors. For us, it looks like the table on the right. You can see that, for example, we have two errors here; for each one we record the error, where we can find it, how to fix it, and why it appeared, because sometimes it's not very obvious.

So that's all from my talk. If you want to know more about how we use OpenStack, I encourage you to come to my colleagues' talks tomorrow. And if you have any questions, feel free. If you don't mind, maybe you can come to the microphone so it's recorded. Thank you.

For the active introspection, what do you use, any authentication, or is that not a problem because you are in a trusted environment?

No. There's no authentication.

Okay. For the hardware manager integration, that sounds really fun; I always did it by SSHing into the node while it was introspecting, or while it had the IPA booted, and editing it by hand, so that's obviously not as nice. Is this integration public somewhere?

I think no, it's not public, but if you have the IPA builder, we just do it via configuration, and we set it as a pre-installed option. So when the node boots, every time it has some actions to do, and pulling the hardware manager is one of these actions.

Oh, okay, so you...

We configure it in the IPA builder. The IPA builder tells our image that when it boots, it needs to do this, this and this, and pulling the hardware manager is one of the actions the image has to do.

Okay. Yeah, thank you.

Hello. One question: are you using some kind of fabric network or SDN?

No. Not with Ironic.

You said that you went from three hours to 15 minutes with the resource tracker runs. Is there any available data post-upgrade, with the performance patches?

Is there any, sorry?

Any updated statistics based upon the Ironic API performance patch work that occurred last year? Is the resource tracker running much faster now?

Ah. I know it's not all the resource tracker itself. I can't answer; I don't know if it's faster. It's fast enough for us. The 15 minutes is basically the compromise between how many nodes we want per conductor group versus the acceptable time, so we can shorten this by adding more conductor groups. The API performance improvement I noticed when we list nodes, for instance, but I didn't check further.

Can I have the microphone, please? Sorry. Okay. So the resource tracker time is mostly determined by the number of conductor groups that we have now. We could easily have more conductor groups, which would shorten the time to collect the information about the various conductor groups. For us it's mostly a compromise between how many additional resources we want to spend on this versus the acceptable time until a new resource basically appears.

Okay. So when you delete an instance, it takes 15 minutes, or less than 15 minutes, until it's reusable again, because the resource tracker needs some time to find it?

It's now less than 15 minutes. Whether the API changes had any impact, I cannot answer. For sure, we saw that on other operations the API changes that were done had a massive impact and improvement in performance, but for the resource tracker specifically, I don't know if there was an impact, and we haven't measured the resource tracker time again. Which we can do. But the reason we didn't do this is that there was no issue, or there's no issue at the moment, so it's basically flying under the radar for now, while it was extremely visible when we had thousands of nodes within one conductor group.

Thank you very much, then.