So, Belmiro, I just made you co-host; you should be able to share your screen. As a quick introduction, this is the first of our large-scale SIG video meetings. We used to do only IRC meetings, but we will also semi-regularly do video meetings so that we can have a presentation and a themed discussion around a very specific question. Today, to kick this off, we have Belmiro Moreira from CERN, who will talk about the choice between regions and cells when it comes to scaling out your OpenStack infrastructure. Hopefully that short presentation will trigger a lively discussion between the participants. The goal is to share as many best practices, good advice, experiences and questions as possible, so that everyone can learn more about how to better scale OpenStack infrastructure. Generally speaking, everyone manages to scale OpenStack out to very large infrastructures, but at the same time we don't have strong documentation explaining how to do it, and for newcomers to the OpenStack ecosystem it can be a bit daunting not knowing exactly what path everyone else has traveled. So to kick this off, we have Belmiro. Take it away.

Thank you, Thierry. Can you see my screen? I can. Yeah. Okay, that means you can hear me as well. So hello, everyone. I'm Belmiro Moreira, a cloud engineer at CERN, the European Organization for Nuclear Research. Today, in this first meeting in this new format, as Thierry said, we're going to discuss how to scale an OpenStack deployment using regions and cells. This will be a very brief presentation, just to set the pace for the discussion.

When managing an OpenStack infrastructure, one of the questions operators always have is: when should I start wondering how to scale out my OpenStack infrastructure? Of course, there is no special formula that will tell you that you need to scale out. As with almost everything else in OpenStack, it depends on your particular requirements and your expectations for the infrastructure. There are, however, some factors that can help your decision. For example, as you add more and more compute nodes into the infrastructure, you also increase the risk that, if something goes wrong with the control plane, it will affect all of your infrastructure. To mitigate this risk, we usually set up highly available solutions, most of the time very complex ones, for RabbitMQ and the databases. They are complex, difficult to configure, and difficult to maintain and keep running, and network partitions in RabbitMQ are a huge issue. If you reach this point, you should probably consider the option of scaling out your deployment.

Of course, you can also have other motivations, and in these slides I try to enumerate some of them. A large infrastructure with a large number of compute nodes could be one reason: this huge number of compute nodes creates a huge load on your database and on your RabbitMQ, and you reach the point where it becomes very difficult to keep scaling up the database and the RabbitMQ cluster. Or, the other way around, you just want to avoid setting up all these highly available solutions for these components; you want to keep it simple. Another reason could be that some OpenStack components are very difficult to scale. Neutron, for example: if you have a lot of compute nodes, the Neutron agents can be very, very chatty.
That can make it very problematic to scale Neutron to thousands of compute nodes in the same cluster. Another reason could be that you want to partition your deployment physically, because you have multiple data centers in different areas and you want to expose that to users, or just logically, because you as an operator want to organize your resources by hardware type or by the different features those resources offer to users. There are different ways to do this in OpenStack, and we will try to explore some of them. Or you may want to isolate resources for specific workloads or specific users. All of these are good reasons, not only to do with scale, to scale out your OpenStack deployment.

So there are different options to scale out, and today we will consider cells and regions. Let's start with cells. Cells have existed in OpenStack almost from the beginning; they were initially implemented by Rackspace to help them manage their infrastructure. At CERN we have been using cells since our initial production infrastructure in 2013, and I will use the CERN example a lot during this short presentation to give you something concrete. In 2013 we had two data centers and two cells, one cell per data center. Notice that we didn't have regions at that time. And of course, in 2013 these were cells v1; cells v2 were introduced much later, I think in the Pike release, in 2017.

So what are cells? Cells are a nova feature that allows you to partition your nova deployment. This lets you spread the communication load between your compute nodes and the control plane across different RabbitMQ servers and databases. As a result, you can massively increase the number of compute nodes in the infrastructure without, for example, having to manage a complex central RabbitMQ cluster and a complex database setup.

Let's look at some of the characteristics of cells. With cells, nova is able to shard the databases and the RabbitMQ, and we have a different control plane per cell. We can isolate failure domains in nova, increasing the availability of the cloud in general. Cells are completely transparent to users; there are no APIs to interact with cells, you need to use the old nova-manage commands for this. They also allow you to dedicate resources to workloads easily, meaning that if you have resources that were bought for a specific project or set of users, you can easily map them. Basically, these are logical partitions of your cloud infrastructure. And last but not least, they are very easy to configure. Because they are so easy to configure, at CERN we use cells a lot: currently we have more than 80 cells in production, which is the sum of all the cells across our regions.

Okay, so the CERN example of cells. This is the cell architecture that we have. What you see at the top is basically the cell API, where we have the nova APIs and the top-level conductor, the super conductor, as well as the nova schedulers. Then we have the database for cell0 and the database for the nova API, and of course the RabbitMQ and the VNC proxy. I'm just reusing an old slide; the VNC proxy is now in the cell control plane. And then this is the cell. The per-cell control plane is very simple: the nova API for the metadata service, RabbitMQ, and the conductor. And then you have your compute nodes.
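For readers following along, here is a minimal sketch of the nova-manage cell_v2 workflow being described; the cell name, transport URL and database connection string are placeholders rather than CERN's real values:

    # Register a new cell that has its own RabbitMQ and its own database
    nova-manage cell_v2 create_cell --name cell-dc1-01 \
        --transport-url rabbit://nova:secret@rabbit-cell-dc1-01:5672/ \
        --database_connection mysql+pymysql://nova:secret@db-cell-dc1-01/nova_cell01

    # List cell0 and the cells registered so far
    # (sensitive connection details are only shown with --verbose)
    nova-manage cell_v2 list_cells

    # Map freshly added compute hosts to the cell they were configured against
    nova-manage cell_v2 discover_hosts --verbose

Each compute node's nova.conf then points its transport_url at that cell's RabbitMQ rather than at one central cluster, which is what keeps the messaging and database load sharded per cell.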
Because we have a lot of cells, we don't have any high availability in this per-cell control plane; that's our choice. What you see here is basically the architecture that we have per region, and that's why I'm saying that per region we have around 30 cells currently, and we have three main regions in our infrastructure.

I would like to show you now the cells that we have in one of the main regions. As I said, cells are completely transparent to users and there is no API to interact with them, so we need to use the old nova-manage commands. To see the cells in a region you basically just run list_cells. I'm hiding most of the output here because I don't want you to see all the database details and RabbitMQ connection strings that these commands also display. As you can see, in this region we have all these cells: cell0, and then all these shared cells, as we call them. When a normal user authenticated in the CERN cloud tries to boot an instance, it will end up in one of these shared cells, completely transparently to the user. Groups of these cells form an availability zone; we have three availability zones, and we group them like this. Then we have all these GVA project cells. These are dedicated cells that we have for specific projects, and only those projects can create instances in them; all other users cannot touch them. Another particular cell is this one, where we have all the bare metal nodes; all the Ironic nodes are managed in this nova cell. And these are particular cells for resources that have special features; in this case, the nodes are backed by diesel generators in case of a power failure. This was just to give you an example of a real use case of using cells. Back to the slides.

Okay, but cells have a problem: this is only for nova, it's a nova feature. With cells, you are not scaling any other components. For example, if your Neutron is struggling with load, cells will not solve that; you just continue to add more and more nodes into nova and keep increasing the load on Neutron. At CERN, for a long time we only had one region, because we didn't want to expose this concept to our users, but the load on Neutron was getting so high that at some point we decided to introduce regions in our infrastructure.

So that leads us to regions. What is a region? A region is basically an OpenStack deployment, a completely independent OpenStack deployment that typically shares the same Keystone and Horizon components. Regions are useful when you want to explicitly expose a deployment partition to users: users are aware of the region. This usually also helps them achieve greater fault tolerance for their applications, because they are aware of this partitioning, which in the case of cells they are not. They know they can deploy their application in regions A and B, and if something happens to A, B should stay up because it's independent. From the public cloud providers, this is the image we have of regions, but a region doesn't need to be on a different continent; you don't need data centers spread across different continents, different countries, or even different data centers to set up regions.

So let's now go through some of the characteristics of regions. As I said, they are independent OpenStack deployments, and users are aware of them.
They can give some fault tolerance to the applications running on top of the OpenStack infrastructure as a whole, and they can be used to scale components that cells cannot shard; the good example is always Neutron. You can also share more OpenStack components between regions than the usual Keystone and Horizon. You can share Glance, for example, and avoid all the issues of image synchronization between regions; at CERN we share Glance between our different regions exactly to avoid that problem. Something else that we do: we don't only dedicate cells to users, we also dedicate entire regions to some special users. For example, we have the use case of data processing for the LHC, and we have regions dedicated only to that use case. And finally, because regions are an entire OpenStack deployment, they are much more complex to deploy than cells.

Just as I showed you the cells that we have in one region, now I'm going to show you the regions that we have in our production environment. This is visible to the user: if I run openstack region list, we see the regions that are available. The fact that users can see them doesn't mean they have access to all of them; in fact, most users, I would say 99% of them, only have access to this one region. We have three main regions: the main cern region plus two batch regions. The cern region is for the generic use cases; it's the region whose cells I showed you in the previous commands. The batch regions are dedicated only to batch data processing. And then we use regions to test new functionality. Instead of deploying a completely separate OpenStack infrastructure to test, for example, SDN, we just create a new region, and because we deploy all the projects there, we can use a different configuration to test this functionality. But we use the same Keystone and Glance, and that makes it easy for users to give us feedback, because we don't need to create special projects or user accounts in these new deployments; they use exactly the same credentials. We just give users access to those regions and they can easily give us feedback on the new features. These test regions only have a handful of nodes; I think five or ten nodes at most. We don't need thousands of nodes in a region.

All right, we are reaching the end of this presentation. So, should I deploy cells or regions? Of course, this depends on the problem you are trying to solve. If you are adding more and more compute nodes, I would recommend that you start by deploying new cells, because that will allow you to scale massively. But later, if you start finding problems in other components, regions will probably be the way to go. It's more challenging, but if you grow a region too much, with a lot of cells, you can always split the region into two or more regions; we did this in the past because of scalability issues that we hit. So this was just a very brief introduction to these two concepts. Let me know your questions; I think this can be a good start for the discussion. Maybe you will not agree with me, but that's good for discussion. Thank you.

Thank you very much, Belmiro, that was very interesting.
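As a hedged illustration of the region listing Belmiro just showed, and of the shared-Glance-endpoint idea he mentioned, using made-up region names and a placeholder URL:

    # Regions visible to users of this shared Keystone
    openstack region list

    # A new region whose image endpoint points at the *same* Glance API
    # as the existing region, so images never need to be synchronized
    openstack region create RegionTwo
    openstack endpoint create --region RegionTwo image public http://glance.example.org:9292

Because both regions resolve the image service to the same URL, users in either region see the same image catalog.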
We already have one question on the group chat, from Seung Soo Cho. He says: I upgraded Nova from Kilo to Queens, so I don't have any cells in production. In this case, is it possible to create a new cell and reassign existing compute nodes to that new cell? Basically, the question is how do you migrate from not using cells to using cells?

Well, if you are still on Pike, it means that you have cells v1 only, and creating a cells v1 cell now, if you want to move forward to newer OpenStack releases, is probably not the best choice. I don't know what others think about this. Belmiro, do you remember when everyone started using cells by default? I can't remember what release that cutover was. So they were introduced in Pike, and then the next release is when they became mandatory, I think, no? Maybe. Maybe it was that quick. I think it was Ocata. Okay. I think I'm mixing up the order of releases now. Could have been Queens, but yeah. I guess the thing to say is that currently all of your hypervisors are in a cell today; the nova compute nodes are all in a cell. So the trickier question, I guess, is how do you move a hypervisor between cells? Have you ever done that? Without deleting all the instances, I mean. I'm not sure that's possible today. I think I saw something in one of the release notes that said that some version allowed migrating VMs between cells. So one thing you might be able to do is bring up a new cell and then migrate the VMs across, and once you've drained the first cell sufficiently, you could take the hypervisors out and redeploy them into other cells. That might be one way. We haven't actually done it yet, but this is a fantastic talk, because we're about to start going multi-cell where I work. So I want to thank Belmiro, by the way. Thanks, Chris. I think that migration between cells is supported since Ussuri, but I think it's cold migration only. Okay, cold, but even cold would be great. Yeah, but definitely not Pike, right? So you need to be really close to upstream, to the latest releases.

We have two more questions on the chat. Benjamin is asking: is there a way to scale out Neutron like nova cells in a single region? I've seen some users having a dedicated RabbitMQ cluster just for Neutron, but I wanted to know if other solutions or tricks exist. Well, if I can start, I think having a dedicated RabbitMQ for Neutron is the best option; we have one. Per region we have around 3,000 compute nodes, which means only one Neutron, and we have a big RabbitMQ to handle that single Neutron. Maybe you have a dedicated database as well? No. Oh, right. But okay, so I can continue. In our case, we have a dedicated RabbitMQ for everything, and by everything I mean for each component. Each cell has its own RabbitMQ, completely independent. The nova API cell, of course, also has its own independent RabbitMQ. Then each different component has its own dedicated RabbitMQ; Ironic has a dedicated RabbitMQ. And it's exactly the same for the databases: we don't share MySQL instances between different OpenStack components. So Neutron has a dedicated MySQL instance, like each cell, like the other OpenStack components. And do you have a separate database per cell, or is it the same MySQL for all cells? No, it's a completely separate MySQL instance for each cell. Okay.
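As a rough sketch of the "dedicated RabbitMQ and MySQL just for Neutron" approach discussed above (hostnames and credentials are placeholders), the relevant part of neutron.conf is simply:

    # /etc/neutron/neutron.conf (excerpt)
    [DEFAULT]
    # RabbitMQ cluster used only by Neutron, separate from the one Nova talks to
    transport_url = rabbit://neutron:secret@rabbit-neutron-01:5672,neutron:secret@rabbit-neutron-02:5672/

    [database]
    # MySQL instance dedicated to Neutron
    connection = mysql+pymysql://neutron:secret@db-neutron-01/neutron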
One question, if I can pause there briefly. We have multiple completely separate OpenStack installations in different locations. Do you know if we could just tie those together and make them regions, or is it too late now? So, I think that should be possible. There are two kinds of regions: regions that share a Keystone and regions that don't, and it sounds like you've got regions that don't. In the Horizon code, you actually get a dropdown before you log in or after you log in, depending on which type you've got. So in some sense you already have it; what you can do is, I think, register the Keystone of each region in the other regions' catalogs. I may have the details wrong, but I mean that all the regions can see each other's services while effectively staying in separate Keystones. I think that's how it's done. I guess you've not done that, Belmiro, but just to be clear: during the presentation, when I talked about regions, I was always assuming they share the same Keystone, because for me, if you are not sharing Keystone, they are completely independent OpenStack deployments. Yeah, I think that's true.

There's a long question on the chat, but it's a lot of words, so I wonder, could Chris just ask it out loud? If not, I can read through it, if you don't have a microphone. Sure, let me see if my microphone works. It works. Perfect.

So my question is about Neutron scaling. You mentioned that you had to move from cells to regions in order to scale Neutron, to scale your networking capabilities. However, in your presentation you mentioned that you have 30 cells and each cell has on average around 200 computes. So could you describe some of your scaling issues when it comes to Neutron at that scale? Because in my experience, one of the first scaling bottlenecks that we hit was actually our networking: our computes scaled pretty well, but the number of flows, the number of instances, the number of routers created by our users basically all started overwhelming the Neutron control plane. So my question is whether you are using one of the standard ML2 drivers for Neutron, for example Linux bridge or OVS, or whether you're using some sort of custom networking solution that lets you sidestep those issues and scale Neutron higher than what we have observed in our deployment.

Right. So in our case, each region has more or less 3,000 compute nodes, and those 3,000 compute nodes are what each Neutron installation handles. But we use a very simple Neutron setup to reach this: we use the Linux bridge driver, so it's extremely simple, and we don't have issues like routers and all the chattiness they bring. However, even though we're using Linux bridge, the Linux bridge agent is still extremely chatty, and the load that we see is not really on the Neutron API servers but on RabbitMQ. So our scaling issues were more related to scaling the dedicated RabbitMQ cluster that we have for Neutron. If you are using a different network driver, you will probably hit these scaling issues much earlier.

Okay, if I could ask a follow-up question: my understanding is that you have a fairly flat network for your tenants, and those tenants don't really think about OpenStack in the context of networks; they don't create their own.
You basically have an OpenStack that plugs the machines almost directly into your underlying network, right? Exactly. Okay. And that's why, when you saw the regions, I told you that we had very small regions for testing new features: two of them are called SDN1 and SDN2, because we are testing SDN functionality for the infrastructure, but it's still early days. Sure, thank you.

There is a question in the chat: how do you share Glance between the regions? Does Glance have its own storage backend? Someone already provided part of an answer in the chat, so maybe they can help answer that question, and then someone else can complete the answer. Well, I can't add anything more than what I just wrote, because we didn't implement it where I work, but that's mainly the idea we have when I'm asked that question. Yeah. So your answer was that using a shared Keystone with a replicated backend for Glance, you should be able to achieve a shared Glance between regions, right? Exactly. Okay. So how did you do it?

We do it in a very simple way. Our storage backend is Ceph, which runs in our main data center, and we set up a normal Glance, just one set of Glance APIs, pointing to that Ceph cluster. You might think you should set up a different Glance in a different region because you are concerned about the latency and network traffic between the data centers, but even in the past, when we had two different data centers, we only had one Glance. Initially we deployed the Glance cache in the other data center, which you can use to speed up image transfers, but at some point we just removed that and we rely completely on the single-Glance setup. To use that Glance across multiple regions, you just point the region to the same Glance endpoint. When you do openstack endpoint list, you get all the endpoints, and each endpoint points to a region; so you just duplicate it, and the endpoint for Glance in this region is the same as in the other one. Because you are using the same Keystone, it's completely transparent: users will use the same Glance through the endpoint that you specify. But again, this depends on your use case. We don't feel any need to use two different Glances in different regions, because in our case they are actually in the same data center.

Is there any other question? If there are no more questions, I thought we could do a quick round around the table to discuss other setups, because we've had lots of details on the CERN setup, where you're basically combining cells and regions. I'm curious what the state of scaling out is for others on the call. Chris, you just said that you were in the process of migrating to multiple cells or regions. Belmiro, I'd love to get a copy of the slides, because I have other people interested in what you shared with us here, if that's possible; I don't know if you've published them. Yes, sure. We will also reference them on the large-scale SIG page, like all the video meetings, with the recording and the deck. Fantastic.

I can talk about what we're doing. Our new OpenStack is taking off at Bloomberg; by new, I mean it's OpenStack Rocky and it's Neutron, and we had other stuff with nova-network that we're retiring. So, three data center sites, three OpenStack installations, all completely separate, but we are already seeing scaling problems with RabbitMQ and MySQL.
And each site is going to have what we call additional data halls. They're not really separate sites, but they are physically separate buildings with physically separate power. So that seems like an obvious candidate for cells, so that we can just partition compute and give each data hall its own control plane, its own RabbitMQ, its own MySQL. I did some research on that and it seems to be the way. But I learned something today, which is that this is not going to solve scaling Neutron, so we're going to have to put our thinking caps on and make sure we don't end up with only half a solution when we do it. The scale is approximately 1,000 compute nodes per location right now, but it's growing every week. So it's small compared to CERN, but it's still enough for us to get into trouble with scale. And the population, well, our model is more traditional: fairly heavy VMs. We do have containers and Kubernetes going on as well, but quite a few are just traditional Linux servers, and it's in the tens of thousands of VMs already on this new platform. So we have to solve scaling either with cells or with extreme measures in RabbitMQ and MySQL, and I definitely prefer the look of cells, because at the end of the day compartments are compartments: even if we mess up, at least the rest of the site is probably still working to some extent, rather than it being all up or all down when you have a single MySQL leader serving an entire large site. So I think Belmiro's insight here is fantastic, and I think we're definitely doing cells. I don't know if we can go and stitch it all together into regions. The other thing I found, another learning that was fantastic, is that you actually have cells for your preproduction resources, for testing, because we just got a whole bunch of hardware for preproduction testing. We were thinking of making it an entirely separate OpenStack installation with its own endpoints, its own DNS entries, its own Keystone, but maybe that's wrong; maybe it should actually just be part of the OpenStack at that site. So that's another great thing that this meeting gave us. Thank you to Belmiro and CERN for forging that path. I don't know what else I could share really, but we're kind of following along, I think, with things that CERN has already seen and solved. It's a nice feeling. Thank you, Chris.

Maybe, Arnaud, you can share a bit about how you do scaling at OVH? Yeah, hello, everybody. At OVH we are not using cells at all for now, mostly because we were hitting scaling issues on Neutron before the ones on Nova, so we decided to scale using regions. That's basically what we do. On the Neutron side, we are not using the regular ML2 drivers; we built our own driver on top of Open vSwitch, so we don't have all of these flow issues between computes. The biggest impact we see on the Neutron server is the load on the RabbitMQ RPC side, because every time you restart an agent on a compute, for example if you restart agents in bulk, you end up with a high load on the RabbitMQ side and on the Neutron side. And that's why we decided to scale using regions instead of cells at the beginning. I don't know what more I can say. That's great already; it's an interesting view that you did not actually need to do cells. Yeah, because it was mostly Neutron.
And we know that we can do more, because, for example, we are not using separate RabbitMQs like you do at CERN, Belmiro; we use one big cluster for every OpenStack service. That's something we can improve: we can separate the RabbitMQs in order to have one dedicated to Neutron, at least. That's something we can do; it's not planned yet. But we don't know what the future will bring, because we don't want to scale regions indefinitely. We may want to roll back, or to find a mechanism to hide these regions from our clients. For now we don't know, but rolling back from regions to cells will be hard because of the way it's done, so having a mechanism on top of regions to hide everything could be nice. As far as I know, though, there is nothing magic already existing on that topic, so for now we are just using regions. Okay, any questions for Arnaud or Chris, maybe?

Well, Arnaud touched on an important point that I would like to highlight. If you restart all the computes in bulk in one of our regions, we see exactly what you described: RabbitMQ will not handle it. Even with the simple ML2 driver that we use, it's not able to handle it, so every restart needs to be very controlled. Yeah, exactly. So what you do at CERN is restart them very smoothly, right? Yes. Yeah, just like we do. Okay. Well, we avoid bulk restarts, but when one is needed, it needs to be done with some care. It's interesting, because several people have mentioned that restarting actually triggers a lot of problems that you don't really have in regular operations. That's an interesting topic for maybe a future meeting: how do you avoid most of the problems around that? Yeah, we also hit that problem, so we restart compute nodes in batches to reduce the load on Neutron.

Jean, can you share more about regions and cells at LINE? Yeah. So we currently don't use cells, because we're using an older version of OpenStack where cells were not really implemented yet. We do use regions, and we separate the regions by data center. And for the Neutron issue that everyone is hitting, the solution we went with is an L3 data center design. We use our own custom Neutron driver, similar to Calico, so we don't need to handle all the ARP issues and other issues you get when using the ML2 Linux bridge or regular OVS drivers. But it's kind of a different use case, because when you use Calico or an L3 driver, all the VMs are reachable, so there are no private networks in this case. Yeah, that's all I have right now.

It also seems like the choice of Neutron driver affects how load hits the cluster a lot more than I expected; the choice of driver really has a big impact. That's another question we should probably explore at some point. I see Imtiaz, or Amtaz, I have probably butchered your name, sorry. Can you share more about your setup? Sure. Yeah, it's Imtiaz, you got it right the first time. Perfect. So we also looked into cells and regions, and I think I discussed this with Belmiro a while back, but currently we're not using either, even though we've become quite big; we just surpassed 11,000 compute nodes. We could have used regions, but between our data centers there's not much connectivity; they're completely isolated, and it's a little complicated to set up the inter-data-center connection. So that's why we started with a very simple setup.
So each deployment is its own OpenStack cluster, and we scaled each one up to 300 nodes. I know we could probably go even further, but our limitation was more the Neutron provider that we chose, which in this case is Contrail. We are looking at Calico moving forward, and from talking with other operators I know that would let us scale even further, even without using cells. So that's what we are doing. But we do see the advantage of sharing a common Keystone and Glance. We are noticing that the number of images we have is growing rapidly as more and more services come on board, so trying to replicate them to every cluster adds up. We are using Ceph as the backend, and Ceph is maturing more within our company, so it will let us do Glance replication and have a shared Glance for multiple clusters. Right now each cluster has its own Glance and the images get copied everywhere, so one of the plans moving forward is a shared Glance service, where you can have different OpenStack instances but still share the same Glance backend, itself backed by Ceph. So that's what we are going forward with. And I think you already talked about Neutron being the bottleneck, or the decision point, when you are trying to decide whether to use cells or not. That was our case as well: every time we considered it, the combination of the Contrail version and the OpenStack version we had didn't support going to cells. So that was another point. I'm not sure about Calico; I hear it's possible, but we haven't explored that. Has anyone here used Calico and cells together?

We're about to try that. Let us know how it goes. Calico is very lightly integrated with OpenStack, I think that's a nice way of putting it, so we don't really do much with OpenStack and Calico together. Calico is just controlling the firewalling at the endpoints, and we put the policy in separately from OpenStack; we don't use security groups. So I can't see why it would not work with separate cells. But of course, famous last words, right? I'll definitely report back if we find anything.

Okay. Anyone else interested in sharing whether they use regions or cells and why, or questions for the people who have already explained their setup? Well, I'm curious if someone is using cells v2 with more than one cell. Oh, because you are still using cells v1? Oh, no, no, we are using cells v2. I said cells v2 just because, in my mind, cells v1 is completely deprecated; it has been completely removed already. But if you are on an old OpenStack version, you don't have cells v2. Okay, looks like they are not that popular. Well, we would be on it already if we had been able to move forward on the OpenStack version. We're still on Rocky; we want to get to Ussuri urgently for many reasons, and as soon as we do that, we're going to start testing cells on hardware. And as I mentioned, we think we're going to go larger. It's just that we got OpenStack with Rocky and Neutron and Calico working, and then amazing growth and load turned up, and we've just been keeping our heads above water. So it's not that we don't want to; in fact, I'm being asked which quarter we're delivering it this year, so I'm about to be doing it any second, really. This talk has been incredibly timely.
Yeah, thank you. I was just going to say that we've done some work with Kolla Ansible on getting it able to configure multiple cells v2. But a lot of the larger deployments have been so heavily bare-metal focused that we just haven't needed that kind of Nova scale-out; there are just three nova-computes on three controllers doing fine with, you know, 1,200 Ironic nodes, for example. It's just not got that far. But in case someone wants to start working on that in Kolla Ansible, bits of it are there.

Okay, anyone else interested in sharing their current situation, or questions? I had a question for the people running multiple regions; we have a few folks doing that. We get a lot of users asking about having their application do global server load balancing within the OpenStack ecosystem, and that's quite hard to do. Or whether you want a Kubernetes cluster spanning multiple regions, or to do something outside. I don't know if people have found users gravitating towards a particular solution for that kind of thing. I guess, Belmiro, in your case it's all one flat network; all the regions are connecting to the same flat network. So, okay. Yes, they are. Just wondering if people have hit that.

Well, in our case, we took a look last year at how we could use a sort of global Keystone, because we have a global reach, and have a single Keystone and a single Horizon for the users. But that turned into discussions with people along the lines of "if you find out how to do it, then let me know", kind of thing. The other thing we had was this Neutron issue, where we really couldn't find any good solution to scaling beyond our roughly 300-node setup at the moment. So we are still at the drawing board with regions and cells, and trying to follow what happens on the Neutron front, because Neutron gives us interesting day-to-day problems too. For example, in our case, because we are using tenant networking and things like that, setting the quotas for the Neutron resources is really a guesstimate, and then you have these eager users who create 100 networks, and suddenly you have tens of thousands of ports in the cloud and Neutron really struggles. But as I said, we are still at the drawing table with this one.

I suppose I should also share that with Kolla Ansible we have set up a shared Keystone and shared Horizon with two regions, where each region has its separate Kolla Ansible config, actually Kayobe config. So that should be possible; I think there are Kolla Ansible docs on that. Okay. It depends what tool you're using, of course.

If no one else wants to share, I'll conclude this by explaining a bit more about what the large-scale SIG does and our current activities, in case you're interested in joining. I'll just share a quick slide here; I hope you can see it. The large-scale SIG is a group of operators and developers interested in facilitating running OpenStack at a large scale: answering the questions operators have as they need to scale up and scale out their OpenStack clusters, but also trying to address some of the limitations we encounter in large OpenStack clusters, to detect the bottlenecks, instrument them, and potentially push back some of those limits that force you to look into scaling-out solutions like regions and cells.
We meet every two weeks, usually on IRC, but we also do sessions like this one around a specific question, kick-started with a presentation by a large-scale SIG member to get the discussion going. The current schedule for the meetings is Wednesdays at 15:00 UTC; that's 4 p.m. in Western Europe and 9 a.m. Central in the U.S. If we get substantial participation from the West Coast especially, we'll probably try to rotate between multiple times; we tried that before, but nobody showed up at the other time, so we stopped doing it.

Finally, this is the main product of the large-scale SIG: a wiki page called the Scaling Journey. Basically, as you create an OpenStack cluster and add more nodes, at first you need to configure it so that it can handle some scale, then you need to monitor it to see how well or badly it runs, then you scale it up until a certain point where you have to scale out, and then, obviously, you have to maintain the life cycle of the installation, upgrade it, et cetera. Those are all stages in your Scaling Journey, and we're trying to list the questions, and provide the answers, that almost everyone will have at some point. As an example, stage 3 has this question: how many compute nodes can a typical OpenStack cluster contain? The answer is, obviously, it depends; it's always "it depends", but at least it gives a range between 100 and 2,000 nodes depending on the type of usage you have and the API churn that comes with it. For those kinds of questions, we're trying to make sure we provide a basic answer, and hopefully that gives people more confidence before they start their Scaling Journey with OpenStack. The main activity of the SIG is to document those questions and try to extract the knowledge of all the participants in the OpenStack community. Obviously, this session was really great, because I feel like I learned a lot about when to use cells, when to use regions, the problems they solve and the problems they don't, and that's exactly the kind of content and value that we're trying to create in the large-scale SIG.

Thank you again to everyone participating, and thanks again, Belmiro, for kick-starting this series of video meetings. I'm not sure exactly when the next one will be, but I'll make sure everyone knows when we schedule a new one. And in between, if you're interested in joining the large-scale SIG, please join our IRC meeting; the next one is in two weeks. Anything else, anyone?

I think it's important for people to give their opinion, and for us to understand whether they think this is useful, and whether we should try this new format again in the future with a different topic. What do you think, should we try again? We should definitely try again. I wanted to speak for about ten seconds. Some of you have been to ops meetups sessions, but just to let you know, most of that team is no longer working on OpenStack, or has personal issues, whatever. I'm still here at Bloomberg, so we're going to try and bring something back, but I don't know when we can do that. So I'm going to try and turn up to the large-scale SIG from now on, and maybe this is my new home for these things, but I'm still working on ops meetups for when the world comes back to normality; the rest of the team probably isn't. So that's probably why we've been so quiet, in case that's of use to anybody. That's a great point.
One of the reasons we're doing these video meetings is to provide some virtual discussion around operating OpenStack. When we had the ops meetups, they were a great way to share that experience, but I feel like even in the world of tomorrow we'll have to have some kind of mix between in-person and virtual events anyway. So I can totally see us doing the ops meetup as the anchor point, with all the personal relationship building, and at the same time doing these virtual discussions as well, maybe less frequently. That's the best of both worlds, I think. Thank you.

We're over time. Thanks again, Belmiro. Everyone else, if you were happy with the session, please mention it in the chat. We'll keep it open for a few minutes, and we'll let you know when the next one is. Thank you all. Thank you. Bye-bye. Thanks. Bye.