Hello everyone, my name is Samuel. I'm a cloud engineer at STACKIT working on compute-related topics. And I'm Maxim, I'm also working at STACKIT, but in the connectivity team. Today we tell you our story from Queens to Yoga in one migration, which definitely did not take only 20 minutes.

But first, a few words about us. (Oh nice, the PowerPoint works.) STACKIT is part of the Schwarz Group, and most of you coming from Europe may know the Schwarz Group. The Schwarz Group consists of many production facilities — baked goods, ice cream, coffee — and we do waste management and recycling. But most of you know us for our retail brands, Kaufland and Lidl. That's basically what we are as part of the Schwarz Group — but we are also a cloud.

Okay, I'm going to give you a little context on where we were at the beginning of April last year. We had migrated our deployment tooling for OpenStack from our old provider to our own open-source tool called Yaook. Now we have nice config management, we have nice lifecycle management, and we have the possibility to upgrade our OpenStack releases with the new tooling. But we didn't upgrade our OpenStack release when we migrated the tooling, so we were still stuck on OpenStack Queens.

Now the big question: why should we upgrade, and how should we upgrade? The why is pretty easy: we want lots of new features, especially compute features. We want AMD SEV, we want virtual TPMs, UEFI boot, all that stuff. We want to leverage all the upstream bug fixes, which are definitely not backported from Yoga to Queens anymore. At the start of our planning, Yoga was basically the latest release; by now it is the oldest release still supported. And we also want to contribute our stuff upstream, but it's very hard to contribute to the Queens EOL branch, to be honest.

Then we thought: okay, we would need at least eight upgrades to be somewhat up to date. Do we really want to do them all, one after the other? One thing we assumed was that the combined downtime of doing every single upgrade step by step by step is larger, and more unplanned for the customer, than one fixed, planned migration per customer.

We would also only migrate Nova, Neutron and Cinder — I will follow up on that. We do it on a project-by-project basis, and we need to guarantee the customer that no OpenStack resource UUID changes, because all the cloud tooling like Terraform would fail in that case. We also thought about the downtime, but we will talk about that later.

And while we are at it, we also have the possibility to change some things: we can switch from the ML2/OVS backend to the ML2/OVN backend, which we also planned, so we combined those two changes. And we have one key benefit: we don't need to do a storage migration. We basically have the same storage backend on both sides, so there is no change of the storage backend.
A short overview of how our clusters look. We have a central services cluster where we mostly run shared services like Keystone, Glance and Panko. They are already on Yoga — well, to be honest, Panko is not on Yoga, because I think Victoria is the latest release that even supports it. Then we have our old cluster running Nova, Neutron and Cinder on Queens, which connects to these central services. And then we built a new cluster and placed it next to it, and we also added Barbican there, because we need it on the Yoga release.

Our trick for the migration is that the user shall not see from which cluster their resources are served. So we built a little component called the project routing reverse proxy. The project routing reverse proxy is written in Rust, and it exposes all the public endpoints for Nova, Neutron and Cinder. We also have the option to set a flag on the Keystone project which enables maintenance mode, so the customer cannot change anything: the proxy just returns a 503 to the customer. That way we can have a little freeze time per customer.

In the beginning we also thought about using per-project Keystone endpoint maps, but the problem there is administration: anything with permissions spanning the border of a single project — everything using our platform services, our Kubernetes engine, Cloud Foundry, all the things that do something for the user which is bigger than a single project, and also our central portal — would not work with that. That's why we built this project routing reverse proxy. It was our first dive into the Rust programming language, which was also an interesting experience.

Yeah, and I'm going to give you a short overview of what needed to be done for this migration to happen. First of all, we started in June last year with enabling Yoga in our Yaook tooling. Afterwards we migrated Glance and Keystone step by step, because doing those first was really nice. Then in September we created our second OpenStack cluster, and in January this year we deployed our project routing reverse proxy, so the user couldn't see on which cluster their resources were.
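As a rough illustration of the proxy behaviour described above: the real implementation is in Rust, and we only know its observable behaviour from the talk, so this is just a minimal Python sketch of the decision logic — the cluster map, the maintenance set and all names are made up.

```python
# Minimal sketch of the project routing reverse proxy's decision logic.
# The real implementation is in Rust; all names and lookups here are
# illustrative only.
from http import HTTPStatus

# Which cluster serves which project, kept in sync with the migrations
# (hypothetical project IDs and upstream URLs).
PROJECT_CLUSTER = {
    "0123abcd": "https://queens.internal",
    "4567efgh": "https://yoga.internal",
}
# Projects currently frozen via the Keystone project flag.
MAINTENANCE = {"0123abcd"}

def route(project_id: str) -> tuple[int, str | None]:
    """Return (HTTP status, upstream URL) for a request scoped to a project."""
    if project_id in MAINTENANCE:
        # Freeze window: the customer cannot change anything.
        return HTTPStatus.SERVICE_UNAVAILABLE, None  # plain 503
    # Unknown (i.e. new) projects default to the new cluster.
    upstream = PROJECT_CLUSTER.get(project_id, "https://yoga.internal")
    return HTTPStatus.OK, upstream  # proxy the request to this cluster

status, upstream = route("0123abcd")
assert status == 503 and upstream is None
```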
Now let's have a deeper dive into how a migration itself works, from a user perspective. We don't use Horizon as a UI; since we also serve many platform services, integrating a self-built UI is way easier for us. So there's the so-called STACKIT Portal, which is basically our UI, and in it a customer's project is also the common denominator for how resources are grouped. There's a one-to-one mapping between a project — whatever you call it in the UI — and our OpenStack project.

Per project, the customer had the option in the GUI to choose a maintenance window. We offered 14 time slots per week, three hours each. These three hours just mean that somewhere within them the migration happens; we guarantee the customer that the downtime of their resources is less than an hour. There was seldom a case in which we exceeded seven minutes of total downtime, measured from servers going down to servers up and running on the new cluster. So from a political standpoint we said one hour max, but I think we never had one that came even close to an hour — although, to be honest, seven minutes is a long time already. And we set a limit of a hundred migrations per time slot, just to keep it manageable for us.

Every Friday we fetched all this data and did some pre-filtering. There's a block list where certain projects were listed which we should not migrate — but sometimes the user still booked them in anyway. Then we do some sanity checks: what are the resource constraints, do we need to move some hypervisors, do we need to add new hypervisors, what are the aggregate states and how are they used? Then we deployed a migration cron job — a Kubernetes CronJob — with the time slot the user selected for the following week. Once the scheduling was done, we basically just had to watch whether an alert came in because something failed in the migration, which Maxim will show you on the next slide.

Okay, so what does the migration script actually do? It's mainly three parts. The first part is the pre-migration: it checks the project's existence and region, so whether it's still on the old cluster, and then runs some sanity checks. Then we enable the maintenance mode in our project routing reverse proxy by setting the maintenance flag on the project itself. Then we start to shelve all the servers and disable all the routers — that's also the point where the actual downtime for the customer starts.

Then comes the actual migration part, where we take all the relevant tables from the Nova, Neutron and Cinder databases, fetch them, and while moving them to the new cluster we also do some magic. For example, we change the volume service UUIDs, because Cinder uses the service UUIDs to know which volume, backup, snapshot or whatever is located on which service — so we look those up and change them. We also convert some OVS things to OVN things: for example, we adjust some ports, such as changing the device owner of the internal router ports. And we apply all the schema updates manually, so if there's a schema change between the releases, we apply it ourselves.

After that, we are in the post-migration. The post-migration consists of updating the project to the new region. Then we run the OVN DB sync to create all the resources that need to exist in an OVN setup — we patched the OVN DB sync to run project-wise, so we give it the project ID and it runs just for that project where possible. After that we enable the routers and unshelve the servers, and then we disable the maintenance mode. With the unshelving of the servers the downtime for the customer basically ends. To be honest, the part from shelving the servers to the unshelve usually took like two or three minutes; the bottleneck was definitely the OVN DB sync, which under high load took a lot of time.
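To make the shelve/unshelve choreography concrete, here is a minimal sketch of the pre- and post-migration steps using openstacksdk. This is not the actual migration script — the cloud names and the project ID are hypothetical, and the real script does much more (sanity checks, maintenance flag, DB copy) in between.

```python
# Minimal sketch of the freeze/thaw steps around the migration,
# assuming admin credentials on both clusters (illustrative only).
import openstack

def pre_migration(conn, project_id: str):
    """Freeze a project on the old cluster: shelve servers, disable routers."""
    for server in conn.compute.servers(all_projects=True, project_id=project_id):
        conn.compute.shelve_server(server)           # customer downtime starts
    for router in conn.network.routers(project_id=project_id):
        conn.network.update_router(router, admin_state_up=False)

def post_migration(conn, project_id: str):
    """Thaw a project on the new cluster: enable routers, unshelve servers."""
    for router in conn.network.routers(project_id=project_id):
        conn.network.update_router(router, admin_state_up=True)
    for server in conn.compute.servers(all_projects=True, project_id=project_id):
        conn.compute.unshelve_server(server)         # customer downtime ends

old = openstack.connect(cloud="queens-cluster")      # hypothetical cloud names
new = openstack.connect(cloud="yoga-cluster")
pre_migration(old, "0123456789abcdef")
# ... table copy, UUID rewriting and OVN DB sync happen here ...
post_migration(new, "0123456789abcdef")
```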
While we are at it: we also adjust the public subnet allocations, because we duplicated our public subnets. We wanted to keep the same subnet ranges, so we need to free up an IP on the new cluster and block it on the old cluster — so if you delete your floating IP, for example, you can still recreate it on the new cluster. That's basically the whole migration script.

Now the migration timeline. We started with our own projects and some dev, QA and testing stuff — anything that just had a testing billing label — in January and February. Then in March we set the project routing reverse proxy to route all new projects to the new cluster, so all new projects would be served there. Afterwards, in April, we did the migration of all the planned customer projects; after, I think, six weeks of planned customer migrations, we did all the remaining ones ourselves. Right now we are still waiting for various PaaS projects to migrate their own stuff, because they have some complex cross-project setups which need to be migrated by the teams themselves.

Now probably an interesting part: what went wrong during all of this time? One big pain point was definitely human error and communication. For example — as you can see in the meme — our portal team labeled a project as "testing", so we assumed it was just for testing and migrated it. The portal had a short downtime, but afterwards it worked fine. Yeah, just a lot of human-error mistakes like that.

The second point: we DoS'd our own Neutron. We had a testing project, basically a CI, which was running around 200 VMs, all in the same subnet, all in the same project. We unshelved them all at once, and the Neutron database got stuck a little bit, because all of the port bindings were being created at once. The database locked itself, and after some time just exceeded the max statement time, and all of the VMs ended up in a very weird state and needed to be cleaned up afterwards. We introduced an unshelve limit after that, so we only unshelve about 20 VMs at once.
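A minimal sketch of what such an unshelve limit can look like — this is a hypothetical helper, not the actual code; the point is just that port bindings are created in bounded bursts instead of all at once.

```python
# Sketch of a batched unshelve with a concurrency limit (illustrative).
# Unshelving ~200 servers simultaneously overloaded the Neutron DB with
# port-binding writes, so we throttle to small batches.
import openstack

BATCH_SIZE = 20  # unshelve at most 20 VMs at a time

def unshelve_in_batches(conn, project_id: str):
    servers = [s for s in conn.compute.servers(all_projects=True,
                                               project_id=project_id)
               if s.status == "SHELVED_OFFLOADED"]
    for i in range(0, len(servers), BATCH_SIZE):
        batch = servers[i:i + BATCH_SIZE]
        for server in batch:
            conn.compute.unshelve_server(server)
        for server in batch:
            # Wait until this batch is ACTIVE before starting the next one.
            conn.compute.wait_for_server(server, status="ACTIVE", wait=600)

unshelve_in_batches(openstack.connect(cloud="yoga-cluster"), "0123456789abcdef")
```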
The next point: OVN at scale. So we migrated to OVN and then ran it at a very — sorry — very large scale. But for that we have a different talk, I think tomorrow, from Felix; he's going to tell you all about it.

Then the last point is the new machine type, which is a whole chapter about the compute stuff and basically broke lots of things, so I will dive a little bit deeper into that. The point here is: we moved basically from QEMU 2.11 to 6.2 at once, and from libvirt 4 to libvirt 8, which is a huge step. On top of that we decided to change some things, like the default machine type for every machine — which is like the virtual motherboard of the VM, if you can imagine it that way. And q35 works a little bit differently, because volumes are now attached via PCIe. The problem: the default number of hot-pluggable PCIe ports per virtual machine is two, but we promise the customer that you can attach up to 26 volumes. That led to some issues, but it's technically only a config option in Nova, so that's the easiest part of it.
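The talk only says "a config option in Nova"; our reading is that this is the `[libvirt] num_pcie_ports` option, which controls how many PCIe root ports a q35 guest gets for hotplug. A sketch of what such a setting could look like, under that assumption:

```ini
# nova.conf on the compute nodes -- a sketch, assuming the option alluded
# to is [libvirt]/num_pcie_ports (not confirmed in the talk). q35 guests
# attach volumes via PCIe root ports, and the default leaves only a
# couple of free ones.
[libvirt]
# Reserve enough PCIe root ports so customers can hot-plug up to 26
# volumes; Nova documents 28 as the upper bound for this option.
num_pcie_ports = 28
```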
Then we started with QEMU 6.2. If you are a little bit familiar with the PCI spec: the spec says that if you want to detach a PCI device, you need to press an attention button. Nice — so you can signal the OS: please power off this device. But the point is, you also need to do this for a virtual machine, so you need to press a virtual attention button. Luckily QEMU does all of this for you — except it's broken in 6.2. So you were never able to detach a volume. You attach a volume, and Kubernetes — which runs on top of our infrastructure service and moves lots of pods with persistent volumes around — detaches and attaches lots of volumes, but the virtual attention button was broken.

So the fix is: migrate to QEMU 7.0. Okay, we migrate to QEMU 7.0 — but QEMU 7.0 itself is no longer capable of live migrating in a mixed platform of QEMU 6 and QEMU 7, because there's some RNG device missing, which we never figured out 100%. The fix for that is: migrate to QEMU 7.2. And this technically works: you can now live migrate VMs created on 7.0 — basically processes running the 7.0 binary — and processes running the 6.2 binary, all of them, to 7.2. Yeah, lots of live migrations in this case.

But live migration is a very good keyword, and it's the next point. We found a very nice bug on Intel-CPU-based nodes; it mostly triggers if you migrate between machines with different CPU speeds. These on the slide are the Bugzilla IDs, so you can look them up — they are horrible bugs. One problem here: it's a kernel bug, so you could not just exchange the QEMU binary and say, okay, every live-migrated VM now runs on the new binary. We run Ubuntu 22.04, so we normally run the 5.15 kernel; it's fixed in 5.19. So basically we had to redeploy every single goddamn Intel-based compute node — around 400 at that stage. Combining all of these compute side stories, it took us around two months to figure all of that out, while the actual migration we mentioned before was running the whole time. So, lots of interesting stuff we found out here.

Now a little summary of all the migrations. We migrated over 4,000 VMs in that time frame, and now we have 6,100 — I think we are even at more on Yoga right now, 6,600 if I recall correctly. We also migrated around 3,600 projects, and during that time we moved over 350 hypervisors from the old cluster to the new cluster to provide all of these compute resources, because we simply couldn't just double our old cluster. We still have 5,000 VMs remaining on Queens, but those belong to our platform services, Cloud Foundry and the like.

And now the big question: should you migrate instead of upgrading? I guess probably not, if you ask me. For us, there is a pros and cons list. The pros list contains the OVS-to-OVN change: OVS to OVN needs a downtime anyway, so we could combine both downtimes into one, which was a great benefit for us. Also, we didn't need to do any storage migration, which otherwise would have taken a lot of time. And now we are on the newest OpenStack release — which is Yoga, which is not the newest anymore, but at the time we were planning it was the newest. We also skipped, say, seven or eight releases at once, which was very nice.

The cons: you need a lot of effort. We needed the project routing reverse proxy to manage all of the requests, you need the second OpenStack cluster for all of this to happen, you need all the development of the migration script, and so on. And also human error — like I said before, with the portal labeling their project as testing — you just need to be careful, because there's always going to be human error in a case like this. You also have a downtime: we could combine both downtimes into one, but you still have one hard downtime for the customer. And that's why we decided to do it this way.

So we are now at the end. We have a talk tomorrow by Felix about running OVN at scale, which is a very interesting topic. Also, as mentioned, there is Yaook, an open-source OpenStack lifecycle management fully leveraging the Kubernetes operator approach and containerizing everything — that talk is on Thursday by Robert and Stefan. And we have some resources if you want to have a look at Yaook; also join us on IRC if you want. So we are basically finished with our talk, we are open for questions — and thank you for your attention.

[Audience question] Good question. The big problem is the migration between OVS and OVN, which definitely requires a downtime. In the end we would have needed a downtime for this migration anyway, and knowing that: yes, we would still do it again. I guess maybe we would not write the project routing reverse proxy in Rust, to be honest — but that was also a fun story. We would do it again, because the benefit of jumping eight releases plus migrating the whole network backend made it very appealing to us. But only if you combine both topics together, to be honest. Yes.

[Audience question] Definitely not a migration next time — because the network backend change is now done, and Yaook supports upgrades. Yaook supports, I think, around ten OpenStack services; we are using six of them — Keystone, Glance, Nova, Neutron, Cinder, Barbican — plus Ironic, which does the bare-metal deployment. Most of them already run on Zed. So we will just do the in-place upgrade, which is basically updating the control plane, emptying the hypervisors, and replacing the containers — like moving from an OVN agent container of Yoga to one of Zed. It takes a little bit of time, but the Yaook approach does this basically automatically for you. So once these 5,300 VMs of the platform services — which we don't have any hands on — are finally migrated, we will also decommission the old cluster. But yeah, we are already working on the Nova and Neutron migration to Zed. From now on we will just do regular upgrades, because the network backend hopefully stays on OVN.

Who was next? I think there was one over here. Okay, another question.

[Audience question] Yeah, sure — the patch about making the OVN DB sync project-scoped. I think we discussed that with the Neutron guys, but Maxim, do you know anything? Otherwise — here's the guy who did it: Luca, do you know anything? For everyone: the question was whether we will publish the project-scoped OVN DB sync patch upstream. So yeah, it's basically just removing all of the lines which are admin-scoped, and then for some things — I think external router ports, which are not project-scoped — you need to give it an admin context. So it still runs with an admin context in some spots.
So it still runs on admin context on some spots, but Yeah, we just removed the context We removed the admin context and gave it a context Scope to a project and then just adjusted it to an admin context if needed and then removed all the lines Well, which which are not needed for like project scopes So I would summarize it if we can definitely publish this but no one will accept that in that current state So if there is any request for that I guess that's something you could ask tomorrow at the OVN talk as well because that goes way deeper than we did This is very high-level approach besides the compute stuff, maybe But it's the answer to the OVN DB thing and it's also just we did it only Because running the whole OVN if we migrate every project and then run the whole OVN It just took too long and that's this downtime is because this creates the ports and all that stuff in the northbound database This just is the downtime for the customer That's why we went ahead and made this this hack to made it project scope just to reduce its load a little bit Now it was not the question answered so far. Okay So You talk about like like this basically like I'll set up so The upgrade itself that is something also which is explained on Thursday in the yahoo talk, but just to summarize it We have everything is containerized The computer itself is also containerized so we have a container which runs and no one compute and runs lipwood We use some hack like gd bus to spawn the lipid process on the host space So if you kill the container the VMs keeps running. It's a hack we stole from the open-stack helm It's a very good hack, but it was for c-group v1 and so this is how we do it in the hypervisor just Making it empty the idea is always to live migrate everything away and then replacing the containers to an in-place upgrade of the container So you basically At some point you had some Compute nodes that they didn't connect to any What I'm trying to understand is that so you have a control plane in Queens. Yes So these are these are two separate control planes completely separate these are so there's a nova control plane here It doesn't even know that there's another control plane there and both are read registered and keystone You're just using different region flags. So these are completely standalone control planes So these clusters these we call them customer-facing clusters. They don't have anything in common at all So it is basically just having two regular environments running one with yoga one with Queens Free at least free because the central one is also running But yes, yes, the idea is really take everything from left fixing it up and moving it over It's like on let me quickly go to the slide It is basically Doing this. So this is basically this migration stuff The migration script runs on the new cluster and it connects just using an SQL port to the Queens cluster Fetch us all the tables it needs and then puts it into the yoga database That's why we also do schema updates and all that stuff. 
So maybe we glossed over that, sorry — it's a very high-level approach. But the idea is: we take the one existing standalone environment, take all the relevant things out of it, and put them into the database of the other one. If you now go to the old cluster and do an openstack server show — well, server list — you see around 6,000 servers in state SHELVED_OFFLOADED. We don't touch them anymore; we actually cannot touch them, because we have a shared storage backend: if you deleted them, Cinder would delete volumes which are in use by the other cluster. So we don't touch those anymore. That's basically how we do it. We also thought about doing something like os-migrate — there's a script which basically removes everything and recreates all the resources — but the problem is that we want to keep the UUIDs, and the only way to keep the UUIDs is by populating the database yourself.

[Audience question] Sure, that works — yes, it works pretty well, and it's pretty easy. The Keystone and Glance serving Queens ran like that for at least a quarter of a year, or even longer, half a year, and it worked pretty well. To repeat the question for everyone: on the slide before, we have Keystone and Glance running on Yoga, and the question was whether there's API compatibility between the Queens environment and Yoga. The answer is yes, that works pretty well; they're pretty standalone. At the moment Glance is right before the next step, actually: we have, for Glance in Yaook, CI-built upgrades to Zed and to Antelope, and at some point in the next weeks we will just press the button and it will upgrade itself.

[Audience question] Sure — so you're using Kubernetes, but is that Kubernetes on bare metal, or do you have some other layer there? What are you using for that? Okay: the approach is the Yaook approach, which stands for "yet another OpenStack on Kubernetes". Every OpenStack service runs on a Kubernetes — and it's bare-metal Kubernetes, yes. You could also run it on VMs; it doesn't matter, you just need a Kubernetes. There's also glue code in there: fetching stuff from Ironic, deploying nodes using config drives, joining the Kubernetes cluster. But it is just bare-metal Kubernetes, and a bare-metal Kubernetes worker basically becomes a compute node, or a gateway, or whatever you label it, because the operators watch these labels. You say: you're now an OVN agent — and then you get an OVN agent container, a compute pod, whatever you need.

[Audience question] So is this an elaborate setup, or just vanilla bare-metal Kubernetes? Yes, it's kubeadm Kubernetes; it's really vanilla in this case.

[Audience question] I'm not sure if I misheard or if you glossed over it: are you saying that the bare-metal Kubernetes is deployed using Ironic, like a standalone Ironic? Yes, exactly. Ironic is the tool. It's also a separate cluster — it's not mentioned here, it's a separate cluster with its own Keystone.
That one also already runs on Zed, currently — Ironic on Zed, Keystone on Zed. It just deploys the nodes, and then on the nodes there's a config-drive script which runs a whole lot of stuff, installs software, and as its last step fetches a join token from HashiCorp Vault and then joins the Kubernetes cluster. But that's very much out of scope here — we can talk about it later if anyone is interested. Are there any other questions? If not, then thank you very much for your attention. Thank you.