All right, good afternoon everybody. This is the Moving to Cells talk. My name is Mike Dorman. I'm a senior systems engineer over at GoDaddy. And we just kind of wanted to share our experience that we had moving over to using Nova cells from kind of a standard Nova deployment. So there's several of these slides that are kind of just reference slides with a lot of links. So I'd encourage you just to use the short link to get to the slides instead of trying to jot everything down. So just be aware of that. So we have a fundamental problem that's how to scale Nova, like what is the best way to do that? And if you look at the operations guide, there are a few different ways to accomplish that based on what you need to scale, what your bottleneck is. And cells is just one component of that. What cells tries to do is overcome things specifically like a large number of nova-compute instances in a single Nova deployment. And more importantly, probably scaling beyond a single message queuing system or a single RabbitMQ cluster. There's a lot of RPC messages going back and forth between these things. And there's only so many that you can effectively get through Rabbit in a certain amount of time. There are also a couple of use cases around very complicated scheduling things or doing multiple geographic sites behind a single API. So I think if you look at what CERN and Nectar have done with cells, that's kind of where their use case sits, where they've got these different sites that are geographically dispersed. And they need a way to kind of separate those things and to schedule to them independently. So specifically, Nova cells is really just a hierarchy of Nova installations. It's really the easiest way to think about it. Each one of those Nova instances is going to have its own database, its own message queue, its own scheduler, and its own compute service. So it really is an independent Nova installation, really just minus the API service. 
And the way these things interact with each other is they pass messages back and forth through the message queues and we'll get into the specifics of that in a minute. But that's how all these things hook together. There's a very top level cell, which is called the API cell. And that's typically where, typically there's no compute at that cell, but that's where the API lives. And that's kind of the entry point to this whole system. So that's where the public API sits. And that's how people continue to interact with OpenStack. And then that schedules down and kind of filters down to the compute cells that are below it. And if you look in the code, the way that this actually works is in Nova API, the default compute class there is just overridden by this Nova cells class, which just essentially re-implements that compute layer within the code to do the cell scheduling rather than the standard Nova scheduling using conductor and scheduler. There are a bunch of caveats with this. There's a bunch of stuff that doesn't work right. There's a lot of extra things you have to do maybe to get it to work in your environment. We'll talk about those towards the end. And another thing to call out here is this is cells v1 for lack of a better name, but kind of the place that cells is now. There's an effort to go to cells v2, which is coming in Liberty. And we'll talk about that and kind of what the differences are there towards the end as well. So just to kind of talk through this, this is how an operation in cells would work. So on the downward path, you can think about someone calling into the Nova API to do a Nova boot, for example. That's going to come into the top-level API cell where the API service is running. And because we've overridden that compute class, that's now going to pass that command, that message, to this new service called Nova cells. Nova cells is the piece that handles the intercommunication between all the cells in this hierarchy. 
And very simply what that does is it takes that message to do Nova boot and hands it off and sticks it in the rabbit queue down at the compute cell. That filters into Nova cells, which is running in the compute cell, which then hands it off to conductor and scheduler within that compute cell. And kind of from there on, it's sort of the normal Nova operation within that cell. Now coming back up, after that Nova boot's completed, and we've got to update that state to say, OK, this instance is ready to go, it's just the reverse process. So the results feed back into the rabbit queue at the child cell, the compute cell, passes it back into the Nova cells service, which goes back up to the rabbit queue for the API cell, which then filters back to the API and updates the instance information there. So it really is, you can see how these are really just completely independent Novas, and we just do the pass through at the rabbit level. So I wanted to put some links in here from some things that were pretty helpful for me in getting started on this. This is mostly for reference later. A lot of good stuff there. Pretty much any of the talks from the last few summits from Rackspace, CERN, or Nectar, I think they've done kind of a cells update, at least the last couple summits. And they're always pretty useful, because they've got a pretty good breadth of experience on this stuff as well. So for us to go to cells, we had a few goals with this. And the main driver really was that we understood that we probably were going to need to go to cells at some point anyway to get to the scale that we knew we were eventually going to hit. And we wanted to be in a place where we were able to scale quickly when we wanted to and not come up to a brick wall and have to solve the problem on the fly while things are on fire. So that was the real big driver. 
Plus we want to keep the rabbit, the message queues, and the database close to the compute nodes, because there's a lot of interaction, a lot of communication that goes on there. We have several different network environments in our enterprise. And we've got compute in all these different places. And we don't want a lot of this rabbit and database traffic crossing those network boundaries and all that kind of thing. We wanted to keep it very close. And of course, this is a live cloud. So we want to maintain state of all the instances there. We don't want to just nuke and pave the thing and make all our users restart. And then of course, we don't want very much downtime, if any at all, while we're doing this conversion. The basic plan, we're going to take our existing nova that we've already got out there. That's going to become our first compute cell. So that's going to become our first child cell, if you will. We're going to split the RabbitMQ cluster because we need two. We need one for that API cell on the top. We need one for the compute cell at the bottom. But we want to do this in a way that's fairly unimpactful so that we can keep the thing running and maintain our state during the whole process. We're then going to create a new nova instance that's going to be for the API cell at the top. There's a bunch of data import that needs to happen. And while we're doing all this, we kept the nova API service running on our existing nova installation, which again is going to become our first compute cell. We wanted to keep that running there so that people can continue API calls, continue operations against this thing while we're doing it. So as just kind of a visual representation of what's happening here, we're going to take our existing nova instances, which live on these servers at the top. They're also co-located with all the other OpenStack services, so Glance, Keystone, Neutron, et cetera. 
We're actually going to build new servers that are going to be our API cell and run all the non-nova services. So there'll be a migration there. And then the existing machines, we're just going to leave that nova alone and that's going to become our first compute cell. So some of this kind of goes without saying that anytime you do a big deployment, all these things you need to think about, making sure your new servers are set up correctly, there's going to be a new database for that new API cell we're going to build. So that has to be created. We have to move those non-nova services to the new machines. And then of course you always need to remember about network ACLs and making sure your DNS records and all that are in place as well. I put extra credit on here because splitting the RabbitMQ cluster isn't like a strict requirement to get cells to work. There's no reason that you couldn't just run the API cell as well as the first compute cell all against the same Rabbit cluster with virtual hosts or whatever. But again, we wanted to do that clean split so that we can keep that traffic kind of localized out to the compute cell. So what we do first, this is kind of our initial setup. We've got a single Rabbit cluster, all the services use that. Those new app servers we're going to build for the API cell, we're actually going to add those into the existing Rabbit cluster so now we've just doubled the size of a Rabbit cluster sort of temporarily. After that, we move all those non-nova services over to our new servers that we built. Now again, this is all the same cluster, we're still, all the queues go back and forth between all the machines here, but we gotta get those non-nova services to the machines that they need to be on kind of in the final state. After that, we break the communication between those sets of machines because we're gonna actually split this into two independent Rabbits. 
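As a rough illustration of that break step, a deliberate partition can be forced with a couple of firewall rules on each side. This is only a sketch under assumptions, not our exact rules: the node addresses are placeholders, and it assumes RabbitMQ's usual ports for epmd (4369) and inter-node clustering traffic (25672).

```shell
# Sketch only: on each node that will stay in the API-cell Rabbit cluster,
# block clustering traffic to and from the nodes headed for the compute cell.
# The addresses below are hypothetical placeholders.
for node in 10.0.1.4 10.0.1.5 10.0.1.6; do   # future compute-cell Rabbit nodes
    iptables -A INPUT  -s "$node" -p tcp -m multiport --dports 4369,25672 -j DROP
    iptables -A OUTPUT -d "$node" -p tcp -m multiport --dports 4369,25672 -j DROP
done
```

Once the split-brain settles, each side can drop its view of the other side's members with `rabbitmqctl forget_cluster_node`, which is roughly the cleanup step described next.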
So we just did that with some iptables rules on the machines to completely break that communication so that essentially now we've got a split brain in the cluster. A split brain on purpose, correct. We wanted it to happen, right? And then after that, it's just kind of a cleanup where on the opposite sides of the split brain we just remove the servers that were on the opposite side from that. So we went from three to six, and now we've got two clusters, they're gonna be three machines each. And again, that's just kind of a RabbitMQ exercise there. We weren't able to do this as non-impactfully as I really wanted. There were still some weird things that happened inside the Rabbit cluster while we were doing this. So I think we still took probably 30 or 45 minutes of downtime just because of Rabbit issues. And I'm not like a super Rabbit expert so I don't know exactly what happened, just that things sort of stopped and then once we got it all cleared up, it was good again. Typical Rabbit experience. Okay, so we've got our separate Rabbit queues now so that's good. Now we're gonna tell, we're gonna start setting up our lower level compute cell. The first step in that is telling it about the parent cell. So we've got this lower level compute cell that has a parent of the API cell and we need to create that relationship. There's a nova-manage command to do this. You can also just populate that information directly in a JSON file that sits directly on the machine which depending on the setup may or may not be easier so a couple different ways to do that. 
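For reference, defining that parent relationship with the Kilo-era cells v1 tooling looks roughly like this; the cell name, hostnames, and credentials here are made-up placeholders, so treat it as a sketch rather than our exact invocation:

```shell
# Run in the compute (child) cell, pointing at the API cell's Rabbit servers.
# All values below are hypothetical.
nova-manage cell create --name=api --cell_type=parent \
    --username=guest --password=secret \
    --hostname=rabbit-api.example.com --port=5672 --virtual_host=/
```

The JSON-file alternative mentioned above is the `cells_config` option in the `[cells]` section of nova.conf, pointing at a file describing the same parent entry. Actually turning cells on in the child is then just a few more `[cells]` options, along the lines of `enable = True`, `cell_type = compute`, and a cell `name`.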
But the important thing to note here is that we're configuring this in the compute cell in that lower level cell and what we need to do is tell it about the Rabbit servers for the API cell at the top because what we're actually doing is configuring this communication path here so we're configuring that lower level cell to understand and to have visibility and to know where its parent is so it can do this routing up sort of mechanism for the RPC calls. Other than that, configuring and turning on cells is actually pretty simple. There's just a few options that you have to set to turn it on. After you've actually enabled this in the config at this point you can start up the nova-cells service which again does that intercommunication and what you should see here is nova-cells connect to both the rabbit queues in the local cell, that compute cell, as well as the rabbit servers in the API cell because again that's the service that handles that communication back and forth. So that's kind of a good QA step there to be able to make sure that things are working correctly. And something to note here too, I mentioned that we kept nova API going in our original nova install which is now becoming the compute cell. We kept that running so that people could continue doing calls against it. Now that we've actually enabled cells in the config we don't wanna restart API because if we do, again it's gonna have that overridden class in there and it's now gonna be trying to route things to cells that don't exist. So the nova API has to stay running as is with cells disabled until we get this all done and we can cut over to the new one. Another thing to note is you wanna disable quotas in the child cells because all that enforcement happens at the API cell level. If you think about it, if you have a bunch of compute cells they only have visibility to themselves. 
So if you have quota set and you're trying to enforce that quota at the compute cells you're actually gonna let people have more than what they're allowed to because they're gonna be allowed their full quota in each of those cells. So that's why we have to do that enforcement at that top level cell and just disable it at the lower level. So that's our compute cell setup. Now we've gotta instantiate and build our new cell that's gonna be in charge of running the API and do that top level scheduling. This just looks like a normal nova install kind of how you would normally deploy it, install packages, whatever your favorite method of doing that is. It's gotta have a new database so we've gotta do the database synchronization get that configured. And again, the nova config here is gonna point to the rabbit cluster that's at the API cell. We're not gonna connect it directly to the compute cell because it's that nova cell service that handles doing that piece. The enable options here are pretty similar. Obviously our cell type now is gonna be the API cell because this is gonna be our top level thing that handles that master routing at the top. But we don't wanna actually start the services yet because we have a bunch of data that we need to import to be able to save the state and to know about all the instances that are already out there. And again, we've gotta tell nova cells about the other cells in the environment. So this is very similar to the step before. Now we're telling the API cell about the child compute cells that it's gonna be responsible for below. And again, these are gonna point at the rabbit servers for the lower level cell because that's the communication path we're configuring, which is this guy right here. So this is what enables our routing down or message routing down to happen by telling the API cell how to get to the compute cell below it. Now we get into all the data transfer stuff which is pretty interesting. 
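For what it's worth, the data import we're about to walk through boils down to a dump piped between the two databases. The table list was on the slide, so the tables and hostnames shown here are an abridged, assumed version rather than the authoritative list; as noted below, it has to run while nova-api is stopped:

```shell
# Sketch: copy Nova state from the existing database (which becomes the
# compute cell's) into the new API cell's database. Table list is abridged
# and the hosts are placeholders.
mysqldump --single-transaction -h compute-cell-db nova \
    instances instance_info_caches instance_metadata \
    instance_system_metadata instance_faults key_pairs \
    instance_types instance_types_extra_specs quotas quota_usages \
  | mysql -h api-cell-db nova
```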
So if you think about, we've got this nova instance that we've had running for a while and it's got a whole bunch of VMs and a whole bunch of key pairs, the flavors that we've set up, all that kind of stuff that's associated with Nova. All that stuff is now at that lower level compute cell and the higher level one has no visibility into that stuff because it's not in its database. Unfortunately, really the only way to do this is just to do like a manual SQL dump and then import it into our new API cell database. And this kind of gets into the state transition stuff where we actually need to turn off the API that we had running so that there can't be any changes going on while we're doing this import because otherwise we're gonna end up with database inconsistency out of the gate which is gonna be a bad way to get going on this. So literally what we did is a mysqldump piped into a mysql command to just pull it out of one database and stick it in the other. This is the list of tables that we had to do. There may be some others that are necessary depending on what you've got going on. So for instance, we use Neutron so we don't use any of the Nova security group stuff. So unsure which tables exactly you have to get for that. Similarly, we don't use Cinder yet so we don't have any volumes in the Nova database. So there was, I mean, if you use Cinder you're gonna have to bring those in but that table literally has nothing in it for us. So this is one of those things that if you're gonna do this you have to test it and just make sure that you get everything that you need imported into that higher level database. So that's all we need to do to get everything set up. Now we can start getting the services going and now we've got kind of everything flowing. On the API cell there's only a few services that actually run there. Obviously Nova cells is gonna run everywhere in the setup and our API cell is where the API is gonna live. 
Additionally up there you've gotta run all the different console and serial proxy type services, that's where they belong. Pretty much everything else goes in the compute cell so that's conductor, scheduler, compute obviously, Nova network if you've got that. There are some people that still actually run a Nova API instance in the compute cell as well and that helps you in a few different ways where if you think about if you don't have an API there you have no way to really interact with that Nova instance that's running the compute cell. So there are a lot of times where you might want to see what state the compute cell thinks about the world, so what state it believes that the VMs it has are in, or anything that you can think of that you might wanna look at. That's a good place where you might wanna keep nova-api running so that you have sort of at an admin level a place to go into that compute cell and see what it's doing, see what it's got going on. So I mentioned earlier you wanna test this stuff. I mean if you look at the code it still says that Nova cells is considered experimental so be careful and everything. So you can't just expect to kinda double click install this thing and start it up and have it go. You've really gotta test this out. I think we went through two or three like dev test staging environments for us before we actually did this in production because there's a lot of little gotchas. That's where I was able to find a bunch more database tables that I needed to import for example to keep that state going and had I just done it live it would have been a huge problem. So some of the other caveats that we've experienced, the first one, it's kind of the most, I don't know, the most important but it's not so obvious, is that the neutron VIF plugging notifications that come from neutron to Nova to say the VIF for the VM is now plugged in and Nova you're good to go to continue doing that. 
If you think about where these things are running, Nova compute's down here in the compute cell, neutron is still up here in the API cell and those are totally separate message queues. So there's no way for that notification from neutron to get down here to this node, there's just a fundamental disconnect there. So to get around that you can essentially tell Nova that the VIF plugging stuff, you know, just assume that it's gonna work after some timeout. That's really the only way to make this work at all. I mean it does introduce this race condition because now Nova is assuming the VIF is gonna be plugged and ready to go in five seconds or 10 seconds or whatever timeout. Yeah I know. So that's one piece and that's how we've gotten around it and there's some other patches that are in play that address this as well. So in addition to the neutron notifications piece, I mean really anything that before was doing notifications between services or between Nova and something else aren't gonna work because again now you have this fundamental disconnect from the message queue perspective where things are running and where these things can communicate with each other. There's also this circular reference bug that happens in the Nova CLI where you do a cells list command. There's a bug for it and there's a patch. I don't think it's been merged or anything but I was able to just pull that patch into our side and it seemed to fix it. Not really a huge thing unless you really need to be using like the cells list command a bunch for some reason, that wasn't really a huge problem for us but that's something that you can look at if that's an issue for you. There's also this thing with console auth where you need to make sure that that service knows that you're running under cells and if you look at Matthew's blog he's got a good write up about that. 
He's the one that actually figured that out. That wasn't something that affected us, because we already had it enabled since it was running on the same machine, so just something to note there. Probably the biggest caveat with cells, at least in the state that it's in now, is that some of the objects within Nova are not gonna be cells aware. So the instances obviously are cells aware because fundamentally that's what we're doing is we're distributing instances among all these different compute cells. Well, a lot of the other things in Nova just don't know about cells, they don't know how to do that. So flavors, server groups, key pairs, all the host aggregate and availability zone stuff, like by default none of that works with cells really at all. Block devices is another one to call out, that's another one that Matthew has done a good write-up about. Security groups also. I don't know what the solution is to that because again we don't use it, we didn't have to solve that problem, but it's certainly something that you need to be aware of and think about if you're gonna make this jump. Specifically on host aggregates and availability zones, those are pretty important to us and fortunately the guys at Nectar had already solved a bunch of that stuff and we were able to pull in a lot of their patches and for the most part we were able to just drop those in and they worked great. So I had to get my meme in here and say thanks to Nectar and especially Sam for working with Chris Lindgren on getting that work and that really helped us out and get past that functionality that we still needed. A couple of the random things that I would call out, we get these Nova cells messaging errors where the UUID is null in the SQL statement. I've never been able to figure out exactly what this is or where it's coming from. 
I spent a bunch of time one day like trying to track it down and really was never able to solve it but it also doesn't really seem to cause any problems other than we get errors in Logstash occasionally. So don't really know what that is but if you see it, you're not the only one, I guess is the point. Database consistency between the API and the compute cells is a huge thing. So if there's ever a communication interruption that would disrupt the RPC calls that the Nova cells services are making back and forth, you can end up with a bunch of inconsistencies between the databases because remember you've actually got databases at both levels, at the compute cell and the API cell. So you can imagine a case where maybe there's a Nova delete command that comes in through the API, gets down to the compute. The compute's working on executing, removing the instance off of the compute node but in the meantime, there's some kind of network issue or some kind of breakage that cuts off the communication to the rabbit back at the API cell. Now that delete finishes, Nova compute sends the RPC response back saying, hey, this is done, it's really deleted, but that message never makes it back up to the API cell because that communication's broken. Now you have a situation where the compute cell says, okay, the state of this thing is deleted and it's really gone, but the top level API cell still thinks it has this task state of deleting and still thinks it's waiting for that to happen. And you can imagine, if that sort of situation happens a bunch of times then eventually you just end up with this inconsistency that continues to grow and grow and grow, and eventually you get to the point where you just have to go back and kind of reconcile things. 
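One way to spot that kind of drift is to query both databases for instances stuck in a transient task state and compare the results. This is just an illustrative query I'm sketching here, not something from the talk, and the one-hour cutoff is an arbitrary assumption:

```sql
-- Run against the API cell's nova database (and again against the compute
-- cell's, then diff): instances the API cell still believes are mid-delete
-- long after the fact are good candidates for reconciliation.
SELECT uuid, vm_state, task_state, updated_at
  FROM instances
 WHERE deleted = 0
   AND task_state = 'deleting'
   AND updated_at < NOW() - INTERVAL 1 HOUR;
```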
So that's another place where having Nova API running in the compute cell might be kind of useful because you've got a fairly easy path into the compute cell world to be able to see what it's doing and what it thinks about things. But just something to keep in mind when you're doing this because now, I mean, we're in this place where we really have state in two different places and it's really important that those stay in sync. So that's all the cells V1 stuff, what's coming in V2? This is essentially a complete rewrite of the cells feature in Nova as I understand it, and basically cells is gonna be the default mode for everything. So even if you're not running cells or you don't think you are, once we get to cells V2, really everybody's gonna be running it and you might only have one cell and it might be like on the same box as all the Nova stuff, but code wise, everybody's gonna be using cells. It's gonna completely get rid of that special Nova cells service and now the cell scheduling, the cell routing, all that sort of stuff is gonna become part of the default compute driver. So it's not gonna be like this special other thing that you enable and it overrides everything else and it's completely different functionality, it's just gonna be the default and that's what it is. Another difference is in V1, what we're doing is calling into the individual rabbit queues for all the cells but we still had separate databases. Well in V2, it's gonna include knowledge about the databases for the compute cells as well. So now instead of having this sort of master API cell database and all this data up here as well as state in the database at the compute level, now Nova API is just gonna call directly into the database and the rabbit queues for each of those individual cells and we're back to a place where we have a little bit more sane location for state where it's just in one place and not kind of two different things. 
So there's a wiki and kind of like some other basic links there for that project. I haven't been following it too closely lately, but I know what they're targeting right now is for this to land in Liberty as kind of the experimental level type of implementation, and then the thought being that for the M release it would be kind of ready to go and ready for everybody to turn on. The transition, if you're not running cells, to V2 which is the default-on mode, that's gonna be pretty seamless. I know they're really looking at how that's gonna look. It's a little more unclear what's gonna happen for those of us that are running in V1 mode and how we're gonna get to this V2. So that's something that we've been talking about like in a large deployers group and some other forums where there's several people that are actually using cells, so not exactly sure what that's gonna look like. So kind of the lesson to take from this is unless you really have a burning need to move to cells right now you're probably better off just waiting for Liberty or the M release when it's kind of the new way of doing things and gonna be a lot cleaner, et cetera, because it's probably gonna cause a fair amount of pain to do another migration later. So all this to say, like if I could go back a year and tell myself some advice, I would say just don't do any of this at all and just wait, because we're running on cells now but we're not to a place where we're beyond the scale of what a single Nova could do. And I think we're probably gonna be pretty safe for another six months or so scale wise. Like we're not gonna exceed that in that amount of time. So in hindsight, it would have been better for us to just wait for the V2 stuff to come and then just it gets turned on just like everybody else is gonna see without really any major migration or issue there. So that is my extent of cells knowledge. Hopefully that's useful or important for folks. 
So if there are any questions, I would be happy to entertain them. And you're supposed to go to the mic which is in the middle here, right? But the question is how did we orchestrate it? Like how did we actually deploy the changes, et cetera? So we used Puppet for deploying everything anyway. So we updated Puppet to kind of be for the final state that we wanted to be in, but for the most part, all these intermediary stages that I walked through, all that was fairly manual. At the time, we only had one cluster. So it wasn't like this huge scale. Probably had to do it on three nodes or whatever. I think I made like some ad hoc Ansible playbooks to do some of the stuff. So I knew it would be consistent across all three, that kind of thing. But for the most part, I'd describe it as mostly manual. Yeah, why don't you go to the mic? You're pretty close, yeah. Dave's always a rebel anyway. So instead of using cells, you can also create different regions. Why use cells instead of having different regions with one shared Horizon and Keystone? Right. I think the reason to not do regions for this case, I would describe as, we don't want region sprawl. So all this that we were doing is in one physical data center for us, what we define as a region. So we didn't want to create extra regions kind of for no reason, I guess. Does that make sense? Any others? Hecklers, Dave? I missed the opening, I'm sorry. Are the cells at all exposed to the users or is this just an operator concept? So his question is, are the cells at all exposed to the users or is it just an operator concept? It's totally invisible to the users. I think, I mean, probably you've got like these guys that are doing it for different sites and stuff, obviously it's exposed somehow that way. But for us, in that like just kind of default mode, it's totally invisible. You don't even know it's there. 
The question is, what do we expose to users? Okay, yeah. So we do it via availability zones, that's the abstract concept that people see. Nothing, nothing. Okay, thanks everybody.