Okay, so I guess my slides aren't showing up. I don't know if that's on my end or what's going on there, but hold on, let's see if we can get this sorted out. Okay, sorry for that. So I'll start. I'm Andrew Laski. I'm a core developer on Nova, and I'm one of the people leading the effort for Nova cells v2, one of probably five or six people who are key to this effort. There are a whole lot more who are involved, but I'm the one who's going to get up and talk about it.

So, cells v2: what's going on? I'm glad you all asked. I'll start from the very beginning and talk about what cells are, real quick. Cells are, first, a strategy for scaling. There are two things we may want to scale here: the database and the message queue. If you think about Nova today without cells, you might have a single deployment with 1,000 hosts in it, and that may overwhelm the message queue and the database, depending on what you're trying to do with it. What you can do instead is set up cells. You end up with a little more complexity in your deployment: you have an API cell at the top, with a database and a message queue up there (with cells v1, you also have the nova-cells service running up there), and then two cells underneath with, say, 500 hosts each. That means the database and the message queue in each cell only need to deal with those 500 hosts, so you've reduced the amount of traffic and load that each of them has to handle. That's one part of what cells are.

Cells also provide failure isolation. Say you were to lose a database without cells: you lose access to all of your instances, and you can't query them. With cells, if you lose a database or a message queue within a cell, you can still access everything in the remaining cells. So you get some failure isolation there.

Cells are also an optional grouping mechanism. Some of the cells v1 deployers like to put all of the same type of hardware in each cell. You might have one cell that's purely bare metal nodes, one cell that has SSD computes in it, one cell that has spinning disks. You're not required to do that, but it might make capacity planning really nice, and it might make failure isolation really nice; there are some benefits to doing it. Another thing you can do is scale out a deployment by adding a new cell. You can take a whole new cabinet of computes, stand them all up, deploy everything except the API on top, test them out, run them through QA, send builds there, make sure everything works, and then plug it into your production deployment. At that point it becomes accessible to all of your users, but you also know that everything works, because you've tested it beforehand.

So that sounds great, and there's already a cells v1. So why are we doing v2? Let me talk a little about the v1 architecture. It works, sort of, by capturing messages and replaying them, and it tries to do this in a very transparent manner. It was implemented in a really cool way, but there are inherently race conditions in how it operates, because it's not built into the code base in a core way. That's been a problem. There are also two levels of scheduling with cells: you get this top-level scheduler that has to pick a cell, and then once you're within a cell, you schedule as normal with Nova.
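To make that topology concrete, here is a rough sketch of the layout just described. Everything in it (hostnames, URLs, host counts) is invented for illustration; it is not real Nova configuration:

```python
# Hypothetical cells layout: an API cell on top, two cells underneath.
# Each cell has its own database and message queue, so each of those
# only ever sees its own ~500 hosts instead of all 1,000.
DEPLOYMENT = {
    "api_cell": {
        "database": "mysql://api-db.example.com/nova_api",
        "message_queue": "rabbit://api-mq.example.com:5672/",
    },
    "cell_a": {
        "database": "mysql://cell-a-db.example.com/nova",
        "message_queue": "rabbit://cell-a-mq.example.com:5672/",
        "hosts": 500,  # could be all bare metal, all SSD, etc.
    },
    "cell_b": {
        "database": "mysql://cell-b-db.example.com/nova",
        "message_queue": "rabbit://cell-b-mq.example.com:5672/",
        "hosts": 500,
    },
}
```

Losing cell_a's database in this picture takes out queries for cell_a's instances only; cell_b keeps working. That's the failure isolation point above. And the two-level scheduling just mentioned is where the trouble starts.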
The problem is the top-level scheduler has very limited, coarse data. It only knows that cell A over here has these flavors available, or this number of slots for this flavor available. So you can't do anything like affinity with it. If you have an instance in cell A and you want to build a new instance and say, 'I want it close to that instance,' there's no mechanism for doing that. The cells top-level scheduler doesn't have any information that would allow it. So there are certain things that just aren't there.

It's also lacking some of the other basic features that Nova provides. Security groups are a big one. People like host aggregates and availability zones; again, that's tied into the information that the top-level scheduler has. And in general, there are other scheduling options that aren't available.

It's also bolted on, and this is probably one of the biggest issues with it. Few deployments use it, mostly larger deployers: those who have resources they can devote to holding internal patches to make it work. A lot of developers are not familiar with it. That makes it really hard, one, for people who are pushing patches into the cells code to get reviewers and get that code merged; but also, people working on Nova in general will add a new feature and forget to add it for cells. That's part of why we have all these features that don't work with cells. There are also a lot of bugs, there are a lot of testing issues around it, and there just aren't a lot of developers who can really work on it. That's a problem, and it's because it's shunted out of the way in the code base: for the most part very transparent, but also very hidden.

It's also difficult to upgrade from Nova without cells to Nova with cells. If you start your Nova deployment, and at some point you say, 'you know what, we're really starting to scale, things are going great, I would like to run with cells now because I need the scaling ability it has,' there really is no path for doing that. If you want to run cells, you really should have started with it from the beginning. But it's also very complex, so we don't recommend that anybody start with it from the beginning. So you get into this weird situation.

So, the lessons learned. Probably the biggest one is: don't duplicate data. There are a number of race conditions that cause real problems for people. It's really hard to keep data in sync, instance data in particular. There's a copy of it in the API cell and a copy of it within the cell, and when there are task state changes and VM state changes, you get messages coming down and messages going up at the same time, and things just get weird. Next: everyone needs to use it. It should be the main code path. That way developers are familiar with it, deployers are familiar with it, and it stays well tested. That would be really great. And the parent cell in cells v1 still doesn't scale well, because there is a copy of every instance at that top-level API cell, and there's no path forward for that. If you got to a million instances in your deployment, you'd still probably hit message queue issues and database issues for the API cell.

So for the v2 architecture, we're starting with some simple principles based on the lessons learned there. Probably the biggest one is: data lives in only one place.
We don't ever want to have to duplicate data, and therefore we don't ever want to have to keep data in sync. If we keep it in one place, it's always what it should be. Next: there should only be one way to deploy Nova, and v2 is going to be that way. And we want as little global data as possible, in order to keep that top-level API database very small; let's keep as much as possible within the cell.

A little more about each of those. Data in only one place: one of the things we're doing to accomplish this is we've stood up a new Nova API database. Any of you who have deployed Mitaka have run into this; you're well aware of it by now. It lives in the API cell, and it stores data that we consider to be global, like flavors. If you go to boot an instance, you have a flavor, and that flavor is not specific to any one cell; theoretically, it could go into any cell. So we keep that data at the top level. There's going to be other data up there too. We don't have that all sorted out yet; we're working on it in Newton, and we're actually going to discuss some of it on Wednesday at the design summit. But anything that we can think of as global is going to live in that top-level database. We also store mappings to cells, instances, and computes there. I'll get into the architecture a little later, but basically, because instance data is not replicated between the API database and the cell database, when you make a request for an instance we need to know where that instance's data is. So we have a mapping that says: this instance lives in this cell; go query that database, get it, and return it.

Only one way to deploy: it's basically Nova, now powered by cells. No more decisions about whether we should use cells or not. We should probably stop even saying 'cells' at some point; it's just Nova. And all features will work. That's going to be a big one for people using v1 right now who really want that. There's a little caveat, which is nova-network; we'll talk about that in a bit.

As little global data as possible: basically, we just have these small pointers to the heavy data in each cell, and as much as possible we want to avoid putting things in the API database that scale with the number of instances or hosts.

So, putting it all together, here's what cells v2 is looking like right now; this is the path that we're on. You have an API cell at the top. It contains your Nova API services, your conductors (which have a very specific purpose there), the API database, and your schedulers. That's all going to live at the top. You have this kind of special cell zero, which isn't really at the same level as the other cells. It contains a Nova database, but nothing else: no conductor, no message queue, no hosts, nothing. Essentially, because we don't write instances into the API database, an instance has to live within a cell. If you encounter a scheduler failure, where somebody tries to boot an instance, all your cells are full, and the scheduler says, 'I can't put this in any cell,' we need a place to put that instance, and that's cell zero. It doesn't actually need to be a separate host if you don't want it to be; you could co-locate that database with your Nova API database if that makes it easy. But we have this concept of a cell zero. And then you have your cell foo and cell bar.
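To make the mapping idea concrete before going on, here is a minimal sketch of that lookup path. The tables, columns, and URLs are simplified stand-ins rather than Nova's exact schema or internal APIs; the point is only that the API database answers 'which cell is this instance in?', while the instance record itself exists in just that one cell's database:

```python
import sqlalchemy

# Illustrative connection to the top-level Nova API database.
API_DB = sqlalchemy.create_engine("mysql://api-db.example.com/nova_api")

def show_instance(instance_uuid):
    # 1. Ask the API database which cell holds the instance, and how
    #    to reach that cell's database.
    with API_DB.connect() as conn:
        cell = conn.execute(
            sqlalchemy.text(
                "SELECT cm.database_connection"
                " FROM instance_mappings im"
                " JOIN cell_mappings cm ON im.cell_id = cm.id"
                " WHERE im.instance_uuid = :uuid"
            ),
            {"uuid": instance_uuid},
        ).one()

    # 2. Switch connections: fetch the actual instance record from the
    #    mapped cell's database. No copy of it exists anywhere else.
    cell_db = sqlalchemy.create_engine(cell.database_connection)
    with cell_db.connect() as conn:
        return conn.execute(
            sqlalchemy.text("SELECT * FROM instances WHERE uuid = :uuid"),
            {"uuid": instance_uuid},
        ).one()
```

Building a connection per cell from a stored connection string, as in step 2, is essentially the database connection switching work described below. Now, on to what's inside each of those cells.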
Basically, each cell is conductors, computes, a database, and a message queue. Some of the other services, like the console services, we'll have to work out exactly where those live, but you won't have a scheduler within each of the cells. It's mostly the database, message queue, and computes.

So what's been done? In Mitaka, we created the Nova API database. That's now a second database to be stood up and managed, and it has its own series of database migrations to run. More and more things are going to be put into that database and migrated into it, but not a ton. We have this concept of database connection switching. If you run Nova now and you go to query where an instance is, it's in the one database that exists. What we need is the ability to say: I have three databases for my three cells, and I need to be able to connect to this one in particular and pull the data. So we had to write some code to do that, and that's merged. And we have a few upgrade tools in place. They're experimental, but they let you play around with it and test it out, and they allow us upstream to write testing jobs that exercise the upgrades and make sure they work. So that's what was done in Mitaka.

In Newton, there's a whole lot going on. What has happened already is a flavor migration, to move flavors out of the cell database and into the Nova API database. That was a fair amount of work, but now flavors are up there, which is pretty cool. Currently, I'm working on the boot process and scheduling. What's going to happen now is that when you go to boot an instance, the request comes into the Nova API and we store some data, enough that you could list or show that instance, but we don't actually write the instance to a database yet, because we have to go and query the scheduler first. That's the job of the Nova conductor that lives in the API cell. It holds onto that data, talks to the scheduler, and asks, 'where should I put this instance?' When it gets a response, it writes an instance into that cell and then sends the request on to do the rest of the boot.

Cell zero is another work in progress. It's mostly standing up a database. That database looks like the Nova cell database; its schema is an exact copy of any other Nova cell database. So it's essentially just standing it up and creating a mapping that says 'this is cell zero,' and we're going to have a very simple command for doing that. There are also more, and simpler, upgrade tools. A bunch of them are up for review right now, so they haven't quite merged yet, but they are much easier to use: for the most part, it's just one command you need to run, and it will set up everything for you. Those are in progress, and we're trying to simplify them as much as possible. Message queue connection switching is another one. We have it for the database, but we don't yet have it for the message queue, and we need the ability, like with the database, to say: I would like to reboot this instance over here; throw that reboot message on the message queue for the particular cell that the instance is in. That's another patch that's up and in progress right now. And there are going to be more migrations to the Nova API database. We don't know how many yet; like I said, we're going to talk about that on Wednesday. Aggregates are one we're pretty sure is going to move up there. Quotas are another one we're pretty sure is going to move up there.
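Stepping back to the boot process for a moment, here is a rough sketch of that flow: the request held at the API level, the instance written only after scheduling, and cell zero as the landing spot for failures. Every class and function name here is invented for illustration; in reality this work is split across nova-api, nova-conductor, and nova-scheduler:

```python
def boot_instance(build_request, cells, scheduler):
    """Sketch of the cells v2 boot path, using hypothetical helpers."""
    # nova-api has already stored a lightweight build request: enough
    # data to list or show the instance, but no instance row anywhere.

    # The conductor in the API cell asks the scheduler for a host.
    host = scheduler.select_destination(build_request)

    if host is None:
        # Scheduling failed (say, every cell is full). The instance
        # still has to live somewhere so the user can see and delete
        # it, and that somewhere is cell0: a database with the normal
        # cell schema but no conductor, queue, or hosts behind it.
        cells["cell0"].database.create_instance(build_request, vm_state="error")
        return

    # The chosen host implies a cell. Only now is the instance row
    # written, and only into that one cell's database.
    cell = cells[host.cell_name]
    instance = cell.database.create_instance(build_request)

    # The rest of the boot is kicked off over that cell's own message
    # queue: the "message queue connection switching" piece above.
    cell.rpc.cast("build_and_run_instance", instance=instance, host=host.name)
```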
Back to the API database migrations: there are some open questions about other ones. Key pairs are another likely candidate, but we haven't quite decided yet, so we'll keep everybody up to date as that happens. And then we're beginning multiple-cell support. The goal for Newton (the release, not Neutron) is that by the end of it, everybody's running cells v2. Your deployment looks and acts like a cell: you have separation between the API and the cell, and it's treated as a cell. But we don't yet have all the pieces in place to use the database connection switching and the message queue connection switching everywhere, so if you stood up two cells, some operations might work and some others might not. We might have reboot figured out, but not resize, maybe. So we're going to start the work on multiple-cell support, and we'll see where we get.

And beyond: after Newton, we'll see where we're at, but some of the things we would like to do include more scheduling control for cells. People who run cells v1 right now have maybe not great control over that cells top-level scheduler, but they do have some. They can say, 'I have flavor A only in this cell and flavor B only in that cell,' and when a request comes in, the first thing the scheduler does is rule out half the hosts, because it's not even going to look at whatever cell doesn't have that flavor. The scheduler that we have for cells v2 currently just looks at all of the hosts in aggregate; it doesn't yet have the ability to say, 'I have some properties on this cell that I would like to take advantage of for scheduling.' So that's something we're going to be working on. We're also going to finish multiple-cell support. That's very important; being able to have multiple cells is kind of the reason we're doing cells at all, so until that's done, that's where we're going to focus to a large extent. And we're going to work on cells v1 to v2 upgrade tools. You have not been forgotten; I know that's a huge concern for v1 people. We're trying to get to it, but we have to have multiple-cell support working before we can do that. That's the only hold-up there.

The upgrade process. For people who just want a simple upgrade: they're simply running Nova right now, they don't necessarily need the scaling aspects yet, but they would eventually like to have them. There's a very simple two-step upgrade for that, which is actually kind of one step in a sense, because in Mitaka you already would have had to run the new nova-manage api_db sync command, and everybody has to do that, simple or not. Then what you're going to run in Newton is this cell_v2 simple_cell_setup command, passing in a transport URL. That will set everything up, and it sets it up with the constraints listed at the bottom of the slide, which are basically about cell zero: its database is going to live on the same MySQL or PostgreSQL server as the Nova API database. It basically pulls the config option for that Nova API database and sets up a new database there. All active hosts in your normal Nova database (not the Nova API database) will be migrated, or I should say mapped: basically, the API cell becomes aware of them. And after that happens, all of the instances that live in that database are also mapped. So at that point the API treats it as cells v2, and everything's set up. We would like to make this a little bit simpler. There's this transport URL that has to be passed in, and that's because of a limitation in the oslo.messaging library.
oslo.messaging reads the transport configuration option and does a whole bunch of magic on it to make things work, so at this point we can't really reverse-engineer that to get the transport URL back out in the form that cells needs to use. So it's required to be passed in. We're going to be doing some work around that, so hopefully the command will ultimately just be nova-manage cell_v2 simple_cell_setup, or whatever we end up calling it, with nothing to pass in. But right now, this will work once a few patches merge.

If you need a little bit more of a custom setup, because you know you want to start scaling now and you don't want to co-locate your databases: you still have to do the api_db sync command, just to create that API database. Then you figure out where you want cell zero to be and pass in the database connection string yourself, and that does the job of setting up cell zero and mapping it to that database. Then you run this map_cell_and_hosts command, where again you have to pass in the transport URL; there's also a name option (which got cut off on the slide), so you can actually name your cell and manually make sure the hosts get mapped into it. And then you run this map_instances command and pass in the cell UUID, which you get as a return value from map_cell_and_hosts. You can run it in a number of stages if you'd like, by passing in this max-count parameter. It's an optional parameter, but you can say, 'I want to map 100 instances at a time,' if you don't want to overwhelm your database as you're doing this migration. As a result of all this, all of your active hosts in the Nova database are mapped, and all of your instances are mapped.

There are a few caveats. One of the big ones looks like maybe a bad thing, but it isn't once you look at it a little more. If you live in a multi-cell world, and let's say the database for one of those cells goes down, and you want to list all of the instances in your deployment, we can't return all of that data, because the database is down. However, we can give you partial data. Right now, if you're running Nova and you lose your database, you can't list any instances at all; in the future, you only can't list the instances that are in the database that's down. So compared to v1, we don't have all the information at the top level to pass back, but again, we avoid all the race conditions by not keeping it there. So partial listings are going to be a thing that you'll start to see for list commands, nova list in particular.

Networking. I say 'in progress' for Neutron, but actually, networking with Neutron is going to work with v2 out of the box, as long as what you want to do is just have a flat network across all of your cells. That will work just fine. What's in progress for Neutron is something they're calling segmented networks: the ability to tie your network segments to your cells if you'd like. So you can split your networking up and say that this cell has this networking and this other cell has that networking, because most people who've deployed v1 find that their networking is a scale bottleneck as well. So they end up splitting Neutron up behind the scenes, and there are a lot of hacks in place to make that work for v1; it's not something that works upstream.
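An aside, going back to that map_instances step: here is a sketch of how the batched mapping might work. The helper names are invented (the real tool is a nova-manage command, not this API); the whole job is just walking a cell's instances and writing UUID-to-cell pointers into the API database, a bounded number of rows at a time:

```python
def map_instances(cell_db, api_db, cell_uuid, max_count=100):
    """Map every instance in one cell into the API database, in batches."""
    marker = None
    while True:
        # Pull the next batch of instance UUIDs from the cell database.
        uuids = cell_db.list_instance_uuids(limit=max_count, marker=marker)
        if not uuids:
            break  # every instance in this cell has been visited
        for uuid in uuids:
            # Idempotent: skip instances that are already mapped, so the
            # command can safely be re-run, or run in stages.
            if not api_db.has_instance_mapping(uuid):
                api_db.create_instance_mapping(uuid, cell_uuid)
        marker = uuids[-1]  # resume after the last UUID we saw
```

That bounded batch size is the point of the max-count parameter mentioned above: each pass touches a limited number of rows, so a large cell can be mapped without hammering either database.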
So, back to Neutron: we're trying to work with them to get that to be just a recognized thing, and something that works for everybody. nova-network is not really planned to be supported. It will work with one cell. There's a little bit of work we might do that makes it work, for the most part, with multiple cells, assuming again that you're okay with a global flat network across all of your cells. It would probably also tie nova-network to whatever your first cell is, in particular, so that you could never really remove that cell without losing nova-network in the process. However, most people are running Neutron now, so we're hoping this isn't really a thing. But it's still under discussion; we haven't made the decision one way or another yet. If we do need to work on nova-network, we'll go ahead and do that.

There are always the unknown unknowns; there may be obstacles. Even cells v1 has only been scaled up to a certain point. I think the largest number of cells people have is maybe in the dozens within a region. If somebody wants to stand up a thousand cells, we don't quite know what's going to happen there. But we're going to learn. As it scales up, we'll continue to run it, we'll continue to test, and we'll continue to fix the issues that come up. So: cells v2, expected fall 2016. That's it for me. I can talk about anything in more depth if you want; I just wanted to do a quick overview. Are there any questions? And if there are, would you please go to the microphone so that everybody can hear you? It also gets recorded.

Q: Can you talk a little bit about how the scheduler is going to work with reservations in cells v2, or is that out of scope?

A: What do you mean by reservations?

Q: Right now with cells v1, if you want to build an instance, it'll go from the global cell into the child cell, and if there's no capacity there, then it's dead, it's lost. So there's this concept of host reservations that's being talked about upstream, and I'm kind of curious how that's going to play with cells v2.

A: It sounds to me like the need for host reservations comes from the fact that there's a time lag between when the scheduler within a cell knows that you're out of capacity and when the cell scheduler knows that you're out of capacity. In cells v2, where you don't have those two layers of scheduling, it seems like the problem just goes away, right? If the first scheduler you go to knows that you're out of capacity, it's not going to say, 'yes, you can get past here,' and then fail later. Do you really need that concept of reservations?

Q: Okay, so in that case it's not really something that's required, or in scope, for cells v2?

A: Right, as far as my understanding goes. I'm sure I didn't explain it all that well, so please, if there are more questions...

Q: Can you talk a little bit about the partial listings that you mentioned briefly? How's that going to work, exactly?

A: Let's say you're a tenant and you've created 20 instances across, let's just say, two cells: 10 instances in one cell, 10 instances in the other. And the database for one of those cells goes down. What's going to happen if you list your instances is that at the API database, we know you have 20 instances connected to your tenant, because we've mapped those instances to each cell. We're going to go talk to the database that's up and return all the information for those 10 instances.
For the database that's down, all we know is that you have instances with these UUIDs in it. So all we can really return is a list of those UUIDs and say, 'this is incomplete.' We can give you all of your UUIDs, but we may not be able to give you things like the flavor it was booted with, the RAM it has, or the metadata it has.

Q: And that would just be if the database was down; it wouldn't be the normal condition?

A: Right, this is only if the database has gone down, which hopefully an operator is going to see and try to get back up quickly. This is just the abnormal, something-has-gone-wrong case, not normal operation.

Q: I've got a question. You said you eliminated the two layers of scheduling, which is a great thing. When a request comes in, though, how will the API cell determine which conductor to pass the scheduling request to, in any particular cell?

A: The scheduling request does not get passed to any particular cell. The scheduler lives at the API level.

Q: Yeah, but the boot request: how does it pick a cell to then get scheduled into? I do a boot request to the API; it's got to go to a cell.

A: The boot request goes to a Nova conductor that lives at the API level. It's going to talk to the scheduler, and the scheduler knows about all the hosts in the deployment. It's not picking a cell; it's still picking a host for you. And once it has a host, we say, 'this host is in cell A,' we write the instance to that database, and then we send it on through RPC.

Q: Okay, thanks. If you lose cell zero, do you lose everything underneath it?

A: No, all you lose are the instances that were not able to be scheduled into any other cell. Cell zero is really just a holding ground until those instances can be deleted: they're going to be in an error state, they weren't schedulable, so all you can really do with them is show them in the API and delete them. Hopefully they won't even stick around that long for most users.

That's a good question, and what he asked is: can you run cells with different Nova versions? I should have put that in the 'and beyond' slide. The short answer is yes. The longer answer is that there's some work that needs to go into making that happen, because you have to have the Nova API level understand how to talk to databases that might have schemas at two different versions, and you need to be able to communicate at the lowest common denominator over RPC. I've talked with a few people about ways we can go about doing all of that, and we do plan to make it happen. It's a huge part of the upgrade story, right? Somebody's going to want to take just one cell, upgrade it, make sure everything works, and then roll that out across all of their cells. So it is planned, yes.

Q: Going back to the scheduler: what about scheduling based on affinity? Will you be able to do affinity-based scheduling with this approach?

A: At the moment, yes, because essentially what I'm saying right now is that the scheduler is going to have no concept of cells built into it. It's still going to know about all of the hosts in the deployment, and it's going to work exactly the same way that it works today; essentially nothing changes for it. So affinity will still work at this point, yes.
Q: Affinity of a VM that lives on a different host than another VM, so an anti-affinity rule per VM. But you don't have the VM information at the API level, if I understand, only the host information and the mapping of the VMs to the cells.

A: The API, yes: nova-api only has access to the API-level database, which only knows about that mapping information. But the Nova scheduler is still going to be privy to all of the instance information. Right now there's a mechanism in place where the Nova computes say, 'these are all the instances that I have,' and they pass that information up to the scheduler, and that's not going to change at this point. So all of the hosts across all of your deployment are still telling the scheduler everything. Now, real quick, I will say there are people on cells v1 who really like the two-level architecture, or may want it for scaling, or because they don't want to change the filters and weights that they already have. So there's talk about changing the scheduler to work that way. But what would happen, I think, is that the scheduler API would not change. For most people, if they want to do global scheduling across all of their hosts, they could continue to do that. The people who want the two-level scheduling would actually plug it in at the same place: instead of calling the scheduler twice, the conductor would call the scheduler, which would make a decision and call down to a sub-scheduler. But from the Nova perspective, it looks exactly the same.

Q: You mentioned the assumption with Neutron of a flat network. From a scalability point of view, cells sort of solve the scalability of compute in Nova. Is there a co-design with Neutron to do a similar kind of segregation and scaling? Can you elaborate a little bit more on how that will happen?

A: There's not, as far as I know, a plan for Neutron to scale its database or messaging system in the same way that Nova has done. I think Neutron right now is more focused on scaling what you'd call the data plane on their side: making it so your actual physical networks can be segregated and scaled, so your broadcast domain is a little bit smaller than being totally global. However, I'm not at this point aware of any effort they've made to do the same type of message queue and database sharding that we are. There have been initial conversations, but I don't think they're in a place to really push on that yet.

Q: So we can assume that in the future, a cell may somehow map to the network topology that is created by Neutron, and Neutron sees all the cells as one giant pool of Nova nodes?

A: Yes. Their concept right now is what they're calling routed networks, or segmented networks. It's actually flexible enough that you don't have to map a segment to a cell in particular, but I suspect most people will, because their physical deployment infrastructure is going to look very much the same on the compute side and the networking side. But again, it's about scaling the broadcast domains there, not making the Neutron deployment mirror the Nova cells deployment. And in any of the discussions we've had with Neutron, we're trying very hard to make it so that Neutron doesn't have to follow what Nova does in terms of cells architecture. We want them to be very free to do what they want to do.

Q: Thanks. Hi, two questions. You mentioned the aggregates:
so host aggregates, server groups, all of that is moving into the API database?

A: Host aggregates are. I don't know about server groups; I'd have to think on that, so 'I'm not sure yet' is what I'd say. If it makes sense to do it, it will. There was a very specific reason we moved host aggregates... I'm trying to remember now. Basically, we're using host aggregates as a way to map into this new concept of resource pools, which gets into some very low-level details of Nova. But that's actually what's going to be used to tie into Neutron, to make segmented networks and Nova cells work together, and also to do a whole lot of other nice things for the Nova scheduler. So aggregates are a way to map that. If server groups don't need to be used for anything like that, they may end up living within the cell's database.

Q: They're used for affinity and anti-affinity. So I think if the global scheduler is going to allow affinity and anti-affinity, you're going to have to have those up there.

A: Yeah, it's one I haven't actually looked at specifically. But again, if the data is available to the scheduler from within that child cell, from within that database, or if it's pushed up to the scheduler... it's one of those things where we want to balance not putting too much information in the API database. So if we don't need to push it up, we won't. However, if it ends up that we need it for some reason, we'll do it. I just don't know for that one specifically.

Q: Okay, and my other question: you mentioned a little bit ago that compute nodes are going to report up to the global scheduler about capacity and about what instances are on each host, just like in the current architecture. My question there is about scaling. Is each compute node going to have a connection up into the global level? You said there is no global message queue, so I guess the question is: how do those compute nodes communicate that information up to the scheduler?

A: Yeah, that's a good question, and I don't have an answer for it yet. Getting into more of a long story, I will say the direction, ultimately, is that the Nova scheduler is going to get split off to the side, where it's going to have its own database and its own message queue, and it's really going to be separated from Nova. Even if it's not pulled out as a separate project on its own, we're trying to separate it as much as possible. So we may end up with a message queue that has to run just for the scheduler, in order to continue passing that information up. That's maybe one of those unknown unknowns that is now a known unknown.

Q: Okay, thanks.

A: I forget when this session goes to, but we might be at time. Okay, cool. Thank you, everybody.