Testing, testing. Better? Hello. Yeah. OK. All right, let's start. So this is "Horror Story: How We Keep Breaking the Scheduler at Scale." I'm Josh Harlow, and this is Vilobh. Another folk who was going to be in this talk but isn't here today is Megal. Let's get into a little introduction about who we are. I guess you can go over yourself first.

So I'm Vilobh. I work on the OpenStack development team at Yahoo. I'm one of the Magnum core contributors; Magnum is containers-as-a-service. I'm also leading the cross-project effort for the quota library, which is known as Delimiter. And I have with me the Oslo project lead.

Yeah, hi. I'm Josh, and I'm now at GoDaddy, doing OpenStack there. We have a public cloud, a private cloud, all that kind of stuff. I was previously in the Yahoo group with Vilobh before I moved on. I'm now the Oslo PTL for Newton, and I've been creating a lot of libraries and all that; you can look at any of these if you're interested. Some of them are in Oslo, some are not. I've been doing a lot of work there for the past five years. I think Vilobh's been involved for two years? Two and a half, something like that. So we've been involved with OpenStack for a while.

So anyway, why is scheduling important? Everyone is aware of Professor Dumbledore. He said, "It is our choices that show what we truly are, far more than our abilities." Scheduling is nothing more than selection, which is eventually a choice you make about where you want to run your instances, whether they're virtual machine instances, bare metal instances, or container instances. So it's pretty important in the process of finding the right node on which to run your instance and schedule your job, especially in the context of Yahoo when I was there. I think Vilobh has more up-to-date information than I have.

Yeah, so what scale are we talking about? Yahoo works at a tremendously huge scale. We have tens of thousands of VMs, tens of thousands of OpenStack bare metal nodes, and 30-plus clusters across six regions globally. Along with that, we have tens of thousands of non-OpenStack bare metal nodes, which we eventually plan to move over and have managed by OpenStack. And consider the traffic, day in and day out: sometimes it is very consistent, but sometimes, say when there's a football game or a baseball game, there are spikes. So serving requests for that huge web traffic is immense, and as the scale slide shows, we have that many nodes to manage.

So let's set a little bit of expectation about what people want out of a scheduler, because it varies depending on who you're talking to. A user wants a fast experience. They want to boot quickly. They don't necessarily need to know where they're going, but they don't want a failure, and if they do fail, they want a quick time-to-fail. They want to see that their instance has booted on a resource; whether it's an optimal choice of resource, they may not be too concerned about. Of course, then there's the administrator case, where they also want it to boot quickly, because they may have various needs that have to be satisfied by the instance, and it has to work at scale.
That's definitely important for Yahoo. Administrators need to know that this thing is going to work not just for one instance but for many instances. They also want to configure it to match their layout, their topology basically, which means that if they have, say, a rack with certain restrictions on it (maybe it has GPUs and another rack doesn't), they want some way to segregate things. And they want to make sure those instances are used where they're needed, with a chargeback system or whatever connected into that, so those higher-cost instances are used effectively.

Then there's the developer, which is probably most people in this room. They want as many features as possible, but they also want the quick boot and the accurate decisions. They want a lot of the same things the other two groups want, but with a little more concern for the how of it and for making sure it's somewhat optimal; how optimal it is varies depending on the implementation.

So we're going to go a little into what Nova does here, which may not be known to everybody. It's an interesting little overview you can see here: the user's boot request goes through the API layer and then into the internal systems of Nova. If you're familiar with these, you can zone out right now. There's a conductor process that takes over the request fulfillment, basically, and that connects to the scheduler process, which then gets a response back to the conductor. The conductor does various other things inside Nova, and the request eventually goes to a compute, which is typically a hypervisor. You'll see that term used here; just think of compute as the hypervisor, typically where KVM or libvirt is launching VMs. Code there will then potentially reject the request or not, depending on what resources are actually available. So there's this loop between steps five and six that handles the case where not enough resources are actually available. Let's go into a little of how that happens. Do you want to go over this one?

Yeah, sure. So as we spoke about expectations: every one of us here goes through that set of expectations day in and day out, whether you're a user, an administrator, or a developer. Now, resource usage tracking, going into more depth. First of all, what are resources? It depends on what project you're using: for Nova, the resources are CPU, memory, and disk; for Cinder, they might be volumes and snapshots.

So in the case of VM instance provisioning, if you look at request number three here, the request goes to the scheduler, and the scheduler has a set of host filters and weighers. It grabs the list of all the suitable inventory it can find from the database, runs it through the filters and weighers, and evaluates which host is suitable to boot this particular instance. Once it has chosen, it returns the selected host; that's the fourth step.
The fifth step is that it hands over to the conductor, and the conductor talks to the chosen compute, and the compute eventually does the resource claiming process. That is, even though the scheduler has said, depending on your flavor, that you want to consume two gigs of RAM, 20 gigs of disk, and two vCPUs on that hypervisor, the compute process is the one that actually claims those resources and makes sure they're accounted for on the hypervisor.

Josh previously covered the rescheduling part. So, rescheduling: first of all, there is a retry filter. Rescheduling, or retrying, performs well in a low-load, low-utilization kind of environment. Say you picked host one, went to the compute, and the compute tried to spin up an instance, but the resources are no longer present there, because multiple requests are coming in and they might have consumed them. What happens in that case is this sixth step: reschedule after the resource claim failed, or after other failures. The compute bails out ("I don't have enough resources for whatever you scheduled"), passes the call back to the conductor, and the conductor schedules it again, retrying the process to pick a different host that might satisfy all the constraints specified for this particular request.

So, going ahead: filters and weighers. The whole process of how this works, at least in the conductor and scheduler part, is interesting from the perspective of how it actually operates. It's documented fairly well on the internet, but we can go through it a little here. There are these filters that take the host list and reduce it down by a certain set of conditions; then some weighing is performed on the output of that, and that becomes the target host that will be selected and asked to boot the VM. And then it may fail, due to what Vilobh was just describing.

Some of the important filters, shown on the next slide, fall into different categories. At Yahoo and GoDaddy, you can turn various ones of these on or off, depending on how you want to operate your cloud. The typical ones are the resource filters. Those limit based on some kind of restriction, say the cores the hypervisor has: you don't want to overload it with hundreds of VMs when it can only handle, say, four. That's where you have flexibility in deciding how much you over-allocate. Same with RAM and that filter. You have various other filters that segregate based on how the topology of your cloud is laid out. Maybe you have availability zones and you only want a boot request for GPUs to go to the availability zone that actually has GPUs in it. You can do image isolation; I think this is used for Windows as well, so if you want to run Windows in a certain subsection of your environment, you can do that via this method. And there are a bunch of other ones.
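To make the filter-and-weigh loop concrete, here's a minimal standalone sketch. The names here are hypothetical: real Nova filters subclass its BaseHostFilter and implement a host_passes() hook, and the weighers are pluggable. But the shape of the reduce-then-weigh loop is the same idea.

```python
from dataclasses import dataclass

@dataclass
class HostState:
    name: str
    free_ram_mb: int
    total_ram_mb: int

class RamFilter:
    """Pass hosts with enough free RAM, honoring an overcommit ratio."""

    def __init__(self, ram_allocation_ratio=1.5):
        self.ram_allocation_ratio = ram_allocation_ratio

    def host_passes(self, host, requested_ram_mb):
        limit = host.total_ram_mb * self.ram_allocation_ratio
        used = host.total_ram_mb - host.free_ram_mb
        return used + requested_ram_mb <= limit

def select_host(hosts, requested_ram_mb, filters):
    # 1) Reduce the host list with each filter, in configured order.
    for f in filters:
        hosts = [h for h in hosts if f.host_passes(h, requested_ram_mb)]
    if not hosts:
        raise RuntimeError("no valid host found")
    # 2) Weigh the survivors; here, more free RAM wins (spread behavior).
    return max(hosts, key=lambda h: h.free_ram_mb)

hosts = [HostState("cn-1", 2048, 8192), HostState("cn-2", 6144, 8192)]
print(select_host(hosts, 4096, [RamFilter()]).name)   # -> cn-2
```

With a spread-style weigher like this, the host with the most free RAM wins; a pack-style weigher would just flip the key.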
Affinity filters are especially interesting, but they're a bit more of an advanced use case, depending on how you want to enable your users to specify this kind of concept. One of the typical examples is that you don't want to co-host a web server along with its database: you want them segregated onto different hosts, and maybe with higher segregation than that. Maybe you don't even want them on the same rack or the same power unit. You can establish those kinds of concepts with these affinity filters. How you structure it is specific to how you structure your cloud and which filters you turn on, and it's complicated. More complicated, in my opinion, than it should be, but that's probably a whole different topic. What other ones are there? There are filters on how many instances are booted on a host, on image properties; there's a lot of very specific capability-matching you can do with these filters.

So let's go a little lower-level on what actually happens here. As Vilobh was saying, there are these request objects (the incoming request) and there's other information maintained inside Nova, this host object, the host state. During a request for an instance there's a call out to get the filtered hosts, which calls out to get all the host states. In the VM case that's a select call against the database to get the hosts, plus some other logic in Python on top, so it's not purely a database call. Then it runs the filters in whatever order the config file specifies, selects a host, does some more database interactions at that time, and eventually sends a call out to the hypervisor. This loop happens one instance at a time: as in this slide, the scheduler keeps checking, "have I placed all the instances?", and if not, the loop continues. So it's a serialized approach: you select one host at a time, and the process keeps going until all the instances have been placed.

So now we'll look at how Ironic, or bare metal, provisioning happens, because this is also important. In the Ironic case we again have two inputs: the request and the Ironic host list. This is what we've implemented internally; I have Pranesh here, who was one of the main developers who worked on it. The Ironic host list is constructed by calling the Ironic API, which talks to the Ironic conductor, goes to the Ironic DB, and fetches all the information about the hosts that is relevant at this particular moment. That information gets passed back, via the Ironic API, to the Nova world, which is shown in blue on the left side; whatever is in orange is the Ironic world, whereas the blue on the left is the Nova API and scheduler side of it. The request and the Ironic host list are passed to the filter scheduler: first we ask, "get me n filtered hosts," the filter scheduler runs, and in the sixth step we get n weighted hosts. In the previous flow we got just one weighted host at a time, but in this case, rather than going through that retry loop, we wanted to boot multiple bare metal instances at the same time.
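The claim step is described next; to keep the whole flow in mind, here is a rough sketch of the "get n weighted nodes, claim m of them" idea. The helper names are hypothetical stand-ins, not the actual Yahoo patch or the Ironic API.

```python
import uuid

def filter_and_weigh(nodes, request):
    """Stand-in for the filter scheduler: return n candidate nodes."""
    ok = [n for n in nodes if n["ram_mb"] >= request["ram_mb"]]
    return sorted(ok, key=lambda n: -n["ram_mb"])  # trivial weigher

def try_claim(node):
    """Stand-in for an atomic claim recorded in the Ironic DB."""
    if node["claim"] is None:
        node["claim"] = str(uuid.uuid4())  # the valid claim UUID
        return node["claim"]
    return None  # somebody else claimed it first

def schedule_bare_metal(request, nodes, num_instances):
    candidates = filter_and_weigh(nodes, request)  # the n weighted nodes
    claimed = []
    for node in candidates:                        # claim m <= n of them
        if len(claimed) == num_instances:
            break
        claim_uuid = try_claim(node)
        if claim_uuid:
            claimed.append((node["name"], claim_uuid))
    if len(claimed) < num_instances:
        raise RuntimeError("could not claim %d nodes" % num_instances)
    return claimed  # step 10: hand claimed/selected nodes to nova-compute

nodes = [{"name": "bm-%d" % i, "ram_mb": 131072, "claim": None}
         for i in range(10)]
print(schedule_bare_metal({"ram_mb": 65536}, nodes, 4))
```

Because the claim is recorded against the source of truth rather than an in-memory cache, two concurrent schedulers cannot both hand out the same node.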
And we didn't want the failure mode where, if we didn't select a proper node, we'd go into the retry loop again and again. So we had this claims call, where we claim m nodes, with m less than or equal to n. Say we got 10 nodes back, but we claim four of them; at any instant we claim for four. That call goes again to the Ironic API, through the Ironic conductor, and puts a claim on those m nodes out of the set of n nodes, returning a valid claim UUID. The tenth step is the claimed, or selected, nodes: all of them get passed to nova-compute, and nova-compute talks to Ironic, and Ironic proceeds with the deployment and makes sure the node comes up. The important thing to note here is that there is no in-memory update of host state information; most of the information is fetched directly from the ultimate source of truth, which in this case is the Ironic DB. There were various talks in the Mitaka cycle about Ironic having a separate scheduler versus continuing to use the Nova scheduler, and this is the approach we took.

Just a note on that: this is somewhat a Yahoo-customized thing that was done here. Yeah, and just to mention, the spec for the claims concept in Ironic is still under discussion, and this is more of a Yahoo version of it, just so people are aware. There is ongoing work, so keep in touch with the community about how it's going; there's work to change it to a fairly similar model with this claim concept, to avoid some of the races that happen in the typical VM case.

So, to go over what some of those race conditions are. Some of them are somewhat obvious from the previous slides, but basically, when you have multi-threaded requests going on and many different messages passing through your system, you have simultaneous work happening in the scheduler, where the cache it maintains may not be up to date. That's where the bad decisions happen that trigger the retry, which is typically fine under low load, or when you're not trying to highly optimize your cloud; under load, the retry just happens more often. I think it hasn't been triggered as often as it could be for most people, because they keep their cloud resource usage at maybe a 50% level (I can't say exactly), but if you don't run tightly packed, you don't necessarily see the full volume of retries. That's where it goes wrong in the high-load situation.

The other one that's interesting to think about is mutual exclusion. When these filters run, they may be using out-of-date information. Say you have filters checking RAM usage, and another request is being processed at the same time that is also checking the RAM usage of the same hypervisor. Both may conclude the hypervisor has enough RAM free, but when one of them eventually reaches a decision, it turns out it wasn't actually true. That's where the retry stuff comes into play; but if there's a way to make that kind of check happen more up front, I think it's a better experience for developers and users. Yeah, and it's a classic race condition: shared state, with multiple threads trying to access the same structure.
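Here's a toy reproduction of that filter-time race: two scheduler threads consult the same cached view of one host that only has room for one more VM. This is illustrative only; Nova's actual code paths differ.

```python
import threading
import time

host_state = {"free_ram_mb": 4096}   # shared cached view of one host
placements = []

def schedule(request_mb):
    # Filter step: consult the (possibly stale) cached view.
    if host_state["free_ram_mb"] >= request_mb:
        time.sleep(0.01)             # window before the claim lands
        # Consume from the cache -- but the other thread already passed
        # the check above using the same stale number.
        host_state["free_ram_mb"] -= request_mb
        placements.append(request_mb)

threads = [threading.Thread(target=schedule, args=(4096,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(placements)    # [4096, 4096]: both "fit" on a host with room for one
print(host_state)    # {'free_ram_mb': -4096}: the claim fails -> retry
```

Both requests get "placed," the host is overcommitted, and one of them comes back as the claim failure that triggers the retry loop.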
So, yeah: scheduling at scale. The expectation, on the left side, is that if everything works at 200 or 500 nodes, it will work at 1,000 or tens of thousands of nodes. In reality, when you try to scale beyond 1,000, it just blows up. So Vilobh ran some performance tests that show interesting stuff; we can talk about that. I also want to mention, I guess afterwards, that there was some other work, like when Yahoo was trying to get to lots of bare metal; some interesting things happened there as well. When you have lots and lots of instances, you start to see a lot of interesting performance areas. One of them was around the resource tracker; we'll go into more detail after he goes over this.

Yeah, so let's test the scheduler. For the performance tests, I'll go through the VM cases first, and later we'll cover the bare metal cases. We had a 500-node inventory and one filter scheduler to start: if one provides good throughput, we can scale to multiple and get better throughput. With one filter scheduler, 500 nodes, and 50 concurrent boots, we saw a throughput of 2.8 requests per second, which is really not what we expect from the scheduler. Then we increased to 1,000 nodes and kept the concurrency at the same level (50 concurrent boots, same configuration, one filter scheduler) and saw 1.67 requests per second. So it's degrading: not even staying consistent, let alone increasing.

If we analyze further to see how much nova-scheduler is costing: when you schedule an instance, a whole lot of different Nova processes get involved. First nova-api receives the request, then passes it to the conductor, which goes to the scheduler, which returns to the conductor, which passes it on to the compute; and in between there's the messaging layer, the glue between every service. If we slice out the time spent in each service, we see that nova-scheduler consumes almost 68 to 75% of the overall scheduling process for 500 nodes, and 85 to 95% for 1,000 nodes. So the question arises: why is nova-scheduler taking so much time? What is it doing? In performance work, if you solve the biggest bottleneck first, your system improves the most. So: the call to get_all_host_states, which the filter scheduler makes as we saw on the previous slide, accounts for about 88 to 95% of the cost. Breaking that down further, within the scheduler, 88 to 90% of the cost is in get_all_host_states. It mostly accesses the DB and updates the in-memory cache, the host state cache that Josh explained on a previous slide. So the bottleneck is the cache refresh step, and the point here is that the database calls are the biggest bottleneck during scheduling.
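A toy cost model shows the shape of that curve. The constants below are made up, purely illustrative; the measured numbers were the 2.8 and 1.67 requests per second above. The point is that every decision pays a fixed cost plus a per-host cost to re-pull and rebuild every host's state, so doubling the inventory roughly doubles the time per decision.

```python
# Why throughput degrades with inventory size: each scheduling decision
# rebuilds state for *all* hosts before filtering a single boot.
def schedule_one(num_hosts,
                 per_host_cost_s=0.0005,   # build one HostState from a DB row
                 fixed_cost_s=0.05):       # messaging, filtering, etc.
    return fixed_cost_s + num_hosts * per_host_cost_s

for n in (500, 1000, 10000):
    t = schedule_one(n)
    print("%5d hosts: %.2fs per decision, ~%.1f req/s" % (n, t, 1 / t))
```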
Let me elaborate on the cache refreshing step. First, it pulls, selects, the information for all the valid hosts in the inventory. Once that's done, it constructs the in-memory host state objects, and when the scheduler filtering happens, it consumes resources from that in-memory data structure. But the actual claim of the resources happens only at the compute, which updates the compute_nodes table, right? The scheduler, to keep track of what's going on in the system, does so by updating the host state map, the data structure it's aware of. And in fetching the details, updating, and keeping everything in sync, it just spends a lot of time.

Another point: this is where we saw it, and we even had to do some work for the bare metal stuff at Yahoo when we were integrating with our legacy system. We were trying to import lots and lots of machines; it wasn't necessarily Ironic at that point. This was about a year ago, a little before we were big into Ironic. This would be a major performance step when it tried to update a cache of about 10,000 nodes, periodically or during a request; we would see, "ooh, this is not responding as quickly as we expected." When we went to 10,000 or higher, this step was especially problematic. I think we actually had to disable some of this refreshing step to optimize toward the larger scale. It wasn't exactly the best practice we wanted to follow, but it was what we had to do to get more inventory into OpenStack. I think Ironic is taking a slightly different approach to this, but we'll see where that goes. Yeah, that's a really good point to mention.

So, next: challenges scheduling virtual machines. With the filter scheduler, as we saw on the previous slide, there's a bottleneck in the cache refresh step: a time-consuming database pull during every decision. Since the filter scheduler grabs the host state every time, you might be thinking: why don't we cache that and reuse the cached information? The community thought about it too, and came up with the caching scheduler. But this caching scheduler is in a very naive state, and it's not fully functional from what we tried; we felt it's a very naive implementation, unsuitable for rapidly changing compute node resources, refreshing only every 60 seconds. Most OpenStack deployments at this moment are up to 1,000 nodes, but eventually they'll want to go beyond that, and if you have rapidly changing compute node resources and the caching scheduler refreshes only every 60 seconds, you don't get a consistent view of resources. And running multiple caching schedulers: if you can't get a consistent view from one, running multiple will only increase your problem rather than solve it. So this approach didn't help, and won't help.
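For contrast, here's a minimal sketch of the caching-scheduler idea: refresh on a timer instead of per request. It makes each decision cheap, but anything that changed in the last refresh interval is invisible to the filters, which is the inconsistent-view problem just described. The names are mine, not the actual CachingScheduler code.

```python
import time

class CachedHostStateView:
    """Sketch of a timer-based host state cache (hypothetical names)."""

    def __init__(self, fetch_all_from_db, refresh_interval=60):
        self._fetch = fetch_all_from_db      # the expensive full DB pull
        self._interval = refresh_interval
        self._cache = None
        self._stamp = 0.0

    def get_all_host_states(self):
        now = time.monotonic()
        if self._cache is None or now - self._stamp > self._interval:
            self._cache = self._fetch()      # refresh at most once / 60s
            self._stamp = now
        return self._cache                   # may be up to 60s stale

# Every filter run between refreshes sees the same snapshot, so a host
# that just filled up still looks free until the next refresh.
view = CachedHostStateView(lambda: [{"host": "cn-1", "free_ram_mb": 4096}])
print(view.get_all_host_states())
```

And since each scheduler process keeps its own copy on its own timer, running several of them means several different stale views that disagree with each other.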
So, next: more challenges scheduling virtual machine instances, specifically challenges in Juno, if any of you are on Juno. In Juno, node selection time is directly proportional to the size of your inventory. Say you have 100 nodes and the selection and scheduling process takes 10 seconds; if you move to 1,000 nodes, it will take 100 seconds, and at 10,000 nodes, more than that. It's directly proportional, which is why the slide says the time complexity keeps growing as you add nodes to your inventory. And the boot request times out depending on the RPC timeout, which in the VM case is usually 180 seconds. So getting this done in a short period of time as you add nodes is not feasible, and the boot request can time out. As we saw on the expectations slide, the user waits a long time and then sees a failure, a timeout error: something you don't want users to see.

The solution we implemented was a cache of aggregate-to-host data. The host list doesn't carry aggregate information, so we created this cache, and as a result we could do a quick lookup. Filter ordering also matters: how you order your filters is very important. An example: say you have AggregateImagePropertiesIsolation. This filter is mostly used to isolate where different image types boot, like Windows VMs versus other things. You want to put the filter that eliminates the maximum number of hosts at the front. So the image properties isolation filter eliminates the most hosts, then the availability zone filter picks, say, only the GPU-specific availability zone, and so on. Filter ordering is important.

So, problems at scale. Some of the issues we hit with Ironic, well, before Ironic and during Ironic, and that Yahoo is still working through. There's the size of the worker pool that Ironic has; I think what was happening was that the workers were getting overloaded, and we had to figure out how to map the number of requests we want to boot in a certain time to the number of workers in the Ironic pool. That was tweaked, basically increased over the default (I don't have the number off the top of my head), but there was a mapping we had to figure out: for this many requests, based on what we're testing, how many Ironic conductors do we need? The second one is well known but is being worked on as well: the Neutron server also has to handle the same number of requests that are going into Ironic or Nova, so that when it needs to set up a network, it can respond to the network call. Similar to the conductor pool increase, you have to increase the number of Neutron servers, or the workers it has, as well as the database pool size, the number of connections allowed to go to the database. That required work to figure out what good numbers were.
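As a hedged illustration of the kinds of knobs involved: these are real option names from roughly the Mitaka/Newton era, but the values are placeholders, not the numbers we actually ran with, and names and defaults vary by release.

```ini
# ironic.conf - size the conductor worker pool to your boot concurrency
[conductor]
workers_pool_size = 300

# neutron.conf - API/RPC workers must keep up with the same request rate
[DEFAULT]
api_workers = 8
rpc_workers = 8

# nova.conf (and friends) - oslo.db connection pool limits
[database]
max_pool_size = 50
max_overflow = 100
```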
The next one is the eventlet kind of issue. Most people are moving toward a model where they use mod_wsgi, which avoids the limitations eventlet has when it comes to running WSGI applications. The other way to run it is the worker concept: you can either bind the API to Nginx or Apache, or use the built-in concept where the API server runs many different workers. I'm pretty sure we tried both approaches, and, at least when I was at Yahoo, we were moving toward the mod_wsgi approach for various reasons. We were also very familiar with Apache at Yahoo, so that helped.

Other stuff, a little bit not-Ironic, more general stuff we hit. When I was talking about the resource tracker: there's this piece in Nova that updates the cache holding information about all the resources in the cloud. This one was especially problematic because when nova-compute starts up fresh on a new instance or hypervisor, it grabs a lot of information from the database, or from Ironic, and loads it into its own memory, which took quite a long time: it has to populate a bunch of objects and do some database calls. From what I understood (I didn't experience this myself), there were cases where it took an hour, or many, many minutes more than it should, to populate that information. Then, when you're actually doing a scheduling request, it has to do similar computations, which causes those delays in processing and in booting. That's where we took out the resource tracker, or some of that functionality, to make it quicker.

We started working around this with the claims API, which avoids having to access and update that resource information. It's not exactly a lock on the instance, but when we do a scheduling request, it gets the information from Ironic, does some filtering, and then, instead of waiting for some point in the future to make the actual request to claim the resource, it claims it as soon as it can. That minimizes the window for the race condition where somebody else claims the same node. Delaying that feedback loop doesn't turn out to be beneficial, so shorten the period between when you believe your resource is good to use and when it's actually claimed. We did that via the claims API, and Ironic is working on making that a little more official; I forget the name, it's called claims, or claims in scheduling, something like that.

So, yeah, do you want to go through these? Sure. There was a periodic power check that takes almost 60 seconds if IPMI is failing, and there's a plan to go ahead with futurist (I have the creator of futurist right here) and run those jobs in parallel.
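A sketch of what that could look like with futurist: fan the power-state checks out across a thread pool so one dead BMC blocking for around 60 seconds doesn't serialize the whole periodic task. check_power_state() is a hypothetical stand-in for the IPMI query, not Ironic's actual code.

```python
import futurist
from futurist import waiters

def check_power_state(node):
    """Stand-in for the IPMI query; can block ~60s against a bad BMC."""
    return (node, "power on")

def run_power_sync(nodes, max_workers=20):
    executor = futurist.ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = [executor.submit(check_power_state, n) for n in nodes]
        # Slow or broken nodes only hold up one worker each, not the loop.
        done, not_done = waiters.wait_for_all(futures)
        return [f.result() for f in done]
    finally:
        executor.shutdown()

print(run_power_sync(["node-%d" % i for i in range(5)]))
```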
So, looking beyond: what do we want next? There's been a bunch of ongoing work in various projects on how we make this scheduling problem better across projects. The previous slides talked about Nova: how we do the scheduling, the piece of Nova that does the work of picking a host for an instance. But then you start getting to questions like: what about volume scheduling? Or what if I want to pick a rack that has some storage resource over some other rack? A lot of these projects have similar constraints on where resources can go. In an ideal world maybe they don't (if you have storage that's universal everywhere, maybe it's not as important), but there are these fundamental questions, and maybe we can make better scheduling decisions if we're somewhat aware of where all the resources in the cloud are. It's a topology kind of question: how do we move away from a model where each project only knows where its own resources are, because then it will only schedule on them, and it's hard to move out of what you could call a local optimum.

Yeah, and the idea is to have a holistic view of how the cloud is organized and then make decisions based on that, rather than every project having its own scheduler with a lot of duplication; eventually we'd end up with a common model everyone can use, with a holistic view of all the resources, and make decisions off of that.

So there's this spec I started maybe three or four months ago called the "super scheduler," because it sounded cool, and it's about how we start forming these ideas: how do we move to a model where we can have a more global view without drastically affecting each project? Each project definitely doesn't want its capabilities taken away (I don't want to take any away), but how do we gain that global ability while keeping the local ability? There has to be a middle ground between a project making its scheduling decisions purely on its own and having a larger view of the whole system. There's ongoing work on this if people want to contribute; feel free to jump in. There's awesome work, I believe, in Nova around resource pools and resource providers, and that's ongoing. There's a lot of potential here, especially when you start bringing in new kinds of resources, like containers, or lots of different network concepts I can't even think of; there's a lot there that can also be scheduled. From my perspective, it starts to feel like there has to be a more organized effort in this area, which is why that scheduling spec is also cross-project; people are welcome to jump in, help out, and see where we can push this system to its limits and make it better.

Yeah, so making decisions locally, we call that a local optimum; it's a greedy approach where you consume whatever you see locally and proceed. What we'd like is a global optimum: considering all the possibilities and then making the best, optimal decision.

Yep. So feel free to jump in, and feel free to ask questions. Thank you guys for coming. We can do some Q&A, or you can meet us afterwards, that's fine too. Yeah. It might be hard to do Q&A here. Yeah. We'll take questions if you're free. Yeah. I have a question.
So, around the scheduler: the performance tests you did. Were they, I guess, targeted toward single API servers, single conductors, and single schedulers? And as a follow-up, did you do any performance tests where you had multiple or distributed schedulers or conductors running at the same time?

So that's a to-do thing I plan to do next: have multiple schedulers and run multiple workloads. For the inventory shown here, I have some programs that fake the inventories, and running multiple schedulers, both for Ironic and for VMs, is something I'm looking forward to. Right now I don't have any data, but I'd be more than interested to share it with you whenever I'm done with it. Cool. Sure.

A couple of questions. Do you think, in the interim, we should take a look at the databases, at MySQL, and see if there's optimization we need to do? And in terms of multiple schedulers, has there been any thought about having a scheduler per tenant? So we could say "you compete against yourselves": we're a service provider to our internal organization, we have multiple groups with different demands, and we don't mind if they compete against each other.

It's an interesting idea. On the first question, the database question: yeah, I think there are obvious optimizations there. One of the thoughts, back when I was at Yahoo and we were starting to think about this, was that with the filter scheduling stuff you can really translate most of what's in there into actual database queries. Most of the filters are just user-land Python filtering; some of them are somewhat difficult, but most of them could just be ported into a SQL statement, and that would probably help out a lot. It's one of the easier database optimizations to do, I think. As for the second question, I haven't heard of that or tried it, but maybe it's possible. With multiple schedulers you could probably shard some way: shard per tenant, and then... Yeah, but in that case each tenant has an isolated set of resources it can see, and the scheduler you run for that particular tenant or project just manages, as Josh is saying, a sharded approach with a limited set of resources. But then, if you want to share resources across tenants, how do you do that? And all those things. It basically separates out the whole logic of multi-tenancy from having everything in one place with a single scheduler managing all the resources in the cloud. Sure.

How can scheduling interact with reallocation of resources, which might allow you to move from this local optimum to a global optimum over time? So you're thinking about dynamic reallocation? Exactly: live migration, that sort of thing. So, first of all, the retry mechanism is one of the things where, if the first request didn't go through, you can retry and schedule on other nodes.
But then, for that, I think you'd also need some historical information about where the instance has been moved, to take that into account in scheduling. Say a fault happens and the host crashes, and you had to move your VM; then you need that kind of information.

I don't necessarily want you to build that stuff. I'm just curious how we could reuse some of the scheduling logic you currently have, in terms of filtering and whatnot, so that someone who wants to do further optimization, because they've had these instances running over time and actually has more knowledge about where they should be, can do that.

I mean, it's possible. I don't think the current scheduler has much insight into that, but it's definitely something that can be done. I mean, we do that sort of thing now, but we could probably benefit from interaction with the scheduler. Yeah, so definitely something like utilization-based scheduling: depending on how much is utilized on a particular host, you get that data and then make better decisions at the scheduler level, like "this has been consumed, this has not," and schedule onto a node that's less consumed. Cool. I guess that's for further discussion with you guys. Thanks.

Next up. Just a small question: you mentioned containers, what are the challenges over there?

Oh, I'll have to come back in six months on that one. We'll see; I'll have another slide in six months. So right now, Magnum just uses whatever scheduling mechanism Nova provides, and whatever host is chosen, it just creates the container on that particular VM. But as he's saying, there's more to come, so let's hope. There's a lot of work in progress there, so I think it'll have to wait. It's really too early to say much on that, because there are so many different scheduling concepts in different projects that aren't even Nova. Yeah. Right, so: six months. Thanks. Sure. See you. Thank you. All right, thank you.