All right, welcome to the first afternoon session of the technical deep dive track. We had a couple of slightly lighter talks to kick off the track, and now it's getting really deeply technical. Please let me introduce Peter Feiner, computer scientist from GridCentric, and his talk, Scaling the Boot Barrier: Identifying and Eliminating Contention in OpenStack. Please welcome Peter.

Hello. Good afternoon, everybody. It's my pleasure to see you all here attending my talk and the OpenStack Summit. So, I'm Peter. I work for GridCentric; we're a hypervisor optimization company, and our product works with OpenStack. What we really care about is end-to-end performance, and this talk is about one component of that performance: VM creation in OpenStack.

Deploying applications as virtual machines has a couple of clear advantages. First of all, it lets you carve up a big host and makes your application capacity granular. If you have a giant host, you might decide to divide it up into a hundred units of work, and you can run those units of work as virtual machines; these can be virtual machines serving capacity for totally unrelated applications. With this model, you simply increase capacity by creating more virtual machines. Nothing new here.

As your load increases, at some point you need more VMs to handle what's going on, so the question you ask is: when should you create more virtual machines? You want to do it as late as possible. Say you're a customer of a public cloud and you're cheap: you want to pay as little as possible to run your service, so you want to avoid over-provisioning. The same argument holds for a private cloud: you want to keep as much capacity as possible free for other applications, so you don't want to waste it by over-provisioning. But you have a conflicting interest: you also want to bring these virtual machines up as soon as necessary. That is, when your load starts to approach your capacity, you don't want the new virtual machines to come up so late that new requests, or whatever work these things are handling, get delayed.

To solve this problem, you need to anticipate when the load will surpass the capacity. That in and of itself is an open problem, and I'm not going to talk about it today. The second part, though, is that once you've modeled your capacity, you need to factor into the VM creation decision how long it takes for a new virtual machine to actually start serving. The question I'm going to explore is how we can optimize this: how can we make new virtual machines come up quickly once you've decided to create them?

There are two distinct phases in the creation of a virtual machine, really with any virtualization stack, but of course we're talking about OpenStack. There's what I'm calling the VM creation time, and then we sum that with the guest preparation time. By VM creation time I mean the time from when you type nova boot until the dashboard (or whatever) shows that the virtual machine is active. This is a significant event, because it means the virtual machine has actually started running: it's emulating the guest BIOS, the guest is booting, and so forth. The second part is the guest preparation time.
That's once the virtual machine has actually started running, so with QEMU/KVM there's a QEMU process churning through guest instructions, or however that's handled on the back end. Now we're looking at the time for the operating system to boot and the application to start up and begin serving requests. If you have a lean operating system, like Ubuntu's cloud server image, it's really fast: it'll boot in a few seconds, and if you have a stateless application, say a web server that connects to a database running elsewhere, you can be up and serving requests in less than 10 seconds. Or you might have a fat operating system, say Windows, or a big application; actually, you wouldn't run this on Windows, so say you're running Linux and you have a big J2EE application that has to do a bunch of JIT compilation and maybe compute some indexes before it starts serving, lean and mean, and that can take an arbitrary amount of time, obviously. Now, if you want this to be ready instantly, you can use GridCentric's product, live images, but that's also something I'm not going to talk about; if you'd like to know more, come see our booth in the Exhibitor Hall.

What I want to talk about is the VM creation time, that is, how long it takes for the virtual machine to actually exist. It turns out this can take a long time, so for the rest of this talk I'm going to do an experiment, and then we'll explore the results and look at some techniques for improving the VM creation time.

Let's start with the experimental setup. What we're measuring is the time it takes to create virtual machines in parallel: we'll make N creation requests in parallel, that is, we'll hit the API server with N requests, and then measure the time from the API request to the virtual machines becoming active. The platform is Grizzly, the compute back end is libvirt plus KVM, the networking is Quantum with Open vSwitch, and the storage back end is qcow2. I should point out that I'm giving the details of the setup for two reasons. One is that it's nice to know the results for the latest and greatest, and to see all the progress that's been made over the years in OpenStack. The other is that these results only apply to this particular setup: other components of OpenStack, or other drivers for these various OpenStack components, have different performance characteristics. Nonetheless, the techniques I'm presenting can be used to analyze other OpenStack configurations, and obviously I chose this setup, or at least some components of it, because it's the speediest for what I'm measuring. The system this is all running on is a single compute host where everything runs locally, including all the networking. It has 96 gigabytes of RAM, 12 hyper-threaded cores (two hyper-threads per core, so 24 logical processors), and solid state drives.
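To make the measurement concrete, here's a rough sketch of the kind of harness this implies. This is not the actual tooling from the talk; the credentials, IMAGE_ID, and FLAVOR_ID are placeholders for whatever exists in your deployment, and the client API shown is the Grizzly-era python-novaclient.

```python
# A sketch of the measurement, not the talk's actual tooling: boot N
# instances in parallel and time each request until Nova reports ACTIVE.
import time
import threading
from novaclient.v1_1 import client  # Grizzly-era python-novaclient

# placeholders: credentials, an image UUID, and a flavor from your cloud
nova = client.Client('admin', 'secret', 'demo', 'http://keystone:5000/v2.0/')
IMAGE_ID = '...'
FLAVOR_ID = '1'

def timed_boot(i, results):
    start = time.time()
    server = nova.servers.create('scale-test-%d' % i, IMAGE_ID, FLAVOR_ID)
    while nova.servers.get(server.id).status not in ('ACTIVE', 'ERROR'):
        time.sleep(1)  # poll until the instance is up (or has failed)
    results[i] = time.time() - start

N = 20
results = [0.0] * N
# one shared client keeps the sketch short; a real harness might create
# one client per thread to avoid any thread-safety surprises
threads = [threading.Thread(target=timed_boot, args=(i, results))
           for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('median %.1fs, max %.1fs' % (sorted(results)[N // 2], max(results)))
```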
Okay, so let's look at the VM creation time. When we create one virtual machine on this setup, it's pretty quick: less than 10 seconds. That's really good. The problem is what happens as we create more virtual machines in parallel, so we run different experiments: if we create five in parallel, it takes 30 seconds; ten in parallel, we're looking at about 35 to 40 seconds; and so forth. This doesn't look very good: creating many virtual machines in parallel can be slow. The problem, of course, is that if we're implementing that virtual-machines-as-application-capacity paradigm I described earlier, this limits your ability to scale: if you want to respond to a big peak in load, it will take a long time, or you've got to increase your granularity, which can mean poor utilization.

So, as we see, the creation time increases roughly linearly with the number of VMs we're creating in parallel, so clearly there are some bottlenecks. One positive note is that this graph looks quite a bit worse without Quantum. When I started doing these experiments, it was on Folsom with the Linux bridging nova-network driver and so forth, and the curve would have been shifted vertically by about 10 seconds. Later on I'll give some details as to why Quantum made this a little better.

Okay, so all we know is that the virtual machines are being created in parallel and the time it takes is roughly linear in the number of virtual machines, so we think there's a bottleneck; that is, things are lining up to be created, so one takes five seconds, two take ten seconds, and so forth. One thing we could be waiting for is hardware: if the CPUs are pegged, or we're out of RAM, or the disk is totally tapped, then of course we're not going to be able to create more virtual machines in parallel, because we don't have the cycles to do it. The other broad category of possible bottlenecks is software bottlenecks: for example, are locks held for a long time, introducing serialization across all of the virtual machine creations?

The first thing we're going to look at is hardware, because it's much, much easier to analyze hardware bottlenecks and see whether they're present or not, so let's quickly go through my favorite tool for doing this, called atop. As you could probably guess, atop fits in the family of top tools, like top and iotop; there are a dozen other ones. This one is my favorite, first of all because it's got a really sweet curses GUI that you can manipulate, but also because it includes the measurements and statistics of most of the other top tools, and it breaks these statistics down into system-wide statistics and per-process statistics, and you can also get per-thread statistics. The statistics we're going to look at are the most straightforward system-wide metrics. First, CPU utilization: on this system there are 24 logical processors, and it wasn't doing anything when I took this screenshot, so it's idle 2400 percent of the time, that is, 24 times 100. Well, clearly there are some accounting errors, because we have one free percent here, but you can't complain about that.
Then we have the memory: this machine has 64 gigs of RAM, and right now 58.5 gigs are free, so really not a whole lot going on. And finally, the disk is busy one percent of the time.

Okay, so the question is: are we seeing this apparent serialization of virtual machine creation due to hardware contention? To answer this, we're going to repeat the experiment but sample the hardware metrics every two seconds using atop. The very simple command atop -w log 2 will periodically write statistics to that log file, every two seconds, and we're going to look at the hardware utilization for the furthest point on the curve, which was N = 20 VMs being booted in parallel.

I think it's important to be precise about exactly what we're measuring, just so we're on the same page. We're looking at RAM, CPU, and disk, and here is exactly what it means to measure these things every two seconds. For RAM, the number you get is the amount of RAM that happens to be used at that two-second interval. CPU, on the other hand, is reported as a percentage: if you see 60% CPU, that means that 60% of the time, summed across the processors, the system is not in the idle loop. Similarly for the disk, the percentage of time busy is how much of the time it's actually doing something over that two-second interval. At any instant that a CPU is doing something, the instantaneous measurement would be a hundred percent utilization, so you can't take instantaneous measurements; you need to collect the statistics over a period to get a meaningful idea.

What we've reported here are the median and maximum values for these statistics. Starting with the RAM, we basically see that we're just not using very much of it, so it's clearly not a contended resource. For the CPU, the median utilization is 14% and the maximum is 55%, so most of the time most CPUs aren't doing anything, and even at the peak we're not hitting a hundred percent utilization, not even close. Similarly with the disk, the median is nine percent, so often very little or nothing is being done, and even in the busiest two-second interval we still have 20% of the time where the disk is sitting idle. It's really important to look at these statistics, because we could have some notions like: oh, when you boot virtual machines, obviously the disk is what's getting hammered as those VMs start up; or, perhaps we've run out of memory because we're creating so many virtual machines and they use a lot of RAM. You can check these notions and do a sanity check by looking at the hardware utilization metrics, and right here it's very plain that there's lots of capacity for parallelism. So the answer must lie elsewhere, and it's time to start looking at potential software bottlenecks.
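Before moving on, a quick aside on what those interval statistics mean in practice. This is a minimal sketch of the idea, not atop's code: it derives "percent busy over a two-second interval" from the aggregate CPU counters Linux exposes in /proc/stat.

```python
# Minimal illustration of interval-based CPU sampling (not atop's code).
# /proc/stat's first line is: "cpu user nice system idle iowait irq ..."
import time

def cpu_times():
    with open('/proc/stat') as f:
        fields = f.readline().split()
    values = [int(v) for v in fields[1:]]
    idle = values[3] + values[4]  # idle + iowait count as "not busy"
    return idle, sum(values)

idle0, total0 = cpu_times()
time.sleep(2)  # the sampling interval
idle1, total1 = cpu_times()
busy = 1.0 - float(idle1 - idle0) / (total1 - total0)
print('CPU busy over the interval: %.0f%%' % (busy * 100))
```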
First of all, what do I mean by a software bottleneck? I'm going to define it very broadly as anything that inhibits parallelism across this collection of processes, and it's usually some kind of lock contention. If you're looking at lower-level performance details, on a smaller time scale, then there can be things that aren't explicitly in your software, like microarchitectural interactions, that cause contention; but since we're looking at things that take dozens of seconds, it's going to be something in our software. Hopefully, once we identify the software contention, it will be easy to fix. Unlike hardware contention, where we could just throw more hardware at the problem or buy a faster disk or something, we can't buy faster code; but hopefully we'll identify it and come up with some way of fixing it, and luckily for us we have textbooks full of different locking strategies, like reader-writer locks, RCU, and so forth, and we'll just find the right tool for the job.

The technique we're going to use to identify the bottlenecks is tracing, so I want to take a look at what tracing is, exactly. Tracing involves running an application while recording events during the execution, for example function entry and exit, or lock acquisition. An event comprises the name of the event and the time at which it happened; it's pretty straightforward. What you can do with this sequence of events is visualize it as a stack of extents between the beginning and ending pairs of events. For example, and I made this one by hand so these timings don't reflect any actual system, here is a trace of an implementation of strdup in the standard C library. strdup takes about six microseconds to run, and while strdup is running it calls three functions: strlen, to see how long the string is that we're going to duplicate; malloc, to allocate the return buffer; and memcpy, which copies from the original string to the return buffer. This is a fairly intuitive format: we see which functions were called, and in essence we also see the dynamic call graph, because you just go from left to right, and as you go deeper you see which functions are called within another. In the setup we have here, these events are recorded per thread, so you'll see a call graph like this, a series of extents, per thread. Most traces for anything worth measuring are going to be a lot nastier, bigger, and hairier, so we'll start looking at some of those.

Okay, so how do we go about tracing OpenStack? What we did was add some trace decorators to Nova and Quantum, guided by our intuition: we thought about what's taking a lot of time, and it's probably something to do with compute and networking. This decorator emits events on function call and return, and before and after lock acquisition, and it outputs the Trace Viewer format. This is a format Google uses in their Chrome web browser and Android operating system for analyzing performance, and if you're using Google Chrome you can just type about:tracing into the address bar (do it after the presentation, so you can keep paying attention) and you'll get to play around with the trace viewer.
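In case it helps to see this concretely, a minimal version of such a decorator might look like the following. It's a sketch, not the actual decorator we added to Nova and Quantum (which also wrapped lock acquisition), but the output format is the real Trace Viewer JSON that about:tracing loads.

```python
# A minimal function-tracing decorator that emits begin/end ("B"/"E")
# events in the Chrome Trace Viewer JSON format.
import functools
import json
import os
import threading
import time

events = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        base = {'name': fn.__name__, 'pid': os.getpid(),
                'tid': threading.current_thread().name}
        events.append(dict(base, ph='B', ts=int(time.time() * 1e6)))
        try:
            return fn(*args, **kwargs)
        finally:
            events.append(dict(base, ph='E', ts=int(time.time() * 1e6)))
    return wrapper

@traced
def spawn_instance():
    time.sleep(0.1)  # stand-in for real work

spawn_instance()
with open('trace.json', 'w') as f:
    json.dump(events, f)  # load this file in Chrome's about:tracing
```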
So what you do is sprinkle some trace points around your application, then repeat your experiments, hunt around the tracing results, and look for the bottlenecks. Basically, you look for bars that were short at N = 1 or N = 5 and get really wide at N = 20; we know that that's where the problems are.

Okay, so the first thing we're going to do is just start hunting around some of these traces. It's too small to read, but what I have on the top of the screen here is the trace from N = 5, where it takes about 20 seconds to create one of the virtual machines. This is a trace of one of the requests going through the stack, and I'm only showing the events that happened in the nova-compute threads; there are about a dozen other timelines for threads in the other processes, but there's not a whole lot going on in those. On the bottom we have the trace for the longest N = 20 VM creation. We see it's a lot wider, at almost 100 seconds, but the thing to note, with both traces on the screen, is that they basically have the same shape. Visually you can just see the pattern of the colors; each color corresponds to a function, and we see the same things happening. We have a bunch of pointy things here, then there's a wide thing here, then some more pointy things, then we're done. The same thing happens below, but stretched out: the pointy things take a little bit longer, then we have the wide part with this nice stack of colors, then some more pointy stuff, and then we're done.

Okay, so now we're going to use our intuition to guide the analysis of these traces. What was the first thing we said to look at for software contention? Locks. And sure enough, just looking at this screen, and I'll zoom in for you, there's a lock here, and a lock here, and we see this color, I don't know what you call this color, this olive-ish grey-ish color, repeated in a few spots, and it turns out that's the same lock, and it's being acquired for about 25 seconds here. Note that these extents don't measure how long the lock is held for; they measure how long the thread blocks waiting to acquire the lock. So that is clearly a highly contended lock: we could have booted five VMs in parallel in the same time that we spent just waiting to acquire the lock in the N = 20 case. Let's bring up the detailed view, click on it, and this is the compute_resources lock in nova-compute.

So we've started our investigation: what is this compute resources lock, and what can we do about it? We hunted around, and we found this resource accounting lock. Nova compute resource accounting is pretty straightforward: it keeps statistics that track how much RAM, the number of vCPUs, et cetera, this compute node is using, and these are used to enforce things like quota and to provide nice statistics on the dashboard and so forth. These statistics are sensibly maintained under a global lock; that way, if two VMs are being created at the same time, we're not trying to increment any of these counters racily. The problem is that this lock is adding about 15 seconds of serialization to creating virtual machines.

Now, the reason this lock is held for such a long time, the reason this lock gets really wide, isn't the arithmetic of maintaining the statistics; that wouldn't take any time on the time scale we're looking at. It's because of the new component added in Grizzly, the conductor, which is basically a service that does the actual database writes, and nova-compute, while it holds this lock, transmits the statistics to the conductor via RPC. These RPCs are pretty quick, a couple of milliseconds to a hundred milliseconds, but when you're doing 20 of them at the same time, and everybody's lining up, and sometimes you're doing a bunch of them in sequence while holding the lock, 20 times 200 milliseconds adds up.
So how can we deal with this? We want the conductor; there were solid design reasons for introducing it. And we clearly want to maintain these statistics properly, so we still need this lock; we can't just get rid of it, as great as that would be. The solution we took comes in two parts. The first part was to look at this code and see that a lot of the time, when this lock is held and the statistics are updated and the new statistics are transmitted to the conductor, they're not actually changing. It turns out that in nova-compute, every time instance_update is called, say you're changing something like the title of the instance, or which networks are associated with the instance, the whole shebang is sent to the conductor. So while you have the lock acquired, you can just check the delta of the statistics to see if anything actually needs to be sent to the conductor, and if you do this, you avoid most of the time the lock is held and solve this problem in a lot of cases. The net result is that the median creation time is reduced by about 10% when you create 20 virtual machines in parallel. That's great. The second part of the solution, which I haven't done yet but will also be great, is to coalesce the RPCs that do need to be sent to the conductor. Say you have three statistics-update RPCs waiting for this lock: right before you send those new statistics, you could compute the sum of those three updates and send the conductor one RPC.

Let's see what the trace looks like with this improvement. Before, we had all of these locks, one, two, three, four, and some more that you can't see because they're too small. After the change, only two of these compute-resources locks visibly take a bunch of time; the rest are still there, but they're a lot shorter, because we don't need to send RPCs, so we hold the lock for a very short amount of time. That's cool. And if we were to do the coalescing, the amount of time for which this lock is held would decrease in these other cases too, because we can see it's still held for 4.5 seconds, which is a significant amount of time compared to the time it takes to create a virtual machine.
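A simplified model of the first part of the fix might look like this. These are hypothetical names, not Nova's actual resource tracker code, but the shape of the change is the same: compare under the lock, and only pay for the conductor RPC when something actually changed.

```python
# A simplified model of the fix (hypothetical names, not Nova's actual
# resource tracker): the conductor RPC, which can take ~100 ms under
# load, is skipped entirely whenever the statistics haven't changed.
import threading

class ResourceTracker(object):
    def __init__(self, conductor_api):
        self._lock = threading.Lock()
        self._conductor = conductor_api
        self._last_sent = None

    def update_usage(self, stats):
        with self._lock:
            if stats == self._last_sent:
                return  # unchanged: the lock is held for microseconds
            # the slow path: an RPC while holding the lock; part two of
            # the solution would coalesce queued updates into one RPC
            self._conductor.update_stats(stats)
            self._last_sent = dict(stats)
```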
Okay, the next thing we're going to start hunting around for is libvirt. We know that libvirt is important in this setup because it's one of the lowest levels of the stack: beneath it sit QEMU and KVM, namely the devices and so forth. What libvirt does is start the QEMU process, create the AppArmor profile for the instance, and so on; it does a bunch of stuff. If you're booting one virtual machine, libvirt takes about three and a half seconds to do everything, but when you're creating a bunch of virtual machines in parallel, it turns out libvirt can take a long time to do its business, and the reason is that there's a global lock in libvirt for doing anything with QEMU. So we know about this global libvirt lock, and we'd love to fix it, but that's going to be a next step; the libvirt folks are working on it, and I plan to start working on it too. In the meantime, the question is: is there anything we can do to mitigate this problem? So let's start hunting again.

This is where we left off before. Now let's look for the extents where we're taking a long time to do libvirt work. For example, this one makes a lot of sense: this is virDomainCreateWithFlags, the meat of actually creating the thing, and it takes 12 seconds. That's a lot longer than before, but we know there's some contention. Then what about all these other libvirt calls, like getLibVersion, six seconds? That's crazy. Or the number-of-domains call, 8.239 seconds? That's also a long time for something that might not need to take so long. And even more troubling, after we've done the important call to create the domain, we still have more libvirt calls; in fact, we still have 101 more libvirt calls. So what's going on, and are these necessary? It's important to ask these questions. We can see here all the libvirt calls (I'll explain in a minute why they happen in a different thread), and they're taking a long time, and they're happening ostensibly after the virtual machine has been created.

It turns out there are many short calls into libvirt to get things that maybe we don't have to ask libvirt about, like the host name, and these short calls can become long due to the global lock. So one solution, aside from fixing libvirt, is to avoid the unnecessary calls, and it turns out that with a little bit of work you can reduce the number of calls into libvirt to create a virtual machine from 248 to 7. Doing this reduces the maximum virtual machine creation time by 20%. Unfortunately, it didn't change the median creation time very much, and the reason is that you still have to wait for libvirt to do the meat; but sometimes we were doing one little libvirt call on the tail end, like getLibVersion, after the virtual machine had already been created, and that would block the return from the API request. Looking at the result of eliminating those libvirt calls, on the bottom we have the before, and on the top the after: in the libvirt-calling thread we now only have about seven calls, so that's an improvement, and this is an example where the maximum time has decreased by 10 seconds.
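As an illustration of avoiding unnecessary calls, a sketch like the following (my example, not Nova's actual patch) caches answers that can't change while the process runs, such as the libvirt version, so only the first caller ever pays for a trip through libvirt's global lock.

```python
# A sketch of call avoidance (not Nova's actual patch): cache answers
# that cannot change while libvirtd runs, so repeated lookups never
# queue up behind libvirt's global QEMU lock.
import libvirt

class CachingConnection(object):
    def __init__(self, uri='qemu:///system'):
        self._conn = libvirt.open(uri)
        self._libvirt_version = None

    def get_lib_version(self):
        if self._libvirt_version is None:
            # only the first caller makes the real libvirt round trip
            self._libvirt_version = self._conn.getLibVersion()
        return self._libvirt_version
```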
All right, the last thing we're going to hunt for is eventlet-related problems. Show of hands: whose favorite threading library is eventlet? Okay, so we have some intuition. And whose least favorite threading library is eventlet? Okay. So, you're working on this, and you're reading the mailing list: people have fixed some performance problems in Nova in the past by dealing with eventlet-related issues, and they've done good work and it's gotten us a long way, but perhaps there's more we can do, so let's take another look at what's going on.

For those of you that didn't raise your hands, a little background on eventlet. Eventlet is a cooperative, user-level threading implementation, where essentially a green thread, one of eventlet's threads, is really a coroutine, and this collection of coroutines is multiplexed cooperatively onto a single native thread. What I mean by cooperatively is that instead of blocking, these threads yield to one another when they're about to make a system call that would otherwise block. The Python standard library normally makes proper system calls, and they block, and Python threads are native, so when one thread blocks, the other threads can continue running; but with eventlet we only have one native thread. To solve this problem, eventlet patches the standard library routines in Python that would normally block, and patches them to yield instead of blocking. But this patching isn't a hundred percent comprehensive, because you can't patch things that aren't written in Python; for example, if libvirt.so uses a sendto system call to talk over a socket, you can't patch that. To get around this, in the past, OpenStack developers used pools of native threads to do blocking libvirt calls on behalf of the green threads, and this was good; it solved big problems. But maybe there's more room for improvement, so let's look in the trace to see exactly what's happening.

Like I said before, all these libvirt calls happen in another thread, and this other thread is one of the worker threads, a native thread, and these can block. Here is the green thread: we have this big function call stack in the green thread, and at the bottom we're doing, say, virDomainCreateWithFlags, and then there's a corresponding function call in the native thread to actually communicate with libvirt.so. Here we see tpool dispatching to one of the threads, and here the work is happening. So this trace matches the model as far as we understand it; that's good. But there's an interesting thing happening: I asked the thread pool to do virDomainCreate, but it takes seven seconds before virDomainCreate is actually called. What's going on there? Similarly for defineXML: something like 17 or 20 seconds between the dispatch to the thread pool and the thing actually getting called. Now, this thread pool has 20 threads and we're doing 20 requests in parallel, so maybe we've run out of threads in the pool? That's not the case; I tried increasing it and still saw the same thing.

It turns out that in eventlet there's one work queue per worker thread. Say we have 20 native worker threads: then there are 20 work queues, and the correspondence between a green thread and a work queue, that is, which queue green thread A will submit work to, is fixed; it's just a function of the green thread's ID. The worker index is a hash of the green thread's ID modulo the worker count, and the work is appended to that work queue. The problem is that two green threads that both want work done by a native thread can hash to the same queue, and the probability of a collision between two threads is actually pretty high, one in 20. That's exactly what we were observing: two green threads were submitting work to the same work queue, so the second thread's work wouldn't start until the first thread's work was finished. The solution is just to use a global work queue; really straightforward, just a two-line change to eventlet. But as you can see at the bottom of the screen, unfortunately all that buys us is waiting on libvirt a little bit sooner. Sad face.
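To make the collision concrete, here's a toy model of the old dispatch scheme. It's a sketch, not eventlet's actual internals, but it shows how a fixed hash from submitter to queue lets two jobs pile up on one worker while the other nineteen sit idle.

```python
# A toy model of the problem (not eventlet's actual code): each native
# worker has its own queue, and a green thread always feeds the same
# queue, chosen by hashing its ID.
NUM_WORKERS = 20
per_worker_queues = [[] for _ in range(NUM_WORKERS)]

def submit(green_thread_id, work):
    # fixed mapping: hash of the green thread's ID, modulo worker count
    per_worker_queues[hash(green_thread_id) % NUM_WORKERS].append(work)

submit(3, 'virDomainCreateWithFlags')   # lands on queue 3
submit(23, 'virDomainDefineXML')        # 23 % 20 == 3: also queue 3
print([len(q) for q in per_worker_queues])
# queue 3 holds both jobs, so the second waits behind the first; the
# two-line fix makes every worker pull from one shared queue instead.
```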
So let's see what that looks like. Whereas before, say, we called createWithFlags and there were seven seconds before it was dispatched in the native thread, here we call createWithFlags and we're immediately dispatched in the native thread. So there's that, but unfortunately the creation time wasn't really shifted. Good attempt.

Okay, so we've made a few optimizations, a few changes; let's look at the results. This was the original curve, and here's the delta. Not a big change; this isn't my favorite slide. The maximum VM creation time was reduced by 20%, and the median time was lowered by 10%. These aren't infinitesimal, meaningless savings, but it's only 10 to 20%; this isn't the horizontal line we had hoped for. In essence, what these changes let us do is wait for libvirt sooner, although the changes to the resource accounting lock are why the median shifted, because there we weren't waiting for libvirt; we were just reducing the total amount of work done. On the bright side, as I understand it, you basically need to fix all of the contention problems before you start seeing the horizontal line: even if libvirt had been fixed, this line would still have been linear, because of these other contention problems. So once we, or somebody else, fixes libvirt, which ought to be happening soon, OpenStack will have this many fewer bottlenecks, and the creation curve for this particular setup will hopefully be horizontal.

In conclusion: low VM creation time is good. It gives us a nice paradigm for deploying applications and carving up hardware, and it's necessary for scaling if you want to do your scaling at the VM level. VM creation time scales poorly due to software contention. One really nice way to look at this is that the bottlenecks around libvirt were actually very easy to fix; I only spent about a week fixing them, and it took me a lot longer to do all the tracing and whatnot. OpenStack has a fairly clean architecture: things are nicely isolated, there are a few global locks, and libvirt is still a big bottleneck, but it's one bottleneck. And tracing is a nice technique to help identify the contention. Future work is to do the RPC coalescing for the conductor updates; to eliminate the big QEMU lock, as it's affectionately called in libvirt circles; to do more instrumentation and look at other OpenStack services, like Glance, Swift, and Cinder, and different Quantum drivers, and see where the bottlenecks are; and finally, to perform more experiments. So that's it.

Right, so the question is: it looks like the Python decorators were added manually to the code, so how did you decide where to add them? It was an iterative process. I added them to the highest-level points, which were the RPC dispatching and the HTTP request handler, so you saw these big bars comprising the extent of the entire request, and then I started to drill down. Nova has a few common design patterns, like having a manager and subclassing the manager, so for that there's a metaclass, and then every method implemented in any manager would be instrumented. And then locks, just based on the intuition that there's probably contention there. But it was kind of an iterative process, and once I got a detailed view and figured out where the bottlenecks were, that was it.
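A rough sketch of that manager instrumentation trick (my reconstruction, not Nova's actual code): a metaclass wraps every public method defined on any manager subclass, so nothing needs to be decorated by hand.

```python
# A reconstruction of the metaclass trick, not Nova's actual code: every
# public method defined on a manager subclass gets wrapped automatically.
import functools

def traced(fn):  # stand-in for the trace decorator sketched earlier
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print('enter %s' % fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            print('exit %s' % fn.__name__)
    return wrapper

class TracedMeta(type):
    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.startswith('_'):
                namespace[attr] = traced(value)
        return type.__new__(mcls, name, bases, namespace)

class Manager(object):
    __metaclass__ = TracedMeta  # Python 2 syntax, as in Grizzly-era Nova

class ComputeManager(Manager):
    def run_instance(self, context, instance):
        pass  # calling this now emits enter/exit events automatically
```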
Yeah, so I've used that in the past for other kinds of performance measurement; I didn't consider that approach for this. The reason one might choose tracing over statistical sampling, where you periodically take stack traces and figure out where you're spending your time from how frequently a function shows up in the samples, is overhead. Tracing every function call introduces some overhead; with this method it was about one millisecond per function call. But because Nova is doing things on, as far as software is concerned, kind of a geological time scale, there were only about 800 function calls over the course of 100 seconds, so 800 one-millisecond traces is under a second of overhead across 100 seconds, less than one percent. And this is the easiest kind of output to analyze.

Absolutely. So the question is about the beginning of the talk, where I mentioned that Quantum fixed some contention issues that were in nova-network. Nova-network with the Linux bridge firewall implementation and Quantum with the Open vSwitch implementation both use Linux's iptables to implement the firewalls, so that hasn't changed. What has changed is that, apparently, with Quantum the updates to the iptables rules are coalesced. Say there are 10 virtual machines being created and they all need their iptables rules: in nova-network, those 10 updates would happen in sequence, serialized by a global lock, and 10 times a couple of seconds adds up; with Quantum, they're coalesced. Furthermore, the network creation in Quantum happens asynchronously with the rest of the VM creation. So those are some nice improvements.

Absolutely, I think the slides will be posted on the OpenStack Summit site, but I'll also post them: if you Google my name, I have a page from a long time ago, when I was a student, and I'll put them there. The tracing code is available on GitHub; I made some branches.

Maybe a new question? Okay, so the observation is that this is on Grizzly, and the question is what this change set would look like on Folsom. I don't know; I did most of these patches on Grizzly. The difference between the two was about 10 seconds; everything was shifted up by 10 seconds due to Quantum. There were other things too: there used to be a lot of eventlet problems and libvirt-blocking and database-blocking problems in Essex, a long time ago when I first looked at this, and the curve was way more horrible, but the thread pool took care of a lot of that. It's getting consistently better, which is nice. Okay, thank you very much.