Okay, so hello everyone. I'm really grateful to see all of you here today. Today we are going to go through what the OpenStack Performance Team is and one very specific experiment we did during the last OpenStack development cycle, at about 1,000 nodes. We will of course cover what the OpenStack Performance Team is and who takes part in it, do a quick round of introductions, and I hope this session will be really interesting to you.

But first of all, let's start with the introductions. My name is Dina Belova. I am a development manager at Mirantis, currently focusing on performance and scalability research, including performance and scalability research on OpenStack. Alex?

My name is Alexander Shapochnikov. I am a distinguished engineer at Mirantis, and my main focus is making OpenStack work at a pretty big scale, like 1,000 or 2,000 nodes. Matthieu?

Thanks. I'm Matthieu Simonin. I'm working at Inria, as part of a project called Discovery, and this project aims to study OpenStack to see if it fits fog and edge computing requirements. So we are very much interested in scaling OpenStack and in doing experiments with multi-site deployments.

Cool. So, first of all, our agenda. We'll go through what the OpenStack Performance Team is, when it was kicked off, what we're doing on a daily basis and what you can find useful for yourselves. Then we'll switch to this very specific 1,000-node emulation experiment, and Alex and Matthieu will talk about the methodologies, the test environments that were used, the observations and the conclusions we made. At the end of the session we'll have Q&A time and will be really glad to answer all your questions.

Regarding the OpenStack Performance Team: it was originally kicked off during the Mitaka Summit in Tokyo, and technically we are part of the Large Deployment Team working group. But compared with the Large Deployment Team, we are focused not on running OpenStack at huge scale, not on the operator type of questions, not on the development type of questions, not on the deployment type of questions, but on OpenStack evaluation and on defining methodologies for how you can evaluate your own OpenStack: is it scalable enough, is it performant enough, and can it fit the workloads you want to run on it.

During the time we have existed, we have run tests for various cases, for various OpenStack components and their underlying technologies, and we did it against several test environments. For instance, at Mirantis we mostly run tests on two test beds, one of 250 physical machines and a second one of 500 physical machines. We'll go through the environments shortly, and of course the tools we choose to run the tests depend very much on the nature of the test. If we're talking about control plane testing of OpenStack, about measuring API latencies of OpenStack components, the most obvious tool to pick is Rally, basically because it's configurable enough to cover complicated cases, even extended cases like density testing, where we try to push as many OpenStack resources as possible into the cloud and check how very basic workloads behave across those resources and how the control plane and data plane keep behaving.
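To make that concrete, here is a minimal sketch of what a Rally task for this kind of control-plane testing can look like. This is an illustration only: it assumes the standard NovaServers.boot_and_list_server scenario that ships with Rally (the same scenario, with these parameters, comes up again later in the talk), and the flavor and image names are placeholders, not values from the experiment.

```python
import json

# Illustrative Rally task: boot a VM, then list servers, repeatedly.
# Flavor/image names below are placeholders; adjust them to your cloud.
task = {
    "NovaServers.boot_and_list_server": [{
        "args": {
            "flavor": {"name": "m1.tiny"},
            "image": {"name": "cirros"},
            "detailed": True,
        },
        "runner": {
            "type": "constant",
            "times": 20000,      # total iterations (VMs booted)
            "concurrency": 50,   # parallel iterations
        },
        "context": {
            "users": {"tenants": 2, "users_per_tenant": 2},
        },
    }]
}

# Rally consumes this as a JSON (or YAML) task file:
#   rally task start boot_and_list.json
with open("boot_and_list.json", "w") as f:
    json.dump(task, f, indent=2)
```

The runner values shown match the parameters quoted later for the 1,000-node run (20,000 iterations with a concurrency of 50).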
If we're talking about reliability testing, we can use a combination of Rally and the os-faults library, which lets us emulate issues in OpenStack, like OpenStack service outages, hardware issues and networking issues, and check how your OpenStack continues to behave on both the control plane and data plane layers. If we're talking about data plane testing, right now we're mostly using the Shaker tool, which lets you measure and evaluate OpenStack tenant networking capabilities and measure values such as latency and throughput.

As I said, we're not only focused on OpenStack itself but also on the underlying technologies, like the messaging bus, the database and, for instance, container cluster systems like Kubernetes. If your OpenStack is running on top of Kubernetes, you probably want to know how your Kubernetes is doing, and in that case we of course use other tools that are more specific to that kind of test.

In short, the OpenStack Performance Team is trying to evaluate OpenStack, define methodologies for how you can do that yourself, and share our experiments and experience through the performance documentation. This is basically part of the upstream OpenStack documentation, where you can find all our tools, methodologies, test plans and test results. I suggest you check this link, because we have very interesting research on reliability testing, on Neutron feature testing and, of course, on the 1,000-node testing that the guys will be talking about shortly. So, Alex, please proceed.

Thank you, Dina. So, the 1,000-node experiment. What is it all about? First of all, 1,000 nodes means 1,000 compute nodes. That's the first thing. Many companies today are thinking about going with pretty big clouds, like 200 nodes, 500 nodes, and sometimes even 1,000 nodes. It wasn't a big deal previously to build a cloud of 100 or 200 nodes, but usually it required a lot of customization and patching, and at some point you figure out that you can't move to a newer version of OpenStack because you would have to backport patches, so you're stuck with a very customized cluster, a cloud you're not able to upgrade, and you have to migrate the workloads to a freshly deployed cluster instead. So at some point it gets pretty tricky to go with a lot of customizations, and the best way is to go with the original upstream code.

During our evaluation, which happened during the Liberty release cycle, we checked all the testing results we had in the Mirantis Scale Lab and decided to give it a try: a cloud of 1,000 compute nodes with standard settings, without any additional patching of the core services. Take one instance of each and every service: there is no HAProxy, there is nothing extra for load balancing. Each and every service runs as a single instance, plus 1,000 compute nodes.

As you probably know, there is a common approach in the OpenStack community to put a lot of the control plane services, along with core services like RabbitMQ and MySQL, on a few dedicated nodes called controllers. That leads to performance issues already at the scale of about 200 nodes, because those nodes are usually a bit overloaded even with very top-notch hardware. The better approach is to go more granular with the services and put them on separate nodes, which works well as the cloud starts growing.
But as long as you have everything combined from the start, it's very hard to later move things off a working cloud. And if you start by putting every single service on a different node, that's a pretty big control plane footprint right from the beginning of the cloud's life cycle. So we decided to change the deployment paradigm and put each and every OpenStack service into a container; we used Docker containers. Everyone knows the benefits of containers: they simplify the CI/CD pipeline, they simplify development, and they simplify upgrades, updates and even downgrades and rollbacks in case of failures.

So we put each and every service into its own container, and here is how that helped us: we got a chance to monitor each container for the resources consumed by that particular service, whether it's Neutron, Nova or anything else. We also aggregated all the compute-related services, like the Neutron agents, nova-compute and libvirt, into one container, and basically scaled that out on top of Mesos plus Marathon to 1,000 of these container instances. And because we put everything into containers, we got the benefit of using the pretty simple and commonly used toolkit of cAdvisor, InfluxDB and Grafana to monitor and measure all the stats.

After we deployed everything, we ran a pretty simple Rally case called boot-and-list, which spawns a VM, polls for its status, and once it's ready just lists the VMs, so it's mostly Nova API requests. We do that with 50 concurrent threads for 20,000 VMs. That's the gist of the testing methodology, so let me switch to describing our Mirantis lab.

It was Mesos plus Marathon plus Docker for the tasks, which are the OpenStack services, and everything ran on 15 nodes, each with two sockets of E5-2680 Xeons, a quarter terabyte of RAM and almost one terabyte of SSD. We took the Liberty release of OpenStack, because this was around the Mitaka timeframe and we decided to test against the stable branch, without going with the development stuff and without playing with features that weren't completely developed at that moment.

The one thing we modified was the libvirt driver. We didn't use the fake driver, because the problem with the fake one is that it doesn't retrieve the image from Glance and doesn't do any of the port preparation. So we modified the original driver: we removed and mocked out the part that runs QEMU/KVM, but it still reports the ACTIVE state for all the VMs that should be running at that point. So it actually does all the Neutron port work, retrieves the image from Glance and everything around that. That's about our deployment at Mirantis, so Matthieu can present their setup and their initial results.

Thank you. Yes, at Inria we were using pretty much the same idea as Mirantis. We deployed OpenStack on approximately 30 nodes of a platform called Grid'5000; that's actually a small fraction of the resources the platform can provide. If you want more information about Grid'5000, you can check the performance documentation wiki, where you will find all the useful links about the resources we used and how to get access if you want.
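As an aside, to illustrate the idea of the modified driver Alex described a moment ago, here is a rough, hypothetical sketch of the approach: subclass the regular libvirt driver, keep the image and network plumbing, but skip actually launching QEMU/KVM and always report the instance as running. This is not the code used in the experiment; method and constructor signatures are simplified, and the internal structure of the real Liberty-era driver differs. It only shows the general shape of such a change.

```python
# Hypothetical sketch only -- not the driver used in the experiment.
# Signatures are simplified; the real Nova libvirt driver internals differ.
from nova.compute import power_state
from nova.virt import hardware
from nova.virt.libvirt import driver as libvirt_driver


class SemiFakeLibvirtDriver(libvirt_driver.LibvirtDriver):
    """Do the real Glance/Neutron work, but never start QEMU/KVM."""

    def spawn(self, context, instance, image_meta, injected_files,
              admin_password, network_info=None, block_device_info=None):
        # The real spawn() prepares the image backend (pulling the image
        # from Glance) and plugs the Neutron ports before defining and
        # launching the libvirt domain.  The modification keeps that
        # preparation but skips the final "launch the domain" step, so no
        # qemu/kvm process is ever started.
        self._prepare_image_and_network(context, instance, image_meta,
                                        network_info, block_device_info)

    def get_info(self, instance):
        # Always tell Nova the instance is up, so it transitions to ACTIVE.
        # (InstanceInfo's constructor arguments vary between releases.)
        return hardware.InstanceInfo(state=power_state.RUNNING)

    def _prepare_image_and_network(self, context, instance, image_meta,
                                   network_info, block_device_info):
        # Placeholder for the calls into the parent driver that fetch the
        # image and plug VIFs; the exact private methods depend on the
        # Nova release being patched.
        pass
```

The stock fake driver (nova.virt.fake.FakeDriver) skips all of this plumbing, which is exactly why Neutron-related numbers measured with it can be misleading, as Matthieu notes next.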
We also used a containerized deployment. We were based on the Mitaka release at the time of the experiments, and to be more precise we used Kolla to deploy OpenStack on the platform. We just added some new features that we needed for experimentation on our platform: we really needed to test different topologies of services, configure services on the fly and so on. So we have this augmented Kolla tooling, which is available on GitHub; you have the link at the bottom of the slide.

As Alex mentioned, Nova has a fake driver, and we did make use of it on our side, because it was easier for us to get this cluster of 1,000 nodes deployed that way. But, as he mentioned as well, there is one drawback: the fake driver doesn't implement all the Neutron operations on virtual machines. That means that evaluating Neutron with the fake driver may not be accurate, and that's why they used the modified driver instead.

Before we dive into more details on the results, I would like to add a few elements about our experimentation process. We measure the resource consumption of all the OpenStack services deployed with 1,000 compute nodes. We made a set of experiments with this type of deployment, and for each deployment we consider three distinct phases. On the left part of the slide you have phase one, which corresponds to a phase where OpenStack is freshly deployed. No resource has been created yet in OpenStack, meaning no API calls have been made to create VMs, tenants, networks and so on. During this phase OpenStack is just idle and empty. Measuring the resource consumption in this phase is quite interesting, because you evaluate how much it costs for all the OpenStack services just to maintain the state of OpenStack.

If we move to phase two, that is the phase where we inject load into the deployed cloud. As was mentioned earlier, we use the Rally benchmarking tool to do that. For the sake of simplicity, we use the boot-and-list scenario with two key parameters: 20,000 iterations and a concurrency of 50. So at the end of the benchmark you will have 20,000 VMs created by the benchmark.

If we move to phase three, that is the phase where all the resources have been created by the benchmark but not yet cleaned up by Rally. In this phase you measure how much it costs OpenStack to maintain all the resources created by the benchmark. We could also have added a phase four, which would be the cleanup phase, but it's out of the scope of this presentation; you will find all the details in the performance documentation wiki if you want.

Okay, so we decided to build this presentation on a per-service basis. We will go through the different services, like RabbitMQ, MariaDB and so on, and we will present the observations and conclusions we got for these services. So here, in this slide, we consider RabbitMQ in phase one, where OpenStack is empty. We made a set of experiments with different numbers of compute nodes: we increase the number of compute nodes and we measure the resource consumption on the RabbitMQ server to see how the resources vary with the number of compute nodes.
What we can observe is that the resource consumption, both core usage and memory usage, increases linearly with the number of compute nodes. We went from 100 compute nodes up to 1,000 compute nodes, and this kind of increase is quite expected. There is another key metric to check when you evaluate RabbitMQ: the number of open connections on the RabbitMQ server. This graph depicts the number of open connections during the idle phase, phase one, on the RabbitMQ server. As you can see, for 1,000 compute nodes you will have more than 15,000 open connections at the same time on the RabbitMQ server. Here you must pay attention to the fact that you may hit system limits, basically on the number of open files allowed on your system, so you may have to tune your system to handle this kind of load.

If we move now to phase two, RabbitMQ under load: in our experiments RabbitMQ was deployed in standalone mode, so we used only one server to handle all the message load. The load we observed during the Rally benchmark was big, but still tolerable. For example, we observed a core usage of 20 cores maximum; if you look at the graph you will see that on average it's more like 10 cores used, and RAM increased up to 17 gigabytes. During phase three, after the benchmark has stopped, we can see the effect of the periodic tasks. The periodic tasks are basically tasks that maintain the state of OpenStack and of all the resources created. The cost of these periodic tasks is evaluated here at something between three and four cores and 16 gigabytes of RAM.

If we now move to the same evaluation for the database server, we will just present some highlights and won't go into too much detail. The database was deployed as a standalone server as well, like RabbitMQ. We observed that for 1,000 compute nodes the database footprint is small even at this number of compute nodes; you have the numbers in the first three items: a few cores used, less than one gigabyte of RAM used, less than 200 open connections at the same time. What is interesting to note is that on the database you can clearly see the effect of the periodic tasks: we measured the effect of the periodic tasks that OpenStack runs, and we evaluated it at approximately 650 requests per second on the database.

If we now check the database under load, the conclusion is that the database behaves correctly. There are nevertheless some observations to be made. For example, if you check the graph in the top right corner, you will see the number of connections open on the database at a certain point in time; here we hit a little more than 2,000 connections at the same time. So, similar to RabbitMQ, you may need to tune your MariaDB instance to accept this number of simultaneous connections. Memory increases in the same way, which is expected because connections and memory are tightly coupled. Regarding the cores, the core usage was quite stable, with an average of 0.7 cores used.

So, let's now move to the OpenStack core services. I will start with the Nova scheduler, and Alex will continue with the other Nova and OpenStack services.
If you deploy a 1,000-node cluster, it is easy to hit a bottleneck in the scheduler, because of the nature of the scheduler. The scheduler service is a little bit different from the other Nova services, even though they rely on the same code. The difference lies in the fact that, for example, Nova API is able to leverage multiple workers to handle all the incoming requests; as you can see on the graph, Nova API can use something like 10 cores at a certain point in time to handle them. For the scheduler it's quite different, because a scheduler instance is stuck with only one worker. That means this single worker has a lot of work to do: it accepts the incoming requests, it must take the placement decisions for the VMs, and it must also continuously update its internal view of the OpenStack compute nodes. That view is important because it reflects the state of the system, and it has to be up to date to take accurate placement decisions. So here the scheduler is only using one worker. One workaround to overcome this is to start several scheduler instances: by horizontally scaling your scheduler you will be able to accept more incoming requests. There is one drawback to be aware of if you start another scheduler instance: the placement decisions of two schedulers may collide on a single physical host, so you will probably increase the number of retries needed to place a VM. So that's the observation for the Nova scheduler, and I'll hand the mic to Alex.

Thank you, Matthieu. Let's switch to the Nova conductor. That is one of the most loaded services in OpenStack; it has almost no idle time as long as you have compute nodes, of course. As you can see, in phase one there is already close to a five-core load, and that is only the regular reports coming from the compute nodes, which have no instances at that time. During phase two, a lot of workloads are placed on the compute nodes and you see a pretty interesting graph, which consists of small spikes and pretty huge spikes. The small spikes are periodic, roughly every one minute: these are the reports from the compute nodes about the resources available, the overall stats. The huge spikes have a roughly ten-minute period, and that is the reporting of the instance states, like shutdown, active and so on. As you can see, as soon as all the workloads are placed, nothing changes for the conductor, because it's still serving the same periodic tasks. Its resource consumption is pretty big: sometimes it's up to 30 cores, but usually it's somewhere around 10 to 12 cores on average.

Let's switch to the next service, Nova API. Nova API is pretty predictable: it consumes a lot of resources during workload placement, and RAM grows, but not so significantly, which is pretty much expected. As you can see, the graph gets a little bit shaky towards the end, but that's only because, as we mentioned, the Rally scenario we're using is Nova boot-and-list, and the list calls keep returning a pretty huge list of instances.
For Nova API it's a little bit tricky to handle that, so the load on it increases a little because of those list calls. At the end of the workload placement you see a pretty stable load, the same periodic stuff. And then there is the cleanup phase, which you can see a little bit of here, and that's a very critical part: it's when Rally starts to clean up all the workloads, and that's why there is such a huge spike. That basically confirms that you should care not only about workload placement but also about actively adding and removing workloads. If your cloud sees pretty active usage, you should take care of Nova API as well, because it requires a lot of resources.

Let's switch to the most resource-hungry service, the Neutron server. The peak consumption we saw from the Neutron server was about 30 cores, and during the cleanup phase up to 35 cores. All the phases are pretty interesting here. In the first phase, as you can see, there is a lot of drift with pretty high load; that is the time when all the compute nodes get added to the cluster, all the compute nodes report in, all the DVR routers get created, and so on. Then, just before the start of phase two, there is a spike: that is the moment when the Rally test prepares the environment and creates users, tenants and, most importantly, networks, so that's mostly API load. The second phase can be divided into two parts: first it grows, and after that it falls and goes pretty steady. The first part is when we're basically filling up all the compute nodes: the scheduler places roughly one VM per node because all of them are still free, so Neutron has to set up the DVR router and DHCP on each and every node for the first time, and because of that the initial load is pretty high. But after about 1,000 VMs are placed, it becomes pretty steady and stops growing significantly. After all the workloads are placed, you see a stable load of about 10 cores, which is mostly the RPC traffic from the agents reporting status, a clear periodic pattern. That's everything about the Neutron server.

So, let's switch to the conclusions. What would you probably want to do for a big OpenStack cloud based on Liberty or a newer release, if you go with containers? As you probably know, you can limit resources in containers, like cores and memory. In that case you probably want to tune the API and RPC worker counts explicitly, because by default a service derives its number of workers from the number of cores it sees in the container, and that may not work for you. Alternatively, don't limit cores and just know approximately how many cores you will get on a particular system. Either way, you probably want to set explicit numbers, because then it's at least clear how a particular service will scale. Next, MySQL and RabbitMQ were not a bottleneck at all, at least in terms of resource consumption. There are rumors about MySQL and RabbitMQ instability, and rumors that it may be a lack of resources or something like that.
That may be true if you're using a clustered solution without enough network bandwidth and CPU resources for the MySQL or RabbitMQ cluster. But for MySQL, as you can see, there is a very small impact on resources: in our experiments at Mirantis it consumed about three or four cores in pretty stressful scenarios. RabbitMQ, yes, took a lot of resources, and a clustered solution will take even more because of the cluster overhead, but it should still work fine. The main issue is scheduler performance. As Matthieu already mentioned, one scheduler is not enough, and multiple schedulers don't fully solve it either, because they run into a lot of race conditions when placing the VMs. As far as I know, in the Ocata release there should be some improvements to the scheduler that at least let it work across multiple threads and cores, plus a somewhat different approach to workload placement inside the scheduler itself.

Yeah, so before we go ahead, thanks Alex and Matthieu. There is one thing I really want to emphasize: this kind of experiment can run on a really modest hardware lab, and if you have your own interest in checking some specific services or some specific workloads, not necessarily the ones we chose, you can do it yourself, using the methodologies that are written down and described in the performance documentation. The first link is about the 1,000 nodes experiment. We hold the weekly meetings of our performance team on Tuesdays at 3:30 p.m. UTC, and there is a set of sessions this week at the OpenStack Summit I really want to highlight. Later today, the Red Hat guys are going to have a really cool session about Browbeat; this is a testing tool that can check OpenStack scale and performance using Rally and several wrappers around it. Tomorrow there is a really cool session by the Mirantis Neutron team about OpenStack Neutron being production ready for large-scale deployments; they did really good research on how many VMs a pretty modest hardware installation of OpenStack with Neutron can handle, how the workloads behave in that type of environment, and how the data plane and control plane behave. On Thursday we'll have our OpenStack performance team session about what was done during the Newton cycle and the Ocata planning, because this is only one of the experiments we've run; we have a bunch of other stuff to share, and if you would like to know more, just join us on Thursday. So, we can jump to the Q&A session. Any questions? Please use the mic if you want to ask something, because the session is recorded.

You had fake Nova and Neutron agents which reported instantly, right? No, actually no. Do you mean the Inria or the Mirantis experiment? I mean the compute part of it. No, those were the actual agents. So they take time to respond and report? Yes, yes, yes. Okay, thank you. That was the main point of using not the original fake driver but a modified one, yeah. Have you tried scaling out the Neutron servers?
Basically, initially we placed a lot of the services combined across these 15 nodes in containers, and later, while running at less than the full scale of 1,000 nodes and 20,000 VMs, we looked at which services consume the most resources, and after that we moved those services out: first we moved the conductor, then we moved the Neutron server and Nova API. Initially we also separated MySQL and RabbitMQ, but we later returned MySQL to the common pool because we saw that its resource consumption was not that big. Yes, as a container as well; we just dedicate all the resources to it. We did try to use two Neutron servers, but in our case we didn't use load balancing in terms of HAProxy, we just used DNS round robin, so the API requests get split between them and the RPC load is consumed by both of them. The load basically gets split and stays stable across both, so that's also fine, yeah.

Hello, I've got two questions: does using Nova cells help with the scalability of the scheduler, and the second one, did you look at whether changing the frequency of the periodic tasks would help with performance? Okay, maybe I can take that. In these experiments we haven't looked at cells; that's actually planned for the next cycle, so here we only used the 1,000 compute nodes in a single pool. Maybe in six months we will have some results regarding cells. For your second question, yes, we changed the timing of the periodic tasks and it helps a little bit. I think some of the periodic tasks were running every 10 seconds or so, and if you increase that interval you can decrease the load on the database and on RabbitMQ as well; with 1,000 nodes you can get down to something like the load of a 100-node cluster if you reduce the periodic task frequency.
I could also add a little bit about the periodic tasks. There are pros and cons to changing the interval: if you increase the interval, you decrease the load, but you will no longer poll the state of the cluster as often. For the ten-minute task, something should probably be changed in nova-compute or the conductor to shuffle it, because, as you've seen, we spawned all the nodes at the same time and because of that we get all those spikes. If the nodes were launched at random times, their reports would come at different times and you would ideally get a flat graph, but there is no such randomization; usually everything spins up close to the same time. So it may be worth trying to add a random offset to this ten-minute periodic, because it produces a pretty big load. For the one-minute periodic of the conductor, you could make it less frequent, but the problem is that the scheduler relies on this particular task to know which compute nodes have what amount of resources. If you stretch it to two minutes, you will have that information only once every two minutes, so the scheduler may make incorrect decisions based on stale data. Usually people actually decrease that interval, down to something like 10 seconds, to have more or less immediate information about the compute nodes. But it can be pretty flexible; it depends on how many workloads you place on your cloud and how often. If you have a pretty static cloud with no changes, that's one thing, but if you have CI/CD processes on it that constantly spawn and remove a lot of VMs, you probably want pretty small periodic intervals.

Just an architectural question: so you didn't have any HA whatsoever in your setup? You had just one RabbitMQ? Okay, so no mirroring of the queues, which kills RabbitMQ performance. We tried it; it's not exactly killing it, it adds a little bit of overhead. Somebody else from Mirantis was telling us, in the session before yours, that with three members and full mirroring of the queues it cuts performance roughly in half. Yeah, that's true if all queues are mirrored, but that's basically not our case, that's the first point. And as you've probably seen, we used about 20 cores, while even mediocre, not top-notch servers these days have, let's say, around 40 cores. One more point: we used a single RabbitMQ for all the services. If you think the performance may not be enough for some particular service, containers allow you to spawn a separate RabbitMQ cluster for different services, so multiple RabbitMQs make sense.

Have you looked at the Cinder volume service? The thing with Cinder is that, from my experience, Mirantis customers use completely different drivers for Cinder, and there are a lot of users of each of them. Which storage did you use for this test? We didn't use Cinder volume at all. Alright. Any more questions?
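Circling back to the periodic-task discussion a moment ago, here is a rough sketch of how such tasks are declared in Nova-style services using oslo.service, which is where the interval ("spacing") comes from. This is an illustration under assumptions: the class and task names are made up, the real Nova tasks and their configuration options differ, and the random start offset is only one possible way to spread the load as suggested above.

```python
# Illustrative only: hypothetical task names, simplified logic.
# Nova's real periodic tasks and their config options live elsewhere.
import random

from oslo_service import periodic_task


class ExampleComputePeriodicTasks(periodic_task.PeriodicTasks):
    """Sketch of periodic tasks similar to the ones discussed above."""

    @periodic_task.periodic_task(spacing=60)
    def report_available_resources(self, context):
        # Roughly the "one minute" task: report free RAM/CPU/disk so the
        # scheduler can make placement decisions on fresh data.
        pass

    @periodic_task.periodic_task(spacing=600 + random.randint(0, 60))
    def sync_instance_states(self, context):
        # Roughly the "ten minute" task: reconcile instance power states.
        # Adding a per-process random offset to the spacing is one way to
        # spread the spikes when many nodes start at the same time.
        pass
```

In a real service these tasks are driven by the service's periodic task runner; the point here is only where the intervals live and how jitter could be applied.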
Just please use the mic, because the session is recorded. Yes, sorry: did you have a special configuration for the database and RabbitMQ used in this test, or the default configuration? We tuned a little bit. We have this small slide with the settings you would probably want to apply. It's not mandatory, but it's highly recommended, because otherwise you may not fit into the default limits. For the database it's mostly about the number of connections; we did nothing besides the connections. For Nova API, of course, the max pool size, because there are a lot of connections to the database, so you have to have a pretty big pool size for the workers. For the conductors, as we already mentioned, it's usually only the number of workers. For the scheduler, it's a general recommendation: to get reasonably fast provisioning of workloads, you should go with at least one scheduler per 100 compute nodes, and the downside is that they will run close to 100% load. If that saturation is pretty high, you will sometimes see rescheduling happening, because some of the schedulers will try to put different VMs on the same compute node and one of them will be killed and rescheduled. And on the Linux side, you will probably have to tune some settings on all the servers, because there are a lot of connections going on: we changed the queue size and the max connections parameter. Those were mandatory for us because we had everything on 15 nodes, which is a pretty huge amount of load per node, but it still holds in general. That's probably it, and we have run out of time, sorry. Okay, thank you. Thank you, guys.