Hello. Good morning, everyone, and welcome to the fourth day of OpenStack Summit. My name is Ajay Gulati, and I'll be co-presenting this talk about quantifying the noisy neighbor problem in OpenStack environments with Nodir. The work was primarily done last year during Nodir's internship at ZeroStack. We'll go over different kinds of noisy neighbor environments, focusing on storage, networking, and other shared resources. But before I go into the talk, I would like to get a quick show of hands: how many people here are either OpenStack operators or administrators or even users? Oh, wow, almost all of you. How many of you have never seen a performance problem before? I didn't think so.

So the motivation for the talk is essentially this: if you look at private clouds, they are designed to run diverse workloads. You would be running workloads that are compute-intensive, maybe memory-intensive, storage-intensive, or network-intensive. And fundamentally, a cloud requires consolidation and pooling of resources, where you pool all of your resources to create this layer of infrastructure that you can now access using API calls. And it's not just one person; different people could be accessing the same infrastructure through API calls and getting resources very quickly.

In terms of storage and networking, we feel that the overall sharing is much more critical compared to CPU and memory, because those are at least host bound. You know what's going on in the host. If you do over-commitment on a host, it's not going to impact other hosts in the cluster. But storage is, in many cases, shared across all the machines. So you could be doing something on one host, and it can impact the storage performance for other hosts. Similarly, in case of networking, it's very hard to isolate the network performance of one VM from another. And it's not just the host-level networking, but also the rest of the physical networking in the data center.
And the performance problem gets more and more critical as the over-commitment factor increases in either of these resources. Now, if you look at it from the user or operator point of view, ultimately we all want applications to get the performance that they really need. If the application is really hurting in performance, that's when we go and try to look at what's going on. As long as the application is meeting its SLA, you may be running pretty high on CPU, or you may be running a lot of IO workload, and you are not going to worry about it so much.

So in this talk, we try to answer three questions about contention. Where is there a contention problem in a cloud environment, and how severe is it? If it is there, how quickly does it show up as you start putting on more and more workloads? And what are some of the best practices to reduce contention? I think there's a lot more work needed, both in terms of OpenStack APIs and from the back-end infrastructure providers, to actually do good performance isolation. But at least the talk will give you some high-level ideas on what you can do and what to watch out for.

So let me go over the scope of the talk, because performance is a very wide topic, and in 30 minutes you cannot do a lot, given all the things that one can potentially cover. In this case, we are going to use an OpenStack cloud deployment with both local storage and a form of shared storage, where there is replication and sharing across hosts. We'll use networking with Neutron and OVS with DVR. And we do two kinds of evaluation studies, both micro and macro. In the micro benchmarks, we primarily focus on networking and storage, given that these are ultimately the most shared resources in a cloud environment. In the macro benchmarks, we run workloads that actually stress a whole bunch of resources. For example, we run Hadoop and Jenkins.
And these are the kind of workloads that stress different resources at different times. So they are not just focused on one resource, and you will see some interesting behavior as a result of them stressing different kinds of resources. Another very interesting metric about performance in an OpenStack environment or in a private cloud is the control plane performance itself. As you are making API calls, and as you have more and more objects in the system, how does the control plane performance change? We'll cover some of that data. And I think there, too, we need more work on improving the performance and making it depend less on the number of objects and entities in the system.

So let me go over a brief outline for the talk. In the beginning, I'll go over the experimental setup that we are using for most of the workloads in the talk. Then we'll talk about the stress tool, which is a tool that we designed to create different environments in the private cloud and do some interesting testing using those different topologies and environments. Then Nodir will go through the storage and network performance evaluation using the micro benchmarks. Then he'll go over the application performance evaluation for Hadoop and Jenkins and a mix of both. And finally, he'll go over the control plane performance in terms of how the API call latency changes as you increase the number of objects and entities in the system.

So let's go over the experimental setup. In this case, we are using a ZeroStack-based private cloud environment. I just want to go very quickly over what the architecture looks like, so that you have some high-level understanding of what kind of OpenStack deployment this is. Most of the findings in the talk are valid independent of whether you have deployed OpenStack with a standard architecture or the ZeroStack architecture. So in the case of ZeroStack, it's a controller-less design.
You essentially have a bunch of nodes. All of them are compute nodes, which have compute, memory, storage, networking, all of these resources. And the control plane services, or OpenStack services, are running across these machines in a distributed manner. So you can pretty much have one 2U block, which has four servers in it. And we have a control plane that runs across these ZVMs. That essentially does the management, monitoring, self-healing, all of these things relating to the private cloud services. There is no specific dedicated controller in the environment. If one node goes down, the control plane would detect that, would realize that there was a service running on that node, and would automatically start that service on some other node.

And in case of storage, we have different configurations. There are local storage pools, which are based on SSDs and disks. And then there is also shared storage, which takes the local disks and shares them across these hosts. So for most of the experiments, we use this minimum building block, which is a 2U node. It has four servers. As for the configuration, each server has two sockets with eight cores, four HDDs, and two SSDs. So overall, the 2U block has 24 drives in there. Each of the servers has two 10-gig NICs. The OpenStack version we are running is based off of Kilo. And it's very symmetric hardware. So think of this as a cloud building block. I think this is the world's smallest private cloud that you can get in one box, which is completely HA and fault tolerant. You can deploy just one 2U node, and you have a complete cloud on which you can run about 100 to 200 VMs, depending on the size of the VM.

In terms of stressing the environment or running different benchmarks, we created a tool where we wrote a client for OpenStack in Golang.
Most of the work that we do in the company is actually in Golang. We designed and implemented a stress tool that uses this Golang library for OpenStack and can create different environments. For that, we looked at some of the existing tools as well. For example, we looked at Rally, and we looked at other tools from IBM and Yahoo. What we found was that in some cases, the tools are designed primarily just for API performance testing. In other cases, the tool is designed to do one specific kind of test, whereas we wanted to try different network topologies, where you create a topology and then you can put workloads on that topology across hosts. There are also different kinds of workloads that we want to be able to run: it could be a regular Jenkins, it could be Cassandra, it could be the ELK Stack, and other things. We found that other tools were focusing on some specific set of workloads and weren't giving us enough flexibility to plug in whatever workload we wanted.

So with the help of the tool, we can do things like create VMs across different hosts or on the same host, across the same subnet or different subnets. This is just a specification that you give to the tool, and it creates that environment for you. We can also do operations like creating volumes for VMs, attaching them to the VMs, detaching them, SSHing into the VM and running some benchmark, whether it is a storage benchmark or a networking benchmark, collecting the results from each of the VMs, and then running some automated tool over them to do the analysis. And finally, it gives us some plots to look at. The tool also allows us to measure the API call performance. To deploy the application workloads, like Hadoop or Jenkins or others, we use Heat orchestration templates. Now, Nodir will go through some of the experimental results.

Good morning.
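The environment specification described for the stress tool is not shown in the talk. As a rough illustration only, a specification of that kind might look like the following Python sketch. The class names, field names, and host and subnet labels are all hypothetical, and the actual tool was written in Golang.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass
class VMSpec:
    """One VM in the test environment (all labels are hypothetical)."""
    name: str
    host: str                      # e.g. "host1"
    subnet: str                    # e.g. "subnet-a"
    volumes_gb: List[int] = field(default_factory=list)


@dataclass
class EnvSpec:
    """A whole test environment: the VMs plus the benchmark to run on them."""
    vms: List[VMSpec]
    benchmark: str                 # e.g. "storage" or "network"


def classify_pair(a: VMSpec, b: VMSpec) -> str:
    """Describe a VM pair by its host and subnet placement."""
    host = "same-host" if a.host == b.host else "cross-host"
    subnet = "same-subnet" if a.subnet == b.subnet else "cross-subnet"
    return f"{host}/{subnet}"


def pair_summary(spec: EnvSpec) -> Dict[Tuple[str, str], str]:
    """Classify every VM pair in the environment."""
    return {(a.name, b.name): classify_pair(a, b)
            for a, b in combinations(spec.vms, 2)}


if __name__ == "__main__":
    spec = EnvSpec(
        vms=[VMSpec("client", "host1", "subnet-a"),
             VMSpec("server", "host2", "subnet-b")],
        benchmark="network",
    )
    print(pair_summary(spec))
```

A declarative description like this keeps the topology (which VM goes on which host and subnet) separate from the benchmark that runs on it, which is the flexibility the talk says other tools lacked.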
So as Ajay just said, we go over microbenchmarking and macrobenchmarking. The reason to start from microbenchmarking is that these are individual puzzle pieces which will come into play when you are running a macro workload. So when we are running Jenkins or Hadoop jobs, we will be able to analyze what part is getting bottlenecked and how we can do better.

So let me start with storage. For all the experiments, I will explain what we have and what the experimental setup is, and then I will deep dive into the numbers. In ZeroStack storage, we expose, by default, four types of pools. Those are local SSD and local HDD, and then we have replicated, or reliable, SSD and HDD. Our replication level is 3, which tolerates disk-level failure as well as host-level failure.

The setup we use for storage evaluation, or microbenchmarking, is as you see in the figure. For storage benchmarking, we tried IOBlazer, FIO, and Iometer, all these storage microbenchmarking tools, and we came to the conclusion that for a Linux-based virtualized environment, IOBlazer is well suited. So all the numbers we show here for storage are IOBlazer-based. In IOBlazer, you can specify different parameters such as queue size, read or write, sequential or random workload, synchronous or asynchronous. By combining all these parameters, you can get thousands of data points. But in this talk, obviously, we won't have time to look at each individual combination. Therefore, we highlight the parts most relevant to our message: how storage or network gets contended in the OpenStack cloud environment.

For storage experiments, we use the large flavor, which is 8 vCPUs and 16 gigs of RAM. You will see this number across the slides. Our VMs are KVM-based. So this is the first graph, showing how sequential and random workloads differ on different types of storage pool.
On the left-hand side, you see the number of IOPS. They are on the order of 40,000 to 50,000 IOPS. On the horizontal axis, you see block size: 4K, 16K, 32K. So you see the sequential workload, and on the right-hand side, you see the random workload: how many IOPS do we get? If you compare both bars, you see that the sequential workload does very well on all types of storage, while HDDs do not do well when you are using a random workload. It's widely known, and it's by design, because a disk is not designed to jump back and forth: each jump is a disk seek, which adds latency and drops your throughput. So the conclusion we make is that for a sequential workload, you can use any storage pool you have, local SSD, local HDD, or replicated SSD or HDD. But for a random workload, you had better use SSD, local or replicated, because the difference in magnitude is more than 10x. The number of IOPS you get on SSDs is 10 times higher than on disk. So if you know your workload is mostly random, use SSD.

The earlier slide I showed you is 100% sequential read, which is not that representative of enterprise workloads. 70% read and 30% write is more representative of most enterprise workloads. We look at the numbers, and we emphasize the same message as in the previous slide, this time showing throughput numbers in addition, so you don't have to do the multiplication yourself. On HDD, you see it's on the order of hundreds of megabytes per second. But on SSD, you get 10 times, more than 10 times, around 800 Mbps of throughput on local SSD and replicated SSD. So an SSD back end, if you have one in your private cloud, like ZeroStack has, should be used for random workloads.

This is the contention case. The two previous slides were showing you the base case, where we run a single VM instance and measure the number of IOPS. This one shows what behavior we see when we run multiple VMs on the same infrastructure.
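The multiplication mentioned for the throughput slide is simply IOPS times block size. A minimal sketch, assuming block sizes are given in KiB; the function name is hypothetical and the input below is an illustrative value, not a measured result from the talk:

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Convert an IOPS rate at a given block size (KiB) to MiB per second."""
    return iops * block_size_kib / 1024.0


if __name__ == "__main__":
    # Illustrative input only: 50,000 IOPS at a 4 KiB block size.
    print(throughput_mib_per_s(50_000, 4))
```

The same IOPS figure means very different throughput at 4K and at 32K, which is why the IOPS and throughput charts are worth reading side by side.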
So there are many data points, but to highlight one here, on the replicated SSD, we have the without-interference case and the with-interference case. Without interference means the previous numbers: what numbers did we get when we ran a single instance? With interference means we have two VMs stressing storage, and the numbers are the average of the aggregate of two hosts. As you see, when we show the delta, the difference is 4% to 5%, which is mostly noise and depends on the back-end workloads you are running. So the message is that, regardless of how many VMs you are running on a replicated pool, storage does not get saturated. You can create more VMs, and the number of IOPS you extract from each replicated pool remains roughly comparable. But this does not hold all the way to full saturation. At some point, you will saturate the disk. You will hit the maximum queue size, in which case you will get higher latency and a throughput decrease. But the takeaway message here is that replicated SSD, or replicated storage in general, gives you good performance. So if you are running many workloads, many VMs, stressing IOPS, you had better use replicated pools, so you are not limited to the capacity of one particular disk.

The other numbers we don't highlight, but in general, we show them because there is some variance, depending on the number of IOPS you run and your number of runs. We want to be able to control this further, but this is non-trivial. We need storage QoS to be able to control all this variance at finer granularity. In the OpenStack Liberty release, we do have good networking QoS. We are working on storage QoS, and we urge the community to work on it too. What the community has to do, as usual, is specify the API, so that the storage back ends can implement it and provide storage QoS for applications, and we will be able to control the noise we get in different storage pools as well.
Now, we saw the base case, where we looked at one VM, and we looked at the contended case, where we ran two VMs. The same conclusion holds as we increase the number of VMs to 10 VMs, 20 VMs. As we mentioned in the previous slide, SSD pools are better. They give you 10x performance on random workloads. So for a workload that is random, even in the enterprise application case, make sure to use an SSD back end.

And then having more pools helps. In ZeroStack, as I explained, by default we have four types of storage pools. VMs can consume the replicated ones to keep the number of IOPS linear. As you add more VMs, the number of IOPS you get will remain the same until you hit some bottleneck, in this case a disk bottleneck. We will highlight this more when we talk about enterprise workloads. But one thing to be careful about when you have multiple pools is that you don't have to use replicated pools for all of your applications. Some applications come with their own replicated and reliable storage mechanism, such as Hadoop. Internally, it keeps replication at a level of 3, as most of you know. You're welcome to use replicated SSD or replicated storage for that case, but you will be creating unnecessary redundancy, and you will be wasting your storage capacity.

And when you have pools, what we observed in OpenStack is that your volume and VM are not always local. OpenStack lets you, or sometimes it just happens, that the VM is created on host 1 while the storage comes from host 2. This is bad for many reasons, but let me highlight two of them. First, every single IO you do will traverse the network. That unnecessarily stresses the network when you could have had both on the same host. Second, it is bad for fault tolerance as well. If the storage node goes down, your VM crashes, even though nothing happened on the host where the VM is running.
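The VM and volume locality problem described above can be checked with a few lines of code once you know where each VM and each volume lives. This is a minimal sketch over plain dictionaries; the function name and the input shapes are assumptions for illustration and do not correspond to any OpenStack API.

```python
from typing import Dict, List, Tuple


def non_local_volumes(
    vm_host: Dict[str, str],
    volume_host: Dict[str, str],
    attachments: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the (vm, volume) attachments whose volume is served by a
    different host than the one running the VM."""
    return [(vm, vol) for vm, vol in attachments
            if vm_host[vm] != volume_host[vol]]


if __name__ == "__main__":
    # Hypothetical inventory: vm2's volume lives on another host.
    vms = {"vm1": "host1", "vm2": "host1"}
    vols = {"vol1": "host1", "vol2": "host2"}
    print(non_local_volumes(vms, vols, [("vm1", "vol1"), ("vm2", "vol2")]))
```

Every pair this returns is an attachment whose IO crosses the network and whose VM depends on a second host staying up.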
So you should make sure that, in your OpenStack environment, the block storage you get for the VM always comes from the same host. This is one of the key best practices. There are multiple ways to do it; the way we did it is with a Nova filter. When you're doing allocation, once you know which host the VM is going to land on, you can make sure the block is allocated from that host. So you have host-to-volume locality, and you don't have this extra network stress and this non-fault-tolerant design.

So that was about storage. Now let's look at the network numbers, and we will connect both of them in the enterprise workload environment. The network setup, as I explained, uses different host and different OpenStack subnet combinations. It lets us place VMs, run an iperf client and server, and compare the numbers we get. In this figure, you see we have one top-of-rack switch. VMs in the same color are from the same OpenStack subnet; VMs in different colors are in different OpenStack subnets. Traffic between VMs also traverses the top-of-rack switch, but when the VMs are co-located on one host, there is no network stress. The packets do not traverse the network. They stay local, and they are bound by the CPU and RAM of the host.

In iperf, you can vary message size, runtime, and protocol, and see what numbers you get with different protocols. These are important because your applications ultimately use different protocols with different message sizes. So our numbers will have different IO sizes, which will represent your workload. We use Neutron with OVS and DVR, and we use GRE for tenant isolation. All the VM flavors are extra large, meaning they have 8 vCPUs and 60 gigs of RAM. And the results we show are the mean of three runs.

What the figure shows you is the overhead of the OpenStack SDN router. So this is the same host. There is no host networking involved. We have two VMs, iperf client and server, talking to each other when they're located on the same host.
So the red line, or the bottom line, you see, is when the VMs are talking across the router. When you create an OpenStack router, as you might already know, it creates three software hops in addition to the existing ones: you already have the Linux bridge and OVS, and then the router adds three more hops. That obviously consumes more CPU cycles per packet. Overall, across all the packet sizes we ran experiments on, we see around 9% performance degradation when you put the VMs across two OpenStack subnets.

I will draw the lesson from two slides together. This is the same experiment, but now the VMs are running on different hosts. Again, in one case they are on the same OpenStack subnet, and in the other on different OpenStack subnets. The performance degradation we see is comparable to the previous one: it is 12%. Therefore, the message is that if your application is happy to coexist in the same OpenStack subnet, you don't have to use different OpenStack subnets just because you can. The hidden cost of putting your VMs in different subnets is a performance drop. It's around 10%, regardless of whether you create the VMs on the same host or on different hosts.

Some suggestions for getting generally better network packet throughput: you can leverage existing technologies such as DPDK or VLAN-based tenant isolation, but that comes at the cost of operational complexity and limits the number of tenants to 4K.

We take the previous two experiments and combine them into one, and show that when you're running two VMs on the same host, you get around 10x more network throughput compared to running them on two different hosts. So you can leverage this by putting your application VMs on the same host. Co-locating chatty VMs not only increases network throughput for these two VMs, it increases the overall performance of your cloud, because the network, as we highlighted in the beginning, is a shared resource.
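Co-locating chatty VMs can be illustrated with a small greedy placement sketch. This is not the placement logic used in the talk; it is one simple way to express the idea, assuming you already know the traffic volume between VM pairs and that every host has the same number of free VM slots. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def colocate_chatty_vms(
    traffic: Dict[Tuple[str, str], float],
    hosts: List[str],
    slots_per_host: int,
) -> Dict[str, str]:
    """Greedy sketch: walk VM pairs from most to least traffic and try to
    put each pair on the same host while capacity allows."""
    free = {h: slots_per_host for h in hosts}
    placement: Dict[str, str] = {}

    def first_host_with(n: int) -> Optional[str]:
        return next((h for h in hosts if free[h] >= n), None)

    def place(vm: str, host: Optional[str]) -> None:
        if host is None:
            raise ValueError("not enough capacity for " + vm)
        placement[vm] = host
        free[host] -= 1

    ordered = sorted(traffic.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _ in ordered:
        if a in placement and b in placement:
            continue
        if a not in placement and b not in placement:
            shared = first_host_with(2)
            if shared is not None:
                place(a, shared)
                place(b, shared)
            else:
                # No host can take both, so place them wherever they fit.
                place(a, first_host_with(1))
                place(b, first_host_with(1))
            continue
        placed, other = (a, b) if a in placement else (b, a)
        same = placement[placed]
        place(other, same if free[same] >= 1 else first_host_with(1))
    return placement


if __name__ == "__main__":
    demo = {("web", "db"): 900.0, ("web", "cache"): 100.0, ("batch", "db"): 10.0}
    print(colocate_chatty_vms(demo, ["h1", "h2"], slots_per_host=2))
```

A real scheduler would also weigh CPU, memory, and storage locality; the sketch only shows the ordering idea, which is to give the heaviest talkers first pick of a shared host.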
Your storage IOs are going over the network, and other applications are talking to each other across it. So if your application runs well with its VMs on the same host, always take advantage of it. One way to do it is affinity rules and smart placement policies. What we highlight here is an overall advantage of an OpenStack private cloud. In a public cloud, you don't have the placement capabilities to take advantage of host-local networking for your application. So again, a private cloud gives you more control, and ultimately that goes back to your application performance.

This is the contended part, just like in storage. We create two VMs, one client and one server, and then we keep adding more VMs on the same host. What we show here is the one-VM and two-VM cases. This extends to a larger number of VMs, but not linearly. The throughput we observe is mostly OVS bound, and this is widely known in the community. OVS does the GRE encapsulation and decapsulation, which consumes CPU cycles. And note that GRE encapsulation is not a parallelizable operation. Regardless of the number of CPUs you have per VM, packets will come to one CPU, which will always do the encap and decap. That becomes a bottleneck, and your overall VM throughput goes down. As you see, in this case, a single VM is not able to saturate the 10 Gbps we have. With one VM, you get up to 1,500, and with two VMs, you can get to 2,000. But you can't get all the way up, because OVS becomes the bottleneck, with GRE encapsulation and decapsulation taking most of your cycles. Overall, though, you can extract more network bandwidth from your infrastructure by placing more VMs on the hosts.

Now, those were two pieces of the puzzle: the storage and networking numbers, which will impact our enterprise workloads. Now, let's look at the enterprise workloads, starting with Jenkins.
Then we'll move to Hadoop. So the Jenkins workload is representative because, in a private cloud, people usually use it for CI/CD. They have source code, which they build continuously to make sure it complies with their requirements. As a representative and well-known workload, we picked a Linux kernel compile. So you get the Linux kernel source, put it on the Jenkins VM, and trigger a build, and it keeps building the Linux kernel continuously. The VM flavor we use for this case is extra large, which has 8 vCPUs and 60 gigs of RAM, with a 50-gig local SSD.

Just as a comparison point, so you know where to start, we are showing bare metal as only one dot. We are not running multiple bare-metal machines. On bare metal, it takes around 15 minutes to compile the Linux kernel, while if you run the same workload with the same machine spec in a VM, it takes around 23 minutes, and it is roughly 30% overhead. That is the virtualization overhead you pay in any environment. When we say any environment, it does depend on your workload. This typical workload, which is compute and storage intensive, gets you 30% overhead.

Now, the good news is that as you create more VMs, two, four, eight, your workload does not get contended, because we are not using replicated storage and we are not using the network. It's stressing local host resources only. As long as we don't hit CPU overcommit, our numbers remain roughly similar. When we hit CPU overcommit, that's again only a 30% increase. It's not a high penalty to pay, which might be fine for some workloads. And the message here is that as you increase the number of VMs, even if you go beyond CPU saturation to 200%, the increase you get is not exponential. It's roughly 260% with a CPU overcommit of 2x. Now, this might be fine for some batch workloads, where you just want your workload to complete and you don't mind if it runs 40 minutes instead of 20 minutes.
But if your workload is time critical, and you want it to complete as fast as it can, then you should always make sure the CPU is not overcommitted. Because CPU overcommit, as we see, although not exponential, does increase your job completion time in a CI/CD environment.

The next workload we look at is Hadoop. Hadoop runs a TeraSort workload. Some of you or most of you might know TeraSort. In TeraSort, it generates random data in records 100 bytes long. Then it does the sorting, and it does verification. We show the time from the sorting phase only; generation and verification are not shown here. Our environment is four VMs: we have one Hadoop master and three slaves. We run different workloads to sort 10 gigs, 20 gigs, and 30 gigs.

Just a note to make it easier: as we go across the horizontal axis, we are not necessarily doing the same amount of work. We are increasing it. If we create four clusters, the total amount of data your private cloud sorts will be 120 GB. So you are getting an increase in the job completion time, but you are also getting more work done. What we want to highlight here is how much increase you get in job completion time when your workload stresses CPU, memory, and network. The y-axis shows job completion time in minutes. Sorting 10 gigs of data in one, two, or four clusters takes roughly six minutes. As you add more clusters, your job completion time increases, and you are getting more work done as well. One point to highlight here is when you have four clusters and you are doing 30 gigs of data: the execution time is around 25 minutes, and that is the most contended workload. We see around 60% longer execution when we have four clusters running in parallel. Now, again, as we highlighted with Jenkins, once you overcommit your CPU, the runtime becomes nonlinear. We already showed this in the Jenkins case, and the same conclusion holds in the Hadoop case.
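The CPU overcommit factor discussed for Jenkins and Hadoop is the ratio of allocated vCPUs to physical cores on a host. A minimal sketch follows; the function names are hypothetical, and the 16-core figure in the example is an assumption that reads "two sockets with eight cores" as eight cores per socket and ignores hyperthreading.

```python
from typing import Iterable


def cpu_overcommit_ratio(vm_vcpus: Iterable[int], physical_cores: int) -> float:
    """Total vCPUs allocated on a host divided by its physical cores."""
    return sum(vm_vcpus) / physical_cores


def max_vms_without_overcommit(vcpus_per_vm: int, physical_cores: int) -> int:
    """How many equally sized VMs fit before the ratio exceeds 1.0."""
    return physical_cores // vcpus_per_vm


if __name__ == "__main__":
    cores = 16  # assumption: 2 sockets x 8 cores, no hyperthreading
    for n in (1, 2, 4):
        print(n, "VMs ->", cpu_overcommit_ratio([8] * n, cores))
```

Under that assumption, two 8-vCPU VMs fill a host exactly and four of them give the 2x overcommit case.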
Now lastly, we combine both of these workloads. As we said in the beginning, enterprise environments are designed to run diverse types of workloads. Some of them stress CPU and memory, just like Jenkins does. Some of them stress network, CPU, and memory, just like Hadoop does. Now, let's see the behavior when we combine both of them.

So for these two workloads, recall that Jenkins is CPU and memory heavy. And they both have execution phases. For example, Jenkins, to compile the Linux kernel, first stresses the CPU by reading many small files and generating object files from them. Then it stresses the disk by writing all these small files. Then it combines all of this and generates a final library. So this has a CPU-intensive phase, a memory-intensive phase, and a CPU-intensive phase again. The same is true for Hadoop. As you might well know, the sort has map, shuffle, and reduce phases.

What we want to highlight is that when the environment is not saturated, for example, at this data point and the right-hand side data point, the numbers you get are roughly similar. This is because your Hadoop workload is finishing quickly. It takes around four to 10 minutes, so in five or 10 minutes your Hadoop workload is done, while Jenkins usually takes 23 minutes, and it didn't quite reach the memory contention phase. Now, Jenkins and Hadoop are coexisting and starting at the same point. But because of their application behavior, they are stressing different resources at different times. This is fine. As long as you don't hit the same resource at the same time, you don't get a performance penalty.

Now, that's one conclusion, and we will jump to another point by looking at this figure. What we do here is take the same data from the previous slide, when Hadoop and Jenkins are running together. And on the left-hand side, we show the Hadoop-only case.
Jump three slides back, where we were talking about Hadoop clusters running together, two clusters, four clusters. What we see is that the behavior is not predictable. Sometimes Jenkins gets impacted. Sometimes Hadoop gets impacted. In our previous run, when we were running Hadoop only, to sort 30 gigs of data in four clusters, we got around 60% longer execution time. Note that it is a similar number when we are running two Hadoop clusters together with Jenkins. But that does not mean that Hadoop is not getting impacted, or that my environment runs the same regardless of whatever workload I'm running. No, that's not true, because Jenkins is getting impacted. Jenkins is observing 20% longer execution time when it is running six instances next to Hadoop. Recall from maybe five or six slides back that six Jenkins instances running together, or even 10 or 12 as long as we do not see CPU overcommit, were executing fine. They were completing within 23 minutes.

Now, the conclusion, and we urge the community to invest more work in this, is that predictability is hard. It depends on your application, and it depends on your environment and background workloads. Better QoS helps, not only for network but for storage also. We need better isolation that you can specify per volume, and maybe per VM also. If your VM has four volumes, you want to be able to specify what QoS each volume requires. You specify it, and your back-end storage driver provides that QoS. That way you get predictable performance when you're running multiple workloads or a diverse set of workloads in the same private cloud.

Now, we showed you microbenchmarking and macrobenchmarking, where network and storage come into play together to impact the application performance. We also measured control plane performance. Here, control plane performance is measured against the number of OpenStack objects we create. In this case, we are creating 30 OpenStack entities. The entities are networks, subnets, volumes, and VMs.
We create them all serially. We create one entity, then we move to the next, and we keep moving to the next until the workload is complete. We show you the base case compared to subsequent runs. In the first run, we create 30 OpenStack entities, and we leave them in the cluster. In the next run, we create another 30 entities. So your total number of entities becomes 60 and then 90. What we see as the delta is 5% when you do the next round, and 9% compared to the base case when you get to the last round.

Now, it does not really depend on the OpenStack architecture you have. Ajay described ZeroStack's highly available architecture, but this is true even if you have a separate set of controllers. Ultimately, this all depends on the number of objects you have in your OpenStack private cloud, which does a lot of message exchange as the number of entities grows. Your MySQL entries will grow. Your health checks for VMs, for volumes, for networks will grow. Neutron will talk more frequently through RabbitMQ, and so on. So what we want to emphasize is that this is not specific to the ZeroStack environment. You are welcome to run this tool on your own private cloud design. You will get comparable numbers.

Two things can help. The first one is provisioning additional service instances. If a service, for example Neutron or RabbitMQ, gets bottlenecked, you can create more instances and get a better control plane. The other thing, which we urge the community to think about and investigate, and which would help with scalability in particular, is cross-project or cross-service instrumentation. Right now, for example, if you're creating one VM, it's not only Nova. Nova gets the request, talks to MySQL, does a bunch of RabbitMQ communication, talks to Neutron, talks to Cinder. So you see many components, many OpenStack services, are involved in this path.
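To make the idea of cross-service instrumentation concrete, here is a minimal sketch of the kind of per-service breakdown it would enable. It assumes timing records of the form (request identifier, service name, seconds) are already available; that record format, the function names, and the sample values are hypothetical and do not describe an existing OpenStack API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Span = Tuple[str, str, float]  # (request_id, service, seconds) - assumed format


def time_per_service(spans: Iterable[Span]) -> Dict[str, Dict[str, float]]:
    """Sum the time each service spent on each request."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for request_id, service, seconds in spans:
        totals[request_id][service] += seconds
    return {rid: dict(per_service) for rid, per_service in totals.items()}


def slowest_service(per_service: Dict[str, float]) -> str:
    """Name the service that accounts for the most time in one request."""
    return max(per_service, key=per_service.get)


if __name__ == "__main__":
    # Made-up sample records for a single VM-create request.
    sample = [
        ("req-1", "nova", 0.5),
        ("req-1", "neutron", 1.5),
        ("req-1", "nova", 0.25),
        ("req-1", "cinder", 0.5),
    ]
    breakdown = time_per_service(sample)
    print(breakdown["req-1"], slowest_service(breakdown["req-1"]))
```

With a breakdown like this per request, the service to scale out is the one that dominates the total, rather than a guess.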
What we do not have here is this: say in the second round, compared to the first round, we got longer execution time. But we do not know whether it's because of Nova, RabbitMQ, MySQL, or Neutron. One can look at the log files and the request ID and traverse all the way back and see, OK, my RabbitMQ is a bottleneck, or my Neutron is a bottleneck. I can create more Neutron instances and make my control plane faster. This is not a practical real-world task. OpenStack can specify. We already have the request ID in the log messaging part, so we know we can traverse it manually. But it would be useful to expose this through an API in monitoring components, such as telemetry. So you can query how long this request took in each component. So you can know which component is stressed and scale horizontally to get good control plane performance. Ajay?
So to conclude, I think what we have seen is that network and storage contention gets very critical as you put more and more workload in your environment. In the beginning, it may not get saturated, but over time you have to carefully monitor how your storage latencies are increasing and add more nodes to give it more throughput. Interestingly, the CPU itself shows less of a performance hit as you try to overcommit CPU. In the case of memory, we didn't do that experiment. But it's fairly obvious that if you do overcommit memory, the performance would go down very fast. And that is something that we didn't do here. In terms of control plane, we obviously saw that we need to scale the control plane performance as well as we add more entities. One interesting thing that we observed when we were doing a lot of experiments in the company is that sometimes a lot of different people or scripts would hit the control plane. And we saw that you could actually choke the control plane itself, effectively doing some kind of a DoS attack. And I think right now, things are not designed to prevent these kinds of attacks. Obviously, it's a private cloud.
You expect your users to be benign and not malicious. But doing some sort of quality of service or control on the inbound calls would also help. One thing that I think would help a lot in terms of getting overall better performance is having smarter placement policies. And that's really the advantage of a cloud that you control, where you know what kind of application you are running, which VMs would talk to each other, and what the different tiers are. And you can get much higher performance by controlling the placement of those machines, both in terms of networking and CPU. And overall, I think people think of a cloud and they say, look, you are moving your application to the cloud, and the application needs to be aware of the cloud. You need to understand how a public cloud works or a general cloud works. I think what we really need to move towards is making clouds application aware. The application should not have to be cloud aware. And that, I think, would be the ultimate goal if we can reach it. Thank you for your time.