Okay, could we start? So, I'm Boris, I work at Mirantis, and today we will talk about benchmarking OpenStack at scale: what it is, why we need it, and how we are going to do it. We will also introduce a new community-based project called Rally, and we will cover what Rally is, its internals, and the examples and results that we already have.

In benchmarking it is very interesting to answer two questions: how do we ensure that OpenStack works at scale, and how do we quickly detect problems and performance issues, fix them, and then prove that these fixes work before going to production? There is actually a simple way. We could just generate some load from concurrent users, capture key metrics — average times for booting and deleting VMs, failure rates, and so on — and verify that everything works fine under load, and we will be happy. But what if something went wrong and it doesn't work as expected? Maybe it is an incorrect deployment setup, maybe you don't use optimal hardware, maybe there is a bug in the code, or maybe you didn't read the manuals closely enough. Did you read the manuals closely? No? Really?

When I saw this picture, I thought that after an hour you would get some vision of how to solve this problem, and that there should be some simple, easy way to solve it. The three common approaches are to use better hardware, to deploy better, or to make the code better. But how will we know which hardware to improve, which part of OpenStack is too slow, or what to change in the deployment? As you see, it's a complicated thing. So some small companies like IBM, Yahoo, and others are trying to bring light into this darkness, to answer these questions, and to build Rally, a community-based project. It is well integrated with the other OpenStack projects, and we have a wiki page on openstack.org with all the details about the benchmarks, the architecture, the roadmap, and so on.

We have to cover two different types of users: developers and cloud operators. For developers we should run synthetic and stress tests that give fast results, within five or ten minutes: does your fix work or not? For operators it is more about real-life cloud usage: how the cloud will behave, for example, for a week under the expected load. Reporting and presenting this data should also be different for developers and for operators; developers, in particular, need to know about bottlenecks.

This is the high-level architecture of Rally. The first part is deploying the OpenStack cloud. Developers actually don't know how to deploy OpenStack — I mean, I don't — so I built this into Rally, and you can use different tools to deploy OpenStack: DevStack, Fuel, or a dummy deploy engine that simply returns the endpoints of an existing cloud. This deployment has to run somewhere, so there is another concept, server providers, which supply the servers that you are going to deploy on; this could be OpenStack, LXC containers, or, in the future, Amazon. Then we run the specified scenarios and set common parameters such as the number of users, the number of tenants, the concurrency, the type of workload, and the duration. Finally, the results are processed, and we get execution times, failure rates, graphs, profiling data, and so on.
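As an illustration of such a scenario specification, here is a minimal sketch of what a benchmark task could look like. The scenario name and all the configuration keys below are assumptions for illustration, not necessarily Rally's exact input format:

    import json

    # Hypothetical benchmark task: boot and delete servers repeatedly.
    # The scenario name and every key below are illustrative assumptions.
    task = {
        "NovaServers.boot_and_delete_server": [
            {
                "args": {
                    "image_id": "<image-uuid>",  # image to boot from
                    "flavor_id": 1,              # flavor for the VMs
                },
                "config": {
                    "times": 100,           # total scenario iterations
                    "concurrent": 10,       # simultaneous iterations
                    "tenants": 2,           # tenants created for the run
                    "users_per_tenant": 5,  # users generating the load
                },
            }
        ]
    }

    print(json.dumps(task, indent=4))

The point of a specification like this is that anyone can take exactly the same file, run it against their own cloud, and compare results.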
Now let's look in detail at how the benchmark scenarios are organized. There will be two separate engines: one for developers and another for cloud operators. For developers we are interested in synthetic workloads that run against the OpenStack cloud, and the data for developers is low-level: profiling, Tomograph results, graphs, and so on. For operators, real-life workloads are what matter, and the data is SLA figures and bottlenecks, without the low-level details.

Let's discuss the synthetic tests for developers. We could try to boot a large number of VMs, boot many VMs per second, make a lot of simultaneous requests to upload images, and combine these scenarios. This shows us bottlenecks and uncovers problems that we have in OpenStack. And if we have a tool that can repeat every scenario with specified parameters, anybody can write down a configuration, and other developers will be able to run it without any problem and get the same results. We could build a golden standard for everyone.

As the project is new and we did not yet have deployment support built in, we used Fuel, and the number of physical servers was really large — more than 500. Let's look at the examples we got. We built a 200-server OpenStack cloud and tried to boot 1,400 VMs with different concurrency. We saw that the number of already-running VMs does not influence how long it takes to boot the next one, but the number of concurrent users does: it works almost two times slower with two times more users. In another test we booted and deleted a VM with different numbers of concurrent users, and we saw that with 100 users it works five times slower than with one user. It seems like something that should be fixed.

We also got some interesting profiling data from Tomograph and Zipkin. For example, launching just three VMs produces more than 300 DB requests and more than 17 RPC calls. And when deleting three VMs under high load, we saw roughly a minute of global locking on the quotas table, which means that these operations were performed not simultaneously but one by one. This is actually a big performance issue in OpenStack.

The other part is real workloads. Why do we actually need real workloads when we have synthetic tests? First of all, real workloads are more complex than just booting and destroying a VM. Workloads rarely change, and your workload depends on your business, not on OpenStack. As a result, we would like to get different data: not how long the whole scenario takes, but how long, say, booting a single VM takes within this workload. What we are going to measure is how long provisioning a VM took, how long destroying it took, and how much time each operation takes; using the VM in between can take a large amount of time, for example an hour or a day. In more detail, we are interested in exactly which component does the work during, for example, provisioning or destroying a VM. We would like charts that show how long an operation takes in each specific component; this will let us find the component that should be improved first, and then the others. It is also really interesting to have historical data: we make some changes in the hardware, in the deployment architecture, in the code, or elsewhere, and we should know how they influence the performance and scalability of the cloud.
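To illustrate the kind of per-action measurement described here, below is a minimal sketch — not Rally's actual engine; the action names and the in-memory results store are assumptions for illustration:

    import time
    from contextlib import contextmanager

    # Collected measurements: one record per named action.
    results = []

    @contextmanager
    def timed_action(name):
        # Record how long the wrapped block of work takes.
        start = time.time()
        try:
            yield
        finally:
            results.append({"action": name,
                            "duration": time.time() - start})

    # Usage sketch (the helpers below are hypothetical):
    # with timed_action("provision_vm"):
    #     server = boot_vm()
    # with timed_action("use_vm"):
    #     run_workload(server)   # may last an hour or a day
    # with timed_action("destroy_vm"):
    #     delete_vm(server)

With per-action records like these, each operation can be charted separately, and the records can be kept as historical data across hardware, deployment, and code changes.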
Now I would like to present our roadmap. First, we should greatly improve the profiling capabilities, to cover the case of how long an operation takes in a specific component, and extend the workload definitions so that we are able to build real workloads. The second thing is to run different stress tests together, to get information such as how a big load on Cinder influences the boot speed of a VM — that is really interesting. And then, support for historical data, and a better correlation between the business workload and this reporting.

Actually, I would like to get some help from the community, because the project is really big and interesting, and it is useful for everybody who is interested in OpenStack. You can join our Rally community: there is a wiki page and a project space, and you can ping anybody there. For example, my nick is boris-42. So, are there any questions? I would like to answer them.

Yeah, too fast, okay. The question is how we detect that a VM was booted. The answer is simple: we poll the status of the VM until it becomes ACTIVE. Actually, we are using only the Python clients at this moment for testing, and Rally might not be in the same network as the OpenStack cloud — assigning public IPs, I mean floating IPs, is a separate operation — so the only thing we can do is check that the status becomes ACTIVE. (I'll show a rough sketch of this polling loop a bit further down.) But actually, in real life this is the only way you have to test it; if you are using Horizon, for example, it also polls this same data.

Any other questions? Okay. Actually, that is in our future roadmap. We have one engine that does stress testing and measures a scenario from beginning to end. The second engine, for real workloads, will be organized differently: there will be actions, and we will measure and collect data for every action and then process this data. That is not so useful for stress testing, so there should be separate engines to support these two cases, and I am working on that now.

Yes, I have some plans around making something like a pastebin, but instead of returning just raw data, it will draw all the graphs and other things.

Yes? Yes, inside VMs. That is in the future roadmap — it would be great, but for now we should concentrate on the OpenStack infrastructure and on testing how OpenStack works by itself. The question was a different one: why don't we use it now? It is simply not done yet. If we had some monitoring, and probably special images for the VMs that run something after booting, we could try to connect and combine all this stuff: monitoring, plus timings, plus profiling data. At this moment we don't use any in-VM profiling, so it's impossible.

Yes, you could build your own roadmap, and I probably will approve it. Actually, it's interesting, but if we try to implement everything at once, it won't work. We should extend the functionality step by step, to keep Rally simple for end users. If you put in a lot of functions it will be too hard to use, I mean, without any processing and combining of this data. So, step by step.

Details about OpenStack — what? I'm sure this was already... I have some other data, but I'm not able to share it at this moment.
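Here is the rough sketch of the polling loop mentioned earlier, assuming the python-novaclient API of that era; the credentials, endpoint, image, flavor, and timeout values are all illustrative:

    import time

    from novaclient.v1_1 import client as nova_client

    # Illustrative credentials and endpoint; replace with real values.
    nova = nova_client.Client("user", "password", "tenant",
                              "http://keystone:5000/v2.0/")

    server = nova.servers.create(name="rally-test",
                                 image="<image-uuid>",
                                 flavor="<flavor-id>")
    start = time.time()
    while True:
        # Re-fetch the server to get its current status.
        server = nova.servers.get(server.id)
        if server.status == "ACTIVE":
            break
        if server.status == "ERROR" or time.time() - start > 600:
            raise RuntimeError("VM failed to boot")
        time.sleep(1)

    print("VM became ACTIVE in %.1f seconds" % (time.time() - start))

Note that this only proves the VM reached ACTIVE from Nova's point of view; checking the guest from inside, as discussed above, would require floating IPs or in-VM agents.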
Yes, I think it could be integrated with OpenStack Gerrit. There is a special mechanism that reads all the review messages, and if there is a special syntax — for example, "run benchmark" plus the specification of a benchmark — it could be hooked up to run this benchmark against some cloud and return the results. (A rough sketch of this idea appears at the end of this transcript.)

Oh, sorry, what data? Profiling data, or...? Yes, that result was from a production OpenStack cloud.

Could you repeat? I didn't hear, sorry. Okay. First of all, you use the profiling data to see which component or which part of OpenStack starts to work slowly. Then you can just read the code and understand why it works slowly. Then you try to fix it, you run the benchmarks one more time, and if the times are not lower, you run the profiling one more time and check whether your fix actually helped.

Okay, we already have patches for OpenStack, and we are going to merge them upstream soon. They are pretty small patches that add Tomograph inside OpenStack, so that all the logs, RPC calls, and DB calls are sent to some backend. In our case we used Zipkin as the backend: it stores all these logs, RPC calls, and DB calls, then aggregates this data and presents it in a form like a website.

Okay, that is a little bit of an open question. We are now analyzing what the better approach is to measure time in each service; there is a discussion about this. Or, what was the question? I didn't understand. Yes, this is the open question: we could try to analyze logs, or we could build some other kind of message in Tomograph and analyze those messages. It's not connected with Rally itself; it's more connected with your project.
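On the Gerrit integration idea above, here is a minimal sketch of how such a trigger could listen for review comments, assuming Gerrit's standard stream-events SSH interface; the trigger syntax, the host, and the run_benchmark() helper are all hypothetical:

    import json
    import re
    import subprocess

    # Hypothetical trigger syntax in a review comment, e.g.
    # "run-benchmark boot_and_delete.json".
    TRIGGER = re.compile(r"run-benchmark\s+(?P<spec>\S+)")

    def run_benchmark(spec, change_id):
        # Hypothetical hook: deploy the patch, run the named benchmark
        # against a test cloud, and post the results back to the review.
        print("would run benchmark %r for change %s" % (spec, change_id))

    # Gerrit exposes its event stream over SSH as "gerrit stream-events".
    proc = subprocess.Popen(
        ["ssh", "-p", "29418", "review.openstack.org",
         "gerrit", "stream-events"],
        stdout=subprocess.PIPE, universal_newlines=True)

    for line in proc.stdout:
        event = json.loads(line)
        if event.get("type") != "comment-added":
            continue
        match = TRIGGER.search(event.get("comment", ""))
        if match:
            run_benchmark(match.group("spec"), event["change"]["id"])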