Okay, hi guys. Thank you all very much for coming today. Before starting I would like to introduce myself, for those of you who don't know me. I'm Boris Pavlovic. I'm currently PTL of Rally and of the OSProfiler initiative, so I'm trying to lead in that direction, and that is what I am working on at Mirantis. And this is my friend Rohan. Could you introduce yourself?

Yeah, good afternoon folks, thank you for joining us. My name is Rohan Kanade. I work for Red Hat and I actively contribute to the Rally project; I am a core reviewer for Rally. In today's presentation we are going to discuss how to evaluate vendors and drivers using Rally and OSProfiler, and how to make meaning out of that data.

So, moving on. Firstly, we will talk about why we are here, what exactly OpenStack drivers are, how to evaluate them, and why it is hard to evaluate them. Secondly, we will do a small overview of Rally and OSProfiler and of how to evaluate your vendor drivers using them. Finally, we will present our Kilo cycle goals for Rally and OSProfiler.

So OpenStack, as we all know, is a really, really huge, complex ecosystem. Nobody knows how all of it works; there are big services.
There are lots of projects and lots of services; even I don't know how it all works. But going by the numbers, OpenStack has experienced significant code growth, the majority of which has been driver code. That is the really scary part, because the driver code is increasing every cycle.

Just to back that up: going by the latest Stackalytics reports, the Juno cycle has a hundred-plus vendor drivers in tree, 23 new vendor drivers came in during the Juno cycle, and already seven new drivers have been planned for the Kilo cycle. There are too many choices.

If you take a look from the other side, all projects have quite a similar structure. There is a REST API facing the user, which accepts HTTP requests. I'll just use Cinder as the example project, because it's common and simple to understand. We have cinder-api, we have cinder-scheduler, and we have the cinder volume manager, the service that actually runs on the node that provides the resources, such as volumes. Inside every project, inside every manager, there is a bunch of drivers that can be used: the LVM driver, the Ceph driver, or other drivers that actually make the calls to the operating system or to some other backend. And there is the Cinder database.

So when we are talking about validating how exactly drivers work, we have a lot of issues, because there is a lot of overhead in cinder-api, in cinder-scheduler, in the AMQP provider that we are using, and in the DB, and issues can be in different places. We cannot be sure whether it is a deployment issue or an issue of the specific driver, and that is the goal of our presentation: first of all, to understand how they work.
So we will have different deployments, just run some load, and get some data; and we can measure only the API calls and the time required to get these resources into an active state. The second thing is that we will show how to ensure that it is not a deployment issue, that it is exactly the call into the driver. Let's move to the next slide.

So we need to get familiar with Rally. Seen from the user-facing side, Rally is actually quite a simple tool. You have just a couple of commands: you register your deployment with Rally by presenting its credentials, then you run a benchmark task that generates load, collects all the data, and presents reports. It's really simple, and I will show you that on the next slide.

To get ready, you just git clone it from StackForge; to install it, there is an automated installation script, just one command. Then you source the openrc file of your cloud, with the credentials, and run one command, rally deployment create, which actually registers this deployment with Rally. The next command runs, from the samples, the create-and-delete-volume benchmark, which creates and deletes volumes as different users. The last command builds a pretty HTML report with the results, which we can take a look at on the next slide.

Actually, we can get a lot of interesting information from this graph: how the durations required to create a volume and to delete a volume differ, and we can also get the distribution of the operations.

Okay, but what do we do with all this stuff? Let's try just to compare LVM and Ceph. This is really a development environment, done on a laptop, so I cannot say that these are real numbers. But what we can see is that Ceph performs much better: for example, attaching a volume to an instance takes two times longer with LVM, and deleting a volume takes 12 times longer. Why does that happen? Any ideas from the audience? No? Okay.
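The workflow just described, reconstructed as a command sequence (the repository location and the sample task path are illustrative, from memory of that cycle; Rally later moved out of the StackForge namespace):

```shell
# Get and install Rally (one automated install script)
git clone https://git.openstack.org/stackforge/rally
./rally/install_rally.sh

# Load the cloud credentials, then register the deployment with Rally
source openrc
rally deployment create --fromenv --name dev-cloud

# Run the create-and-delete-volume sample benchmark ...
rally task start samples/tasks/scenarios/cinder/create-and-delete-volume.json

# ... and build a pretty HTML report from the results
rally task report --out report.html
```

This is a sketch of the demo flow against a live cloud, not something runnable on its own.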
I'll say it: when we are deleting an LVM volume, LVM writes zeros to all the bytes of the volume before deleting it. That operation takes a lot of time when volumes are big and the storage is slow. But to understand that this is the root reason why it is slow, we need a profiler for OpenStack, which is the OSProfiler library.

Basically it is based on Ceilometer and AMQP. From the different services, during a request, we send a lot of messages: for every point in the trace tree we send two messages, one when it started and one when it stopped, with some metadata that is specific to the different kinds of points. If it is a REST call that we are processing in the WSGI middleware, then we send information about what the URL was and other data; if it is an RPC call, it is interesting what was in the message; and if it is, for example, a DB call, then it is interesting what the SQL request was.

Doing this, we send from nova-api, nova-scheduler, nova-compute, or cinder-api, cinder-scheduler, cinder-volume a lot of messages with profiling information to Ceilometer. Every trace message contains three IDs. The base ID is related to the request and is the same for all messages belonging to that request; this allows us to fetch all messages related to one request. The second is the parent ID, the ID of the trace point that came before, and the trace ID is the ID of the current point. In the info field we put at least the name, which is the type of the point (WSGI, RPC, REST), and project information: it can be nova, cinder, glance, or another project. Every project has services, so we also put the name of the service: compute, volume, nova-scheduler, cinder-scheduler. And there is when it was started, when it was finished, and any other metadata the point may carry.
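As a self-contained sketch (plain dictionaries, not the real osprofiler code), the started/stopped message pair for one trace point might look like this:

```python
import uuid

def make_trace_messages(base_id, parent_id, name, service, info):
    """Build the 'started'/'stopped' message pair for one trace point.

    base_id   -- shared by every message of one request; used to fetch
                 the whole trace from the collector
    parent_id -- trace id of the point that came before (the parent)
    """
    trace_id = str(uuid.uuid4())          # id of the current point
    common = {"base_id": base_id, "parent_id": parent_id,
              "trace_id": trace_id, "service": service, "info": info}
    started = dict(common, name=name + "-start")
    stopped = dict(common, name=name + "-stop")
    return started, stopped

# Example: a WSGI point at the root of a request (its parent is the request)
base = str(uuid.uuid4())
started, stopped = make_trace_messages(base, base, "wsgi",
                                       "cinder-api", {"url": "/v2/volumes"})
```

The field names here follow the three-ID scheme described above; the exact wire format of the real library differs.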
So let's move on to working with OSProfiler. It is not yet merged upstream in every project, but in Cinder it is merged. You can just add the extra argument --profile with a secret key, and after running any command, like deleting some volume, you will get a message with the trace ID and with the command to display this trace information as an HTML file. If we run that command, we get a profile HTML file, and if we open this file in a browser we get the full trace. On the left side you see the nesting level: one, two, three, four. This is a quite huge trace, so for the presentation I had to crop it, but we can already see here that there is a Cinder driver call, and it takes almost all the time of the request, 10 seconds. If we click on it, we get the next thing: it was the delete_volume method of the LVM driver in cinder.volume.drivers.lvm. So we know that this method in the code is exactly what takes all this time. And if we would like to investigate which part of this call takes the time, we can just add more trace points in the code, rerun the command, and get more interesting information.

So let's just go to the results. We have done a lot of work around Rally, and it is already in the gates of a lot of projects: on every patch in, for example, Cinder, Glance, Keystone, Neutron, and other projects, we run special Rally benchmarks, so we are able to benchmark every patch, just as we run Tempest. In the case of OSProfiler, it is already in Cinder, Glance, and Heat, and there are patches for Keystone and Neutron, but they are waiting for the new cycle.

So these are quite separate initiatives, and what we are going to do now is to combine them.
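The "add more trace points" step boils down to wrapping the suspect code in start/stop events. Here is a self-contained mimic of that idea (the real library exposes a similar decorator, but this sketch deliberately avoids depending on it, and the in-memory list stands in for the notifier that ships events to Ceilometer):

```python
import functools
import time

COLLECTED = []  # stand-in for the notifier that sends events to Ceilometer

def trace(name):
    """Record a start and a stop event around the decorated function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            COLLECTED.append((name + "-start", time.time()))
            try:
                return func(*args, **kwargs)
            finally:
                COLLECTED.append((name + "-stop", time.time()))
        return wrapper
    return decorator

@trace("lvm.delete_volume")
def delete_volume(volume):
    # the slow part: LVM zeroes out the volume before removing it
    pass

delete_volume("vol-1")
```

With trace points like this around the sub-steps of delete_volume, the rerun trace shows which part of the call consumes the time.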
So, first of all, we would like to integrate Rally and OSProfiler, so that you will be able to generate load and trace at the same time and compare the results: one trace without load, one trace with load, all in one simple action with Rally, like rally task start with the benchmark task.

We would also like to get OSProfiler into all projects, and I hope that will happen. And we would really like to support comparing Rally and OSProfiler results automatically. I mean, you can compare them by eye, but that is not automated, and when something is not automated it is not as useful as it could be if it were. As well, we want to simplify usage of OSProfiler with DevStack: when you are running DevStack today, you have to specify a lot of different and strange arguments in local.rc to run it with OSProfiler, and we would like to have something like the existing debug and verbose options, for example a profiling-enabled flag.

Okay, and then: profiling in the OpenStack gates. As we have Rally jobs for a lot of projects, we can have profiling with Rally, and then do this in the gates. So you will get not only API performance results but the trace as well. This will really help developers understand whether their change fixes exactly the problem they are trying to fix, and that it is not just an artifact of the particular deployment.

Okay, and yeah, I would like to conquer the world, twice. So that is all; it was quite short. Any questions? We would be glad to answer them if you have any.

[Audience question about what benchmarks Rally includes.] Okay, sure. We have a bunch of tests for different projects: Nova, Cinder, Glance. Nova is quite well covered, so you can do almost all operations with VMs. Actually, the goal of Rally is not only to have a huge repository of benchmarks but to make them simple to write: for writing, for example, a boot-and-delete-VM benchmark you need just a few lines of code.
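A toy sketch of that idea: the scenario itself is a few lines of Python, and a load generator just runs it repeatedly on behalf of different users while recording durations (illustrative only; Rally's real scenario plugin API and runner configuration are different):

```python
import time

def create_and_delete_volume(user):
    """Stand-in scenario body; a real one would call the Python clients
    with the given user's credentials."""
    time.sleep(0.001)  # pretend to create and then delete a volume

def run_load(scenario, users, iterations):
    """Run the scenario `iterations` times, rotating through users,
    and record how long each run took."""
    durations = []
    for i in range(iterations):
        user = users[i % len(users)]
        start = time.time()
        scenario(user)
        durations.append(time.time() - start)
    return durations

durations = run_load(create_and_delete_volume, ["user-1", "user-2"], 10)
```

The recorded durations are what the HTML report's graphs and distributions are built from.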
We work with the Python clients, and it is a really simple framework for such operations: you just specify some set of operations, performed as some user, in Python, which takes a few lines of code, and then you can use any load generator and different environments, like different numbers of tenants and users, and generate load with it. And there is already a lot of benchmarks.

[Audience question about benchmarking Horizon with Rally.] That's a hard question. It is hard to benchmark, especially because it has JavaScript. If it were just plain requests to Horizon, then it would probably not be hard to do with something like urllib or the requests library: you can just make different requests, parse the answers, check whether everything is okay, and perform some set of actions. That can be done simply. But with JavaScript and the other stuff that runs inside the browser, it can be quite tricky to get all this data together.

[Audience question: can you repeat an experiment?] Sure, you can, but as I said before, different deployments have different performance and different running services, and the results can be quite different. Overall, though, if you have for example a DevStack installation and you have a fix, and you run the benchmark before the fix and after the fix, you will get quite good results showing whether the fix actually addresses the issue. That is actually why it was built: to make repeating experiments simple.

Okay, I missed the beginning of the question.
[Audience question about publishing benchmark results.] I'm afraid of lawyers, so I am not sure that I am going to publish any results myself. But I would be happy to have a repository of such results; actually we are doing this, and we have some kind of user stories. If you have any results that you are allowed to share, it would be nice to get them in there.

[Audience question about criteria: what is the baseline for a driver to be accepted into a core project?] I think that I should discuss this with the PTLs of the projects, because I am not PTL of Cinder, Nova, and all the other projects. I am not a big enough dictator to do this.

[Audience question about a standard set of golden benchmarks.] Okay, sure. So, having a long task with a lot of golden benchmarks that you can run to ensure that your system performs well, and comparing against them: it is not only load generation. I mean, you are getting a lot of data, and we are trying to get as much as possible and build logic on top of it. It is a very big piece of work to make standardized benchmarks for all of OpenStack, and for this huge work we have to have everything ready: load generation, processing of results, storing of results, benchmark standardization, typical tasks with the set of benchmarks that we should run, and a lot of expertise about what the performance of a 10-node cloud should be, what the performance of a 20-node cloud should be, what the performance of a 100-node cloud should be. When we get all of this together, then we will have what you are describing, but it is a huge amount of work. We are working on it, but it takes a lot of time.

Okay, any other questions? It seems like there are no more questions. Okay, thank you for coming. I hope it was interesting.