I guess it's time, so you're all welcome to this afternoon's Dell breakout session. For the next 40 or so minutes we'll be looking at how we measure OpenStack performance, and specifically at a new industry-standard benchmark being released out of the SPEC organization. We'll talk about how that benchmark is going to be used and the kind of performance results that will come out of it. My name is Nicholas Wakou; I'm a performance engineer at Dell, and we run these benchmarks in a very strong and tight partnership with Red Hat. The manager in charge, D. Shakshober, popularly known as Shak, is going to be part of this presentation, and he will also give us some information on how he's using these benchmarks. So, just one more slide.

Right. So again, welcome to our session; we're going to zero in on the SPEC Cloud benchmark. Benchmarks are used in the industry all the time for comparisons. They vary in difficulty, from benchmarks you can download, run immediately, and get a result from (and obviously time things on your own) up to an industry-standard workload, which brings some complexities. It's all about a committee: it's not just Dell and Red Hat, it's a committee that meets and agrees upon a standard benchmark, and that benchmark is supposed to model what happens in the industry, running essentially in clouds, with SPEC Cloud. If you're not familiar with it, Nicholas is going to step us through it. Members of my team work with Nicholas, and we have other folks from the SPEC Cloud committee here. So sit back and enjoy learning about SPEC Cloud. Thank you.

We are privileged today to have the chair of the SPEC Cloud committee, Mr. Salman Baset from IBM.
And I'll be calling upon him to give us some insights as the presentation goes on: the design considerations, the thought process, and why we made the decisions we made. We're also privileged to have Joe Talerico, who works with Shak. Joe is a former chair of the SPEC Cloud committee and has been very involved in the development of this benchmark standard.

Sorry to interrupt you, Nicholas. I've been asked to take the stage, so I'll put this up here. And I guess this is being videotaped. All right, now I'm in the limelight.

So they will be tagging in from time to time. The other person, who is not in the room now, is Zach Fadika. He works for Intel; he's also on the committee and has been very active in code development and everything. Anyway, you can ask questions at any time, so feel free; this should be as interactive as possible.

Right. We'll quickly go through the SPEC Cloud benchmark, and after that Shak will take us through how all these performance tools are being used, especially in the measurement of the OSP cloud.

The SPEC organization, if you don't know it, is one of the performance consortia that primarily develops benchmark standards. It's got a wide membership; literally every who's who in the computer industry is part of the SPEC organization, and Dell, as a company, is a very active member. Now, SPEC Cloud has been in existence for over three years, and its primary goal was to define a cloud benchmark standard. If I'm not mistaken, the standard that has been developed is probably the first industry-standard one; there could be others, but they are proprietary.
This is one which has come out of a consortium of various companies. Apart from defining the benchmark standard, one of the goals was to identify workloads that can be used to measure the cloud, and also to determine the run rules. So basically, this benchmark zeroes in on infrastructure as a service. It measures both the control plane and the data plane. On the control side you have things like provisioning time, how quickly it takes to create instances; then there's workload performance, and we will go through these things in detail. We will look at the workloads being used: they are all open source, and they resemble real customer applications. The other thing to note is that the benchmark measures the cloud, not the application. It produces metrics (elasticity, scalability, provisioning, and others) which allow comparison of clouds; you can use these metrics to compare the performance of clouds.

So the benchmark models infrastructure as a service. Some basic terminology, so that we can all be at the same level. When I talk of an instance, I mean either a virtual machine, bare metal, or a container; those are the three instance types supported by this benchmark. We have the concept of a white-box cloud, where the tester has full control and knowledge of the underlying infrastructure. A black-box cloud is one where the tester does not have full knowledge; it's the inverse of the other, and it's what you typically get in a public cloud, where in many cases all you know is the billing information. Then you have an application instance, which is really a set of instances that run together to run a particular workload.
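To keep the terms straight, here is a minimal sketch of that terminology in code. The class and field names are my own; only the concepts (the three instance types, white-box versus black-box clouds, and application instances) come from the talk.

```python
# Illustrative model of SPEC Cloud terminology; names are mine, not the
# benchmark's own identifiers.
from dataclasses import dataclass
from enum import Enum


class InstanceKind(Enum):
    """The three instance types the benchmark supports."""
    VM = "virtual machine"
    BARE_METAL = "bare metal"
    CONTAINER = "container"


@dataclass
class Cloud:
    name: str
    # True: white box, the tester fully controls and knows the infrastructure.
    # False: black box, e.g. a public cloud where you may only see billing info.
    white_box: bool


@dataclass
class ApplicationInstance:
    """A set of instances that run together to run one workload."""
    workload: str
    instances: list  # list of InstanceKind values
```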
Again, we will go into more detail on what an application instance is, but it's really a bunch of instances that work together to run a particular workload. Any questions so far? Good.

Those of you who attended yesterday's session had a good discussion on this slide. It's a very high-level view of the architectural pieces of the benchmark. On one side you have the benchmark harness; the harness used is CloudBench, also known as CBTOOL. CloudBench has drivers, and it also has a report generator. At this point I call upon the chair of SPEC Cloud to give us an idea of why we chose CloudBench rather than the other drivers and harnesses we had.

All right, thank you, Nicholas. As Nicholas mentioned, the benchmark has been in development for some time; it definitely took longer than we anticipated. When we started development, SPEC wanted to pick a cloud benchmarking tool that was open source and could meet various requirements. These included not being specific to one cloud (not only OpenStack, but able to deal with the other clouds out there) and being able to create multiple workload clusters, such as Hadoop clusters, Cassandra clusters, or web application clusters, in different configurations and settings. Those were the reasons we selected CloudBench as the tool. In addition, SPEC has a number of requirements to ensure that appropriate data is gathered so that member companies can perform a peer review of submitted results. It's not just that one company can go out and say "here is my SPEC number"; the results have to go through a peer review process that provides some validation and sanitization before the marketing division starts advertising a cloud as such. Thank you, Salman.

So CloudBench is the benchmark harness.
It creates, or manages the creation of, instances; it has drivers that run the workloads and measure the cloud; and after the tests have been run, a report generator picks up the performance data and generates a report. You'll see a copy of the report. On the right-hand side you can see the system under test, and we'll go into more detail on that.

Now, this is an application instance. CloudBench can support almost 20 or so workloads, but we ended up selecting only two, and there is a reason why we came up with two. Joe, do you want to chime in on why we chose YCSB and K-Means out of the several?

Yeah. We chose Cassandra and YCSB to be a database workload that could scale out pretty easily, and then we decided on a CPU-intensive workload, so we went with a Hadoop cluster and K-Means. We felt that was maybe not representative of all workloads, but it is a pretty good mix of the workloads that are out there today. There's a ton to choose from, so we felt this would hit the network, the disk, and the CPU pretty hard, and it does.

Right. And YCSB is the Yahoo! Cloud Serving Benchmark, for your information, and it will scale out in clouds. So a YCSB application instance has six Cassandra seed nodes plus one YCSB instance, seven instances all together. K-Means runs on Hadoop (we are using the Intel HiBench version) and has one name node and five data nodes, six instances in total.
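The make-up of the two application instances can be sketched as a small lookup table. The instance counts are the ones just given; the role names and the `ai_size` helper are illustrative, not the benchmark's own identifiers.

```python
# Composition of the two SPEC Cloud application instances (AIs),
# per the counts given in the talk.
AI_TOPOLOGIES = {
    "ycsb": {"cassandra_seed": 6, "ycsb_driver": 1},        # 7 instances total
    "kmeans": {"hadoop_namenode": 1, "hadoop_datanode": 5},  # 6 instances total
}


def ai_size(ai_type):
    """Number of instances that must be provisioned for one AI of this type."""
    return sum(AI_TOPOLOGIES[ai_type].values())
```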
So, the life cycle of an application instance. First, the driver initiates the creation, sending a message for the instances to be created. Once all the instances required to make up an application instance have been created, a message is sent out which shows that the application instance is ready for business. After that, data generation starts; the data depends on the workload. If it is K-Means it will be a K-Means data set; if it is YCSB it will be YCSB records. After the data has been generated, the driver kicks off a test, and once the test is complete the results are picked up, and then the cycle repeats. How long it repeats depends on the phase of the benchmark: in the baseline phase it only goes for five iterations; in the elastic phase it goes on much longer, and we'll get into that. Salman, do you want to add to that? Okay, fine.

So this benchmark has two phases. There's a baseline phase, where the two workloads run separately, one after the other; each run goes for only five iterations and then stops. What this really does is depict the performance of the cloud when a workload is running on its own. Then, in the elastic phase, these workloads run simultaneously, and once they start running they continue until the benchmark is stopped. The other thing that happens during the elastic phase is that the harness keeps adding these workloads, one after the other, until you hit a predefined number of application instances (the tester can set a certain number) or until quality-of-service breaches occur. We'll look into the QoS breaches, but once QoS is violated in any way, the whole benchmark stops. I know this is a slide that many people ask questions on. Does anyone have a question, or do you all understand it?
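Putting the life cycle, the two phases, and the stopping rules together, here is a rough sketch, not SPEC's actual harness code. The five baseline iterations and the 20%/10%/50% stopping thresholds are the numbers quoted in this session; everything else (the step names, the `elastic_iterations` stand-in, the per-AI status flags) is my own illustration.

```python
def ai_lifecycle(phase, elastic_iterations=12):
    """Yield the life-cycle steps of one application instance (AI).

    `elastic_iterations` stands in for "until the benchmark is stopped";
    the real elastic phase has no fixed iteration count.
    """
    yield "provision instances"
    yield "AI ready for business"
    n = 5 if phase == "baseline" else elastic_iterations
    for _ in range(n):
        yield "generate data"    # YCSB records or a K-Means data set
        yield "run workload"     # the driver kicks off one test
        yield "collect results"


def stop_reason(ais, max_ais):
    """Return why the benchmark should stop, or None to keep going.

    `ais` is a list of dicts with per-AI boolean status flags; the
    thresholds are the ones quoted in the talk.
    """
    total = len(ais)
    if total == 0:
        return None
    if total >= max_ais:
        return "reached the configured maximum number of AIs"
    if sum(a["provisioning_failed"] for a in ais) >= 0.20 * total:
        return ">=20% of AIs failed to provision"
    if sum(a["run_error"] for a in ais) >= 0.10 * total:
        return ">=10% of AIs had run errors"
    if sum(a["qos_violated"] for a in ais) >= 0.50 * total:
        return ">=50% of AIs violated QoS"
    return None
```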
Yeah, one of the frequent problems when you're trying to measure a cloud is that, without this type of methodology, you can easily run workloads that you think are running simultaneously, and you may get metrics that say, wow, this is a really fast cloud, it was able to do ten things. Sometimes you even get what's called super-linear scaling, because if the cloud isn't actually servicing all of the VMs concurrently, they're really running staggered. You can see from the chart here that this methodology prevents super-linear scaling to a degree; it's a valid mathematical methodology for effectively measuring clouds in both of these dimensions, elasticity and scalability. Yeah, we have a question there.

Can you explain the stopping criteria in more detail?

Okay, next slide; I'm going to cover that. Right, so, the stopping conditions. If 20 percent of application instances fail to provision, the benchmark will stop. If 10 percent of them have errors of any nature during their runs, it will stop. It can also stop because it reached the maximum number of application instances you asked for. And it stops if 50 percent of them have quality-of-service violations. K-Means, for instance: if a run takes longer than a defined period of time, that is a violation. For YCSB, if throughput goes below a certain threshold, that is a violation, and if the YCSB read and write latencies are longer than a predefined limit, those are violations too. So those are the stopping conditions. Are you okay with that? Thank you.

Now, the benchmark metrics. Oh, you had a question?

I have a question: are the QoS violations defined per workload separately, or is there a common criterion across the workloads?

This is defined for the two workloads, yes, and it affects both. Okay. So, the benchmark metrics: we
measure scalability, which is really the aggregate amount of work being done, normalized by a number that was obtained during benchmark development; it answers the question of how much more work gets done if n application instances are deployed instead of one. Elasticity measures how consistently performance is maintained as the load on the cloud is increased; it is expressed as a percentage, and an ideal cloud would have a 100 percent elasticity score. For scalability, the higher the score the better. Then we have others, like mean instance provisioning time. We also look at AI run success (did all application instances run successfully? that's a percentage too), and we keep track of provisioning success as well.

What we have here is a very high-level view of the report, which is produced by the report generator, showing the primary metrics, the ones in yellow: the scalability score, how many application instances were running, elasticity, and the mean instance provisioning time. You have secondary metrics in the little box on the right-hand side, and we also record things like when the benchmark was run and the configuration. It's a big report; this is just the high-level part. Any questions? Oh yes, we capture that in the report; it's just not shown here.

So, with industry-standard benchmarks, especially if you're trying to characterize clouds on a level playing field, the vendors are free to participate and to vary things: servers, core counts, memory, disk, I/O, etc. If you go to spec.org and look at the history of all the benchmarks that have been submitted, there are literally hundreds of thousands of SPEC CPU results,
and, you know, far fewer as the benchmark gets more complicated, because it's harder to execute and pass all the criteria. It really is an open benchmark, so if you want to participate, join SPEC and submit results. And again, this review process is about a two-week review. Yes, there's a two-week review where people look at the more detailed report. I don't know how many pages exactly; do you have an idea, Nicholas? Probably eight to ten pages for the full detailed report. Well, actually it's about four. Right, it depends. And I think you can basically vary the hardware components, yes.

Okay, good. Hi, can SPEC be used to measure network performance?

Well, how fast your network is definitely is not measured directly, but it helps to have a fast one.

Because, you see, when VMs are launched, they could be on the same hypervisor or a different hypervisor, on the same rack or two different racks. So how is this normalized, when one network throughput was within the same hypervisor, another was within a rack, and another was between two racks?

I'll cover a bit more of the component-level benchmarking later. Traditionally there's a whole set of open-source, open-standard workloads like netperf or iperf, and there's an OpenStack tool called Shaker that can measure network throughput. That's all fine; I want you to use those tools to build your clouds and configure your system. SPEC tries to measure the cloud's performance at an application level and build a level playing field for real application scalability, elasticity, and provisioning, the three metrics in yellow here. And again, it
wasn't just Red Hat, or IBM, or Dell; it's a whole consortium of companies that came together to try to build a cloud benchmark. Okay, thank you.

Just a second. You're asking how it's normalized. When the Cassandra/YCSB workload launches, we're not pinning the guests to a certain host; we're letting the scheduler determine where they land. So they could land on the same hypervisor, as you mentioned, and get great throughput; or they could be distributed, because the RAM filter places a VM based on which host has the most available RAM. You could end up with VMs across racks, on the same rack, or even on the same hypervisor; we're not dictating where they land. It gets normalized through the results, because the results will vary based on how that AI performed. Okay, thank you.

Quick question: I see the tests run for only 40 minutes. Is there a tweakable parameter to extend the test, so that you can emulate a user scenario or workload?

What you're seeing here is just the elastic phase; before that, the baseline phase takes a lot longer, and those times will vary (on a public cloud it could take longer for the same setup). But yes, this is just capturing the elastic phase. And Salman is adding to that.

So, how long should the benchmark run? A minute, ten minutes, an hour, ten days, a year: all of those are valid numbers, depending on the cloud. In some of the tests that have been done on public clouds, which are not shown here, it has taken tens of hours for the entire benchmark run. As long as the cloud can support the provisioning of additional instances and there is no violation of the quality-of-service limits, the benchmark will scale as far as the cloud scales. But if the cloud has certain
limits, maybe because it's a small cloud or it has some provisioning violations, the benchmark will stop earlier. So it really depends on the underlying cloud; the benchmark has no built-in limit on when it stops, it's really a function of the cloud.

So is there a parameter where you can say, run this benchmark for this amount of time?

We don't have that. Do you want to go back to the stopping criteria? Time is not a stopping criterion; the stopping criteria are what I explained on the previous slide. You cannot say run for 10 minutes or run for 20 minutes; that's not there. You can set a maximum number of application instances, as you see at the top. You can set it to, say, 100, or you can have just 10 application instances, so maybe 70 instances total, whatever. If you want a short run, you can say, get me six application instances; that will definitely take a shorter time than 100 or 200.

Right, and if you reconfigure different hardware platforms, you may have to spend some time sizing for this benchmark. You'd probably start out with fewer instances and ramp up, unless you want to wait for a year. And that's how we typically do it: we start off with a few, then keep adding until we get to a size that is good for the configuration.

Okay, good questions; any others? Otherwise let's move on to possible next steps. Salman, do you want to take a stab at that, since you are the chair? What goodies do you have for us?

Let's talk in the realm of OpenStack. OpenStack has thousands of configuration options, in probably less than a hundred, around fifty, configuration files. Any time somebody wants an OpenStack cluster, they have to figure out which configuration is right, which networking is right, which hardware is right. Wouldn't it be nice if certain hardware or certain OpenStack configurations were
certified as having achieved some scale, measured according to some standard, be that the SPEC Cloud standard or some other, so that anybody who's looking to deploy OpenStack can just go out and deploy? I think that should be considered in the OpenStack performance group. In the context of SPEC Cloud, some of the things that were not explicitly measured here, and will be the focus of the next release (and we invite more participation), are these. How do we measure cross-region performance, especially in public-cloud settings, when an application is deployed across clusters? There are certain operations, such as live migration, that are not explicitly measured; how do we account for those? And some workloads, such as object-storage workloads, are not part of the benchmark; how do we incorporate those as things go along?

Also, this is an IaaS benchmark. If you're deploying an OpenStack cluster, you can run the benchmark on one configuration and get a score, then run another hardware configuration, get a score, and see the difference; it's a standard. But it's still an IaaS benchmark. When you go to platform-as-a-service benchmarks, things change slightly, and what would account for those PaaS platforms, and how we measure them in a meaningful, comparable, and repeatable way, is still an open question.

Okay, nothing more useful to add to that. Any questions before Shak takes us through a deep dive?

Yeah. We run a private cloud; can I just download this, make my own figures, and then tell my users that we achieve two-thirds of AWS, or do I have to publish and be reviewed before the numbers are out?

There is a process. You have to get a license from the SPEC organization; then you'll be able to download the kit and
then run it. For publishing the result, again, there are some rules you have to go through, and if you want the result to be published it has to go through a review. But you don't necessarily need to publish.

So if I just want private numbers, I just need a license, then download it and run?

Yes, you can download it. Okay, great.

Yes. We are a hardware vendor, by the way, but we also sell a solution, a reference architecture. So in a way, when we publish these results, we are highlighting how our solution compares with other vendors'. It is a solution in the sense that you have the hardware, the software, and the whole reference architecture that runs this benchmark.

So, can this benchmark be used to measure public clouds? Absolutely. Have we measured them? Yes, many of them. Do we find surprises? A lot. Of course I cannot say anything specific due to confidentiality, but there are public clouds that perform nicely on provisioning and poorly on runtime, and vice versa: good on runtime, poor on provisioning. You go to some esoteric instance types, and you see lots of provisioning failures. You go and do some different networking, and the result is not so great. I think, from what we have done in internal measurements (and anybody who is a SPEC member can look at what has been done in the public-cloud space), at least in the confidential realm you can see how this benchmark differentiates across public clouds. But for obvious reasons that information is not public, unless the vendor chooses to publish it after going through the peer review process. Okay, thank you.

And again, the value of SPEC Cloud is that it can also be used internally. As vendors support different certified configs, they can collect metrics against each of those configs and keep them in-house rather than publish, so
it doesn't always have to be the top or peak result; you can use it for sizing. That was the main topic: to share what SPEC, an industry-standard consortium, is doing. We're proud to work with Dell as a partner here, not just on the committee but also in running some of these workloads on their systems.

I do want to share a few features of Red Hat OpenStack. This isn't marketing; this is technical stuff that hopefully keeps us competitive and a good choice as an OpenStack vendor. I'm going to share some of the things we do (I think some people asked about component-level testing) and some other tools you can use. In the middle here: if you're not familiar with Red Hat and Red Hat OpenStack, we ship a service called tuned, which basically installs a profile over Linux's wonderfully rich Swiss-army knife of tunables. There are hundreds, now thousands, of tunables, and as a user or vendor you don't want to have to manipulate all of those yourself, so we have profiles, and I'll step you through that. Time permitting, we can also share a little of what we're doing around NFV and DPDK acceleration.

So, tuned profiles. I've been at Red Hat a little over 11 years, director of performance; we cover RHEL, KVM, RHEV, and OpenStack, and with the acquisitions of some of our storage companies we also cover Ceph for OpenStack: Ceph storage, scalability, and so on. In each of these cases, through our lab results, running industry-standard workloads and application loads, we help pick the Linux settings, sometimes the KVM settings, and for OpenStack potentially some of the Nova settings. For RHEL OSP we install a virtual-host profile, which basically means the host running your hypervisors, the bare metal, gets what we believe is a good mix of the proper settings. These are user-mode settings, and you're welcome to them. It
ships by default; on RHEL 6 you have to yum install tuned. And I recommend looking at the other profiles out there; for NFV there are the network-latency and low-latency profiles. I'm not going to go through all the details. There are other open-source tools so you can measure your components: your CPU, your memory under stress, your network, and your storage devices under stress. Essentially you have to build your own metric, if you will; it's not a single metric like SPEC Cloud has produced for us. At the Cinder level there's actually a Ceph benchmarking tool, and another useful new addition is Google's PerfKit, which now supports running in OpenStack; that too lets you download and run individual benchmarks and report back results automatically.

We do have some papers on how we use these tools. Red Hat is a software company, but we do have some decent-sized hardware systems, and some of the guys from my team are here. Basically we want you, the end customer or partner, to be able to do your own scale tests. I personally believe these component-level tests are mostly configuration tests: have you set up your network properly? Are you getting the most you can out of your Ceph tier? You want to run at the speed of the hardware, you want the operating system to get out of the way, and you hope that the control plane, OpenStack's management, also stays out of the way while you're running workloads in your data tier. There are papers on acceleration and on how we scale up Cinder, Neutron, and so on.

Just a quick detour, because we only have about five more minutes. A quick example of provisioning: Rally is part of OpenStack, it ships with OpenStack, and it's a great way to... we have two minutes, so I get two more slides. A quick
example: as you scale out, Rally automates the launching of the VMs, the build, and the run. You can put workloads in it, but they get skewed; this is the thing I commented on before, that your benchmarks are not necessarily all running simultaneously, because Rally doesn't have a synchronization step. Still, with it you can find out where you start becoming CPU-overcommitted with simple loads like LINPACK, and you can see what happens when you start overcommitting memory: if a hypervisor becomes too overcommitted it will swap, and without the right swap space that will actually introduce OOM kills. On the network side, we use CloudBench internally, again to get that synchronization point across benchmarks, and that lets us evaluate VXLAN offload engines and the like. More recently, Shaker has been added to OpenStack, and some of the guys on my team are contributing back and may help maintain it in the future.

My final slide, and I've mangled the PowerPoint a bit; I guess I'm an open-source guy, because I use LibreOffice. Anyway, the point is you can run fio or other I/O workloads inside guests. The chart on the right was supposed to show up to 64 guests running on ephemeral storage, with read and write in the columns, degrading in response time, in IOPS I should say, whereas on a Ceph tier we can actually maintain the IOPS. There is a Ceph benchmark kit out there as well.

With that, I'm going to open it up for other questions. I do want to stress, by the way, that all the results were done in our lab; we have this lab called, you know, "Bagel," the big-A-cluster lab, which is what it was named after. It's now being used by the Ceph team as well, and it's Dell-based hardware, so we scale up to 96 nodes and usually up to 32 Ceph
nodes. So we're going to take only one question, because we're eating into the next session. Yeah, you've got to give me the big hook.

One other slide: you mentioned OVS-DPDK. Is that part of SPEC Cloud or some other tool?

No, no, this is outside; these are other benchmarking tools. I didn't get to share the results for the OVS-DPDK work, but in general we in performance engineering are, again, sharing features as they become available in OpenStack. You'll have to see me afterwards and we can talk specifics, but it's the same principle: we want to run at the speed of bare metal, and we're running over two million packets a second today on a single Haswell-based system with six 40-gig network cards and things like that. So these are the tools you use to optimize the various components, and once they've been optimized, then you can run your SPEC Cloud benchmark, the industry-standard benchmark. The industry-standard benchmark for DPDK, by the way, is called VSPERF; it's part of OPNFV. Okay, on that note, thank you very much; you've been a wonderful audience. Till next time, bye.