Welcome to this panel discussion, which we are holding on performance measurement tools for the cloud. We are going to run through some slides. There are about four presentations, and we'll go through them very quickly just to level set. After that, we should have enough time for interaction, to answer questions from the audience, and to hear experiences from you as well. Basically, we'll be looking at the kinds of tools we use to measure the cloud. Let me introduce the panelists and presenters for this session. Douglas Shakshober, commonly known as Shak, is from Red Hat. Then we have Mr. Wang; his first name is a little difficult for me, but he's from AWcloud, out of Beijing in China. Then we have Mr. Yuting Wu, who is also from AWcloud in China. Then we have Mr. Das Kamhout, who is from Intel. And I'm Nicholas Wakou, the moderator for this session; I work for Dell. We are not going to go through any further introductions, in the interest of time, but if you go to the speakers page on the OpenStack website, you will find details on who these panelists are and what they do. On the agenda, we are going to look at Rally and CloudBench, which Shak will present. Then we'll look at Rally and the dichotomy, presented by Mr. Wang, followed by performance analysis with HAProxy by Mr. Wu. And then I'll present industry-standard benchmarks. Without much ado, I call upon Shak to kick off the presentations.

Hi, good afternoon, everyone. Thanks a lot for joining us. Let's go through this rather quickly, as Nicholas said. I'm the director of performance engineering at Red Hat, and if you've been using OpenStack and trying to measure performance in general, keep in mind that everything we do at Red Hat is open source.
So, as we work with communities: Rally is the first one, and I'll share a couple of quick examples. CloudBench is another effort. All of these, again, you can download: go to the URLs and pull the benchmarks down, because it's really important to measure these things yourself. I'm going to concentrate on the first two, but we also participate in the industry-standard workloads that Nicholas is going to cover: SPECvirt, SPEC Cloud, TPC, and TPCx-V. There are special tools for particular areas, whether it's Cinder or, more recently, the PerfKit toolset that Google open sourced. We're really happy to see that it now has an OpenStack plugin, so we can use it with OpenStack. Next slide. Or would you like me to? Cool.

So again, one of the things Rally is very useful for: in performance engineering, we really want to run at the speed of your hardware. You configure a system, controller nodes with networks, disks, and storage. A lot of people just put it all together, with only a single drive in each server, and then wonder why provisioning is taking so long. Rally lets you automate provisioning; here's an example. It will provision the VMs and come back with a success or failure, where a failure usually means the VM didn't respond within a certain amount of time on your given server. If you run this yourself, it's very useful for sizing your own cloud and figuring out how fast you can burst-provision it. And if you want to see more, my team, Talerico, Alex, and others, have papers on how to use Rally. The con for Rally is that it's hard to synchronize benchmarks together. It will automatically provision and run a workload, but if you're doing, as in this case, 24,000 VMs, not all of them are going to log in at the same time, so they're not necessarily all running concurrently.
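A Rally provisioning run like the one Shak describes is driven by a task file. Here is a minimal sketch: the `NovaServers.boot_and_delete_server` scenario and the `constant` runner exist in Rally, but the flavor and image names and the counts below are illustrative placeholders, not the 24,000-VM configuration from the talk.

```python
import json

# Sketch of a Rally task: boot VMs at a fixed concurrency and report
# success/failure per VM. Flavor/image names and counts are placeholders.
task = {
    "NovaServers.boot_and_delete_server": [{
        "args": {
            "flavor": {"name": "m1.small"},
            "image": {"name": "cirros"},
        },
        "runner": {
            "type": "constant",
            "times": 100,        # total VMs to provision over the run
            "concurrency": 10,   # boot requests in flight at once
        },
        "context": {"users": {"tenants": 2, "users_per_tenant": 2}},
    }]
}

with open("boot_and_delete.json", "w") as f:
    json.dump(task, f, indent=2)
# Then launch it with:  rally task start boot_and_delete.json
```

Raising `concurrency` until boots start timing out is one way to find the burst-provisioning limit Shak mentions.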
So, next slide. Here's an example of running a CPU workload. As you over-provision the number of VMs, you start seeing that the average response time, or the average throughput of the application, drops. That's expected. And if you continue to provision, you can actually run the servers on your cloud out of memory, and you get OOM kills. Next slide.

Another example, with CloudBench: CloudBench helps you orchestrate the workload so that all the VMs check in, and it lets you run a workload, in this case netperf, though there are several other examples. You can take your own benchmarks or workloads and embed them into CloudBench fairly easily. It's a bit of a difficult harness to use, but it's out there and you can download it. It allows you to run multiple networks, in this case three different packet sizes on the right-hand side, and you can do things like VXLAN offloads or other hardware accelerations and measure the effects. A second tool, which I think came out originally from Mirantis, is Shaker, which also does a great job of automating network performance testing.

My final slide is on disk I/O. So we've covered CPU, memory, network, and I/O: the four fundamental performance food groups, if you will. Here's another example of putting fio in a harness; Joe Talerico did this. It measures the difference between ephemeral storage and a Ceph tier with dedicated storage across the backend. Looking at the average number of IOPS per VM, you can see it's a lot different running ephemeral versus a storage tier. OK, thanks.

Thank you very much, Shak. The next presentation will be from Mr. Wang, on Rally and the dichotomy.

Good afternoon, everyone. There are three things I would like to share with you. The first is what the Rally tool can do.
The second is Rally and the dichotomy, and the third is how to apply Rally to a real project. Firstly, what can Rally do? Rally is a benchmarking tool that automates and unifies multi-node OpenStack deployment, cloud verification, benchmarking, and profiling. Many companies and users rely on it, for example Mirantis, IBM, Cisco, Huawei, AWcloud, and so on. Rally can help us automate testing and CI with Jenkins, and it also helps us with cloud verification and profiling of the cloud environment. When we need an OpenStack environment to debug code, in most cases we will use Rally to deploy DevStack.

Secondly, Rally and the dichotomy. When using Rally for performance testing, in most cases you need to spend a lot of time to find the performance bottleneck, for example, the number of concurrent instance-creation requests a single Nova API node can withstand. The dichotomy helps us find the performance bottleneck quickly. How do we use it? We can see in the demo. First, you set a maximum value, say 100, and take the middle value, 50. If the test at 50 passes, the next intermediate value, between 50 and 100, is 75; otherwise, the intermediate value, between 0 and 50, is 25. Continuing the dichotomy by analogy, you find the performance bottleneck.

Lastly, how do we apply Rally in a real project? The Rally tool can be used in a variety of ways, for example in CI with Jenkins, or against a production cloud. In our business environment, all components work behind HAProxy, and Nova can respond to 1,000 concurrent instance-creation requests; that is the result we observed. For HAProxy and the related testing, I welcome my colleague Wu Yuting to share with everyone. Thank you.

Thank you. Performance analysis with HAProxy, from AWcloud. Good afternoon, everyone. What I will introduce to you is using HAProxy to analyze OpenStack performance.
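The dichotomy Mr. Wang describes is a binary search over the concurrency level. A minimal sketch, where `passes(n)` stands in for an actual Rally run at concurrency `n`:

```python
def find_max_concurrency(passes, low=0, high=100):
    """Binary-search ("dichotomy") for the largest concurrency that still passes.

    `passes(n)` is a stand-in for running a Rally scenario at concurrency n
    and checking whether it meets your success criteria.
    """
    best = low
    while low <= high:
        mid = (low + high) // 2
        if passes(mid):
            best = mid
            low = mid + 1    # passed at 50: next midpoint is 75
        else:
            high = mid - 1   # failed at 50: next midpoint is 25
    return best

# Example: pretend the Nova API starts failing above 63 concurrent creates.
print(find_max_concurrency(lambda n: n <= 63))  # -> 63
```

Each probe costs one full Rally run, so the binary search needs only about seven runs to pin down the bottleneck in a 0 to 100 range, instead of trying every level.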
HAProxy is free, open-source software that is usually used to provide high availability, load balancing, and proxying for TCP and HTTP applications. In an OpenStack environment, the common practice is to use HAProxy to provide load balancing for each OpenStack API service; you can see this on the slide. In addition, we use Keepalived to provide high availability for HAProxy itself. Since HAProxy sits between the users and the servers, HAProxy knows everything about the requests. So if you enable logging on HAProxy, you can get a lot of information about each request, and by analyzing the logs we can learn a lot about the servers and find the performance bottlenecks of your services.

OK, I will show you an example. The upper part of this slide is an HAProxy log line. There are many fields, but we only need to focus on a few of them. First, focus on the timers field, Tq/Tw/Tc/Tr/Tt. There are five numbers in this field; they represent the response time of your request at each stage. Tr is the time, in milliseconds, spent waiting for the server to return a full HTTP response. Tt is the total time, in milliseconds, between when the connection was accepted and when it was last closed. If these values are too high, there may be a performance problem, and that stage may be the bottleneck of your service. Next slide.

The next field you should focus on is the connection-counts field. There are five numbers in this field too; we should focus on feconn and srv_conn. These fields represent the numbers of connections at the moment the session was logged: feconn is the number of concurrent connections on the frontend, and srv_conn is the number of concurrent connections on the server. Many services have a max-connections configuration parameter. If these values are close or equal to the maximum, that means your service is overloaded.
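Pulling those two fields out of the log is mechanical. A sketch, assuming HAProxy's default HTTP log format, where the timers (Tq/Tw/Tc/Tr/Tt) and the connection counts (actconn/feconn/beconn/srv_conn/retries) are the first two slash-separated groups of five numbers on the line; the sample line below is illustrative:

```python
import re

# The timers and connection-count fields are each five numbers joined by "/".
# Timers can be -1 for aborted requests, hence the optional minus sign.
FIVE = re.compile(r"(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)")

def parse_line(line):
    timers, conns = FIVE.findall(line)[:2]
    tq, tw, tc, tr, tt = map(int, timers)
    actconn, feconn, beconn, srv_conn, retries = map(int, conns)
    return {"Tr": tr, "Tt": tt, "feconn": feconn, "srv_conn": srv_conn}

line = ('10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 '
        '10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 "GET / HTTP/1.1"')
print(parse_line(line))  # Tr=69 (server response time), Tt=109 (total)
```

Feeding every log line through a parser like this and watching for high Tr/Tt, or for feconn/srv_conn approaching the configured maxconn, is exactly the analysis described above.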
And that may be a performance problem in your services. So what I want to say is that you can use HAProxy logs to find the performance problems of your services. That's all. Thank you.

All right, thank you very much, Mr. Yuting. Sorry, I'm going the wrong way. All right. The next and final presentation is on industry-standard benchmarks, and I'll make it through very fast, partly because tomorrow Doug and I will have a more detailed session at one of the Dell breakout sessions, where we will go into more detail and have more fun. Nevertheless: industry-standard benchmarks define standards that are used to measure and compare computer systems and solutions. Oops, keep going the wrong way. Now, there are two major performance consortia. There are several others, but the ones we are going to focus on today are SPEC and TPC. Between the two, they have a membership that is literally the who's who of the computer industry. One standard that is coming out, and that is going to be specific to measuring the cloud, is from the SPEC organization: it's called SPEC Cloud, an IaaS benchmark standard. The SPEC Cloud committee has been in place for about three years. Today we are actually lucky that the current chair of the SPEC Cloud committee, Salman Baset, is here. He literally did most of the development and coding of this benchmark standard. And I don't know whether Joe Talerico is around; he works with Shak, and they did a lot of work developing this benchmark standard. Now, what it does is measure the performance of an infrastructure-as-a-service cloud. It measures both the control and data planes, and it uses workloads that you could look at and say are realistic, mainly used in the cloud.
And it produces the kind of metrics that you would expect for the cloud, metrics which can actually be used for comparisons. This slide is a very high-level view of the pieces that make up the benchmark. On the left-hand side, you have the benchmark harness. The harness has drivers and a report generator. The drivers initiate the creation of instances on the cloud under test, which is what you see on the right-hand side. The drivers also create the workloads that test the cloud. There are two workloads used in this benchmark: K-Means on Hadoop, and YCSB. Those two workloads are used in the benchmark's two phases. In the baseline phase, the workloads are run separately and individually; then, in the elasticity phase, the workloads are run simultaneously, and the drivers keep adding more and more workload until the benchmark stops due to certain conditions. When the benchmark stops and ends, the report generator captures the performance metrics and generates a report.

Now, this benchmark has a notion of test configurations. There is what they call a white-box cloud. A white-box cloud is one where the tester is in full control of the cloud under test: they know what is under the hood and can define it to a specific level. That is what you would commonly find in private clouds. Then there is a black-box configuration, which is what you would typically find in public clouds; in this case, the user or tester does not have knowledge or control of the underlying reference architecture. What they normally have is just the billing information.
Of the two workloads we're talking about, YCSB was mainly introduced because it simulates social media, and what runs in the background is a database, Cassandra. Then there's K-Means, which runs on Hadoop and is very CPU-intensive; the committee chose Intel HiBench for various reasons. So what is measured? Basically, this benchmark measures the number of application instances (AIs) that can be loaded onto a cluster before SLA violations occur. It measures the scalability and elasticity of the cloud under test. It is not a measure of instance density; there are several other benchmarks in SPEC and the TPC that do that very well for you. This particular benchmark is devoted mainly to scalability and elasticity, but the individual workloads can also be used to stress the cloud under test, if that's what you want to do: you can use K-Means to stress your CPU and memory, and you can use YCSB to generate enough I/O to stress the storage and the network.

So, as I said, at the end of every elasticity test you get a report, and that report gives you the primary metrics, the ones shown in yellow: scalability, along with the number of instances at which that scalability score was measured; elasticity; and the mean instance provisioning time. Then there are secondary metrics, displayed in the little box on the right. As I said, tomorrow at the Dell breakout session we will go into more detail about what all these numbers really mean. But for now, this was meant to be an interactive session, so without much ado, we'll ask for questions and comments from the audience. Just catch my eye, and I'll give you the opportunity. You can ask a question, which will be answered by members of the panel, or the answer could also come from the audience. So, for the remaining minutes before lunch, who is taking the first shot?
We need some. Thanks. So the challenge out to you folks is, again, these are open standards. More and more, we're looking for other ways to share. The last two are industry-standard workloads, and those should level the playing field. There is a review process by the consortia to make sure the benchmarks were conducted fairly and accurately, and in fact the vendors police themselves. Some TPC metrics actually include price. As for the other open-source projects, more and more we're seeing vendors share; Intel recently shared their public cloud, I guess the Intel-Rackspace offering has a sign-up. Maybe Das can comment on the Rackspace-Intel effort, I don't know.

Actually, maybe just to start with a question: do any of you have performance issues in your OpenStack environments? Everybody's is perfectly fast? OK, so who doesn't? That's a good way to start. All right, let's go to the question over there.

So, confirming: most of those benchmarks, like the TPC ones, we can publish in a kind of peer-review type of approach, unlike VMmark, which you couldn't really publicly state results for. Is that correct?

I think the mic is on, it's just quiet. Oh, there we go. Yeah, so again, individuals can feel free: you can join the consortia, or there is a fee to download the benchmarks and you can run them privately. In fact, we use the industry-standard benchmarks internally to beat up our regular operating system, RHEL, and, more and more now, OpenStack. So the consortia are trying to build standards that vendors can actually publish against. Red Hat, Dell, I think everyone here on the panel, we're not about trying to make a proprietary benchmark. And I think, since this is an open-source community, what we should start seeing is everybody sharing. We didn't talk about Google PerfKit much, but the whole purpose of Google PerfKit is to expose the reality of everybody's running clouds.
And it's not about standing up a fake synthetic cloud. It's about taking workloads that we all run today and hitting a running production cloud, or, if you can, creating a synthetic running production cloud; that's pretty important. But the expectation we would like to see is everybody sharing publicly. What's the good of benchmarks if people aren't sharing? Everybody said they have a performance problem, but who's actually partnering together to fix it?

Is there anything that actually tests higher-level things, like Cloud Foundry or Docker-type workloads?

Well, most of those benchmarks aren't designed for Cloud Foundry; Cloud Foundry has its own various tests. PerfKit is designed to test container workloads, as well as fio, Cassandra, and various other things. So you'll see plenty of other tests happening; the question is just what you are testing: startup of something, or an actual workload?

Well, mainly I want a repeatable process, so that I can make sure that if I change something in the lower layers of the system, I have a known number for my Cloud Foundry or Docker-type workloads. I know that I'm doing 100 of these units per second, or whatever that number is, and if I make a change, from EMC to NetApp or someone else's storage, or I go to ScaleIO, or I go to Cisco switches, I can make sure the change didn't negatively impact that number. That way I can prove to my customers, which are my business units, that we are actually improving the system, because we have a known value that we're comparing against.

And I think that's really important to be able to do. I think Rally, Intel's HiBench, and CloudBench are out there to give you a kickstart to do that for free. And PerfKit has advanced; I was really excited to see that at the beginning of the year PerfKit will essentially build a harness for you.
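As a rough illustration of what driving PerfKit Benchmarker looks like: the `pkb.py` entry point and the `--cloud`/`--benchmarks` flags reflect my recollection of the tool's command line, so treat the exact flag names and values as assumptions to check against the PerfKit documentation.

```python
# Sketch: build (but don't run) a PerfKit Benchmarker invocation.
# Flag names/values are assumptions about the tool's CLI, for illustration.
def pkb_command(cloud="OpenStack", benchmarks=("fio", "iperf")):
    return [
        "./pkb.py",
        "--cloud=%s" % cloud,                      # provider plugin to target
        "--benchmarks=%s" % ",".join(benchmarks),  # which workloads to run
    ]

print(" ".join(pkb_command()))
```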
It does support OpenStack now, and you can select the benchmarks you want. If you want to run SPEC in PerfKit, for instance, you still need to get a SPEC license for that work. Otherwise, you kind of have to build your own. I gave you a couple of examples across CPU, memory, network, and I/O, so build your own little regression kit, along with some of the Rally examples these folks showed. It's not just running the load: you can inject problems, and with HAProxy you can debug the problems. We're not going to get into monitoring; that's a huge field, and there are lots of talks on that. But we are looking for: are we getting it right?

Can we hop on? We recently, essentially, not rented but borrowed the Intel-Rackspace cluster, 132 nodes, and scaled things up; there's a sign-up, so contact Intel about that.

Thanks for saying that. Yeah, so quickly on that: Intel has funded two 1,000-node clusters that are designed for scalability testing. Obviously that's not going to solve your problem, but we can talk about your problem a little more. Basically, anybody that's pushing code upstream, moving it forward for performance and scalability, can sign up through the governance process. Red Hat did a run-through, and there are a bunch of people doing runs-through, but basically: how do you really prove scaling?

To go back to how you test that changes aren't affecting your performance: I'd highly suggest that people use a CI/CD process for their infrastructure. Hopefully you have a non-prod space that looks somewhat like your prod, but even when you go into prod, why not do A/B testing? You can run something like PerfKit, or run SPEC, and actually see what the change was. Did it have a functional change as well as a performance change? That should be pretty standard practice for running a cloud, to see what's going on in reality.
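The regression gate described above can be sketched in a few lines; the metric names, baseline numbers, and 5% tolerance here are illustrative choices, not values from the talk.

```python
def regression_ok(baseline, current, tolerance=0.05):
    """Flag performance regressions: report any metric that drops more than
    `tolerance` (an arbitrary 5% here) below its known-good baseline."""
    failures = {}
    for name, base in baseline.items():
        cur = current.get(name, 0.0)
        if cur < base * (1.0 - tolerance):
            failures[name] = (base, cur)
    return failures

baseline = {"iops": 12000, "boots_per_min": 40}  # from a prior known-good run
current = {"iops": 11900, "boots_per_min": 31}   # after the storage/switch change
print(regression_ok(baseline, current))  # -> {'boots_per_min': (40, 31)}
```

A CI job can run this after each Rally or PerfKit pass and fail the pipeline when the returned dict is non-empty, which is exactly the "known value we're comparing against" idea from the discussion.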
The reason I wanted an independent one is that my applications will change over time, so I can't use last year's numbers to compare with this year's numbers. I wanted an independent workload that I could generate, and whose numbers I could compare, unlike VMmark, where I couldn't legally compare my numbers to another company's numbers to see whether I'm doing something right or wrong in the way we built something.

Yeah, so I'd just encourage maybe the Cloud Foundry folks to pick up some of these industry standards, share the workloads, and independently check. Publishing: each vendor has its own little rules on that. Red Hat is pretty much open, so we don't have EULAs that prevent you from publishing and things like that. Not mentioning any names, but other people do have EULAs that don't allow us to publish benchmarks against them. Anyway, I do think it's important for regression testing; each vendor's software should be reasonably regression-tested in these CI environments. In fact, there are some additional tools we're developing and hope to open source that wrap some of these tools into more of an automated CI flow for every release, not just for our own testing; we'll open source that, so you'll be able to pick it up in the future.

OK, do we have any other questions? Comments? Experiences?

OK, so everybody said they're having performance problems. Does anybody want to call out how big your clusters are getting when you start running into performance problems, and, if you were to pick one thing for the community to fix, what would it be? Anyone? No one? You guys love it; performance is perfect.
That's fairly cool. Another whole dimension: we shared some data-plane-level stuff, but measuring the control plane of OpenStack is what Rally and some of the other efforts are built for, and they actually ship now; Rally, for instance, is a full project. It's about putting pressure back on the control systems and making sure vendors configure the control plane properly for 10-gig networks, and don't just give it the same resources as these tiny, well, smaller nodes; essentially, how big do you want to make these? So again, we have a few papers on how to use these tools, and we're happy to share them with everyone, but it's also important to try to force errors and problems. I don't know if you have experiences you want to share from Premier Cloud? They use Rally a lot.

Yeah, OK guys, I think we are almost there. And yes, Salman.

Thank you. So I guess one particular problem everybody is running into as you scale is that OpenStack has, I don't know, thousands of configuration options; if you take permutations, it's probably in the gazillions. You configure some hardware, and you don't know how it will perform. So it would be really useful to have hardware configurations and OpenStack configurations that can be certified to run at a given scale, where scale is defined at a certain level. I think that's something that's not quite there. Intel's big cluster is a step in this direction, but we need to have these kinds of configurations hardware-certified at scale. There was a talk earlier by Volkswagen, who were setting up a big cluster; they ought to be able to buy the right hardware, which is certified, fix the configurations, and be ready to go, rather than doing the performance testing on their own. So, something to think about as we go toward scale, and perhaps for the next OpenStack summits.

Yeah, so the purpose of those two 1,000-node clusters is to expose everything that we do.
So anybody that goes into them has to commit to sharing all their results, including configurations. Basically, what we'll start seeing is more reality. I've heard some people say they run 1,000-node compute clusters with OpenStack. I don't believe them, because it probably doesn't actually run anything, or it's actually a bunch of regions. So the whole purpose of these clusters is to really show reality. If you come in and do a 200-node cluster test, we want you to show the scalability: how did you pull it off, and what configurations did you use? That should make things a lot easier. We're not actually doing certifications; that's much more complicated, with legal problems, but fundamentally we're opening up what's actually happening and what these tests show. Yes, Arkady?

Hello, quick question. Right now you've pointed out various benchmarks of both the infrastructure, of OpenStack itself, and of applications. So what happens when both the infrastructure and the applications start moving from virtual machines to containers? What do you expect will happen to the benchmarking methodology to accommodate that?

Well, I'll give a quick first perspective, but feel free to join in, Salman. Absolutely, we hope we've designed the benchmarks so that they're almost neutral. Most of these benchmarks we're running are actually bare-metal benchmarks in the first place, and as a benchmark team we're constantly running bare metal. Four or five years ago, everything we ran, all the various applications, a much deeper suite than I shared here, databases, stock-exchange loads, et cetera, was on bare metal, and then virtualized with KVM. And now, for the past two years, we've been running them on Atomic containers. So it shouldn't require any major surgery, especially if you're running containers in an OpenStack cloud, or OpenShift on OpenStack, or whatever container technology you're going to use.
We hope there is very little or no change. So if you invest in some of this up front, back in your own private cloud, hopefully you have an investment you can carry forward; in the Cloud Foundry example, for instance, we're trying to run these benchmarks agnostic to the implementation. One change we've seen, though: if you do performance TCO in virtual-machine units, then when you switch to containers on bare metal, you have to rethink your normalized unit. A lot of people do an analysis and say this many virtual CPUs equates to this much performance and this much TCO; once you switch to bare metal, that changes, but that's OK, it's just a new math problem. The second thing is that you'll definitely see a difference in I/O, because on bare metal you have fewer of the I/O issues the hypervisor introduces. OK. Yeah.

I will just add that, from the perspective of the SPEC Cloud IaaS 2016 benchmark, there is no restriction on the type of instance. So, for example, an instance can be bare metal, a virtual machine, or a container, and it's up to the cloud provider under test to choose which one to use; then you can measure the results and compare the difference. So that's exactly meant to address that point.

OK, the one with the last question, so that we can finish right on the dot. Actually, I think it's lunchtime, so we can just say thanks. All right, thanks. Thanks for coming. And again, tomorrow from two, probably in the same room, we are going to go into more detail on the SPEC Cloud benchmark. Thank you, guys.