Hi everybody. So we're going to talk about OpenStack Browbeat, which is a performance tuning and analysis tool for OpenStack. I'm Joe Talerico, and we have Alex Krzos and Sai Sindhur Malleni. I'm going to skip the agenda for time.

So what is OpenStack Browbeat? I'm going to go over what it isn't. It's not a new workload; we use the best of breed of OpenStack workloads available today, so we have Rally, Shaker, and PerfKit. We take metadata that describes your overcloud, combine it with result data, and ship that to Elasticsearch, so that you can compare your cloud's performance. We use TripleO for that, but we don't need to; we could use other installers if somebody would help contribute that. We also have plugins for collectd, Graphite, and Grafana, which we'll go into as well.

So I'm going to talk a bit about the workloads we have. Browbeat is essentially an orchestrator for the workloads. We have Rally, which does most of our API performance testing: booting instances, creating a bunch of networks, routers, and such. For the data plane, we have Shaker: we can do throughput, latency, and request-response over TCP and UDP, in all kinds of topologies. Shaker is going to spawn all of that for you, and Browbeat is the one orchestrating it. And we also have PerfKit Benchmarker, a project open-sourced by Google. It has some 40-odd workloads that can be pulled in easily, they're already pulled into Browbeat, and you can just make use of a simple YAML-based config file to run whatever benchmarks you need.

So this is the Browbeat quickstart; you can get started in less than 10 steps. You basically git clone the repo from GitHub, go into the Browbeat Ansible folder, and run a shell script we ship that generates the hosts file for you. You need the hosts file: you need to know what the undercloud is, what the controllers are, and what the computes are, because some of our Ansible playbooks depend on that. Then you just edit some of the group variables in all.yml, like your DNS server and whether or not you want collectd on your nodes, and you kick off the playbook to install Browbeat. To set up the dashboards for live monitoring of your cloud, you basically just kick off two playbooks: one installs collectd on all your nodes, and the other generates the dashboards and uploads them to Grafana.

And then you go into the performance testing. Once you have the monitoring set up, you basically just run browbeat.py with either rally, shaker, or perfkit as the workload. We also have some playbooks with checks, like the number of file descriptors or max connections for MySQL and such. You run the checks playbook and it'll dump out a file of all the errors and things that you should be looking at on the boxes. Then you run our workloads, which is the stress-test part of it, and all the results are shipped to Elasticsearch, with Kibana as the front end, so you can analyze all your results. Based on your analysis, you tune the cloud, and that's the workflow: retest your cloud, analyze, and tune.
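Put together, the quickstart looks roughly like this. Treat it as a sketch: the script and playbook names are assumptions based on the repo layout and may differ between Browbeat versions.

```sh
# Quickstart sketch; script and playbook names are assumptions and may
# differ between Browbeat versions.
git clone https://github.com/openstack/browbeat.git
cd browbeat/ansible

# Generate the Ansible hosts file (undercloud, controllers, computes)
# from the TripleO undercloud.
./generate_tripleo_hostfile.sh -t <undercloud-host>

# Edit group variables: DNS server, whether to install collectd, etc.
vi install/group_vars/all.yml

# Install Browbeat itself.
ansible-playbook -i hosts install/browbeat.yml

# Optional live monitoring: collectd on every node, then generate and
# upload the Grafana dashboards.
ansible-playbook -i hosts install/collectd-openstack.yml
ansible-playbook -i hosts install/grafana-dashboards.yml

# Run the config checks, then kick off a workload.
ansible-playbook -i hosts check/site.yml
cd .. && ./browbeat.py rally
```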
Now I'll hand it over to Alex for this part.

Yep, hello. So I'm Alex Krzos. I'm generally known as Mr. Dashboard, because I like to put together all of our graphs in Grafana. You probably heard Sai mention there's a monitoring aspect to this. We use collectd as the agent that we install on your overcloud nodes, and we even install it on the undercloud: so your undercloud, your controllers, your compute nodes, and your Ceph storage nodes. And if you have object storage nodes or block storage nodes, you can install collectd on those as well. collectd ships metrics over to Carbon and Graphite. I'm not going to get into detail on Graphite because it's a large project, but it'll store your metrics, and then Grafana talks to Graphite's API and exposes those metrics in a beautiful way. So the analysis part is really just digging through the dashboards and looking at where your bottleneck is.
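To make that pipeline concrete, here's a minimal sketch of the kind of collectd configuration that forwards metrics to Carbon/Graphite. The hostname and metric prefix are placeholders, and Browbeat's actual templates configure many more plugins.

```sh
# Sketch: point collectd's write_graphite plugin at the Carbon/Graphite
# host so Grafana can later query the stored metrics. Placeholder values.
cat > /etc/collectd.d/write_graphite.conf <<'EOF'
LoadPlugin cpu
LoadPlugin memory
LoadPlugin disk
LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "carbon">
    Host "graphite.example.com"
    Port "2003"
    Prefix "openstack."
    Protocol "tcp"
  </Node>
</Plugin>
EOF
systemctl restart collectd
```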
So next slide. Here are some examples of the dashboards. We have several different types of dashboards, and some of them are per cloud; this is an example of the per-cloud dashboards. I basically generate dashboards for what I like to call the performance food groups. So we'll have all of your CPU metrics for each of your controllers, for example, with that dashboard there, or a dashboard with all of your memory, or a dashboard with all of your disk I/O. You can see all of those metrics right there on the dashboards.

If you don't find a very specific bottleneck there, let's say you didn't see ceilometer-collector eat all your memory, or you didn't see whatever benchmark you ran consume all the CPU you have, you might want to start digging in a little bit deeper. So we have a general analysis dashboard where you select what cloud you're looking at and what node you're trying to look at, and then there are all these unexpanded rows with literally hundreds of graphs, each of them very service-specific. You can go into the per-process section and dig in on Keystone or Nova or Neutron and try to see what is using all your CPU and what might be using all your memory.

There are other varieties of dashboards too. This one I've kind of uniquely put together with some of the data that we collect: it can show the instance distribution of some of the testing that I've done. This is some large-scale testing that I did with telemetry when I was trying to scale up to 10,000 instances; I finally got to it at the end, as you can see there. You'd have to go back and look at my other talk to see whether or not we successfully collected on that.

Here's another dashboard: we can actually graph the latencies out of Apache. We'll graph the max, 99th percentile, average, and min latencies, as well as the count of requests, and we'll separate it by request type.

The last thing I want to mention: there are lots of other dashboards that I put in there, lots of plugins that I've added to our collectd config, and just hundreds of metrics. We even try to simplify logging. This bottom dashboard there, for instance, is an example of capturing info, warn, and error messages, so if you've benchmarked the cloud too hard, you can quickly see what service went down first and then start to dig into the errors. Because personally, I don't like jumping into every log file, searching for errors, and then comparing timestamps.

So I'll just go through a bit of the workflow here, and this is specifically about the results storage, retrieval, and analysis part, which is the most important part if you want to do performance analysis. You basically run Browbeat with the workload you want, like Rally or Shaker or PerfKit, and there's a config option for Elasticsearch: if you want to send data to Elasticsearch, you basically turn it on in the config file. What this does is kick off Ansible. Ansible goes into all of the controllers and computes and gathers system facts like kernel and hardware, and along with that, it also gathers OpenStack-specific parameters like how many Neutron workers you had, how many Nova workers you had, and stuff like that. That is dumped as metadata. And you get result data from the benchmark. So what Browbeat does is munge the data: it combines the metadata with the result data, and with our Elasticsearch connector, it ships it to Elasticsearch. Kibana is the front end, so you'll be able to do all your slicing and dicing of data and visualizations through Kibana.

So we talked briefly about metadata, and I just want to highlight why metadata is important here: it actually adds value to the result data. Because if you have a result and you don't know any of the test parameters or how the environment was set up at that particular moment, it's not really going to mean anything to you. I don't know if you can see it clearly here, but we're basically filtering on Neutron API workers, so Elasticsearch is only going to show you the results that had Neutron API workers set to 32. It'll also help you validate your tunings, and it's much better than the hundreds of spreadsheets you'd otherwise have to work with. So this is pretty neat.
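As a sketch of that flow: enabling the connector is a small edit to the Browbeat config, and once the combined result-plus-metadata documents are indexed, you can query by any tuning. The YAML keys, index pattern, and field name below are assumptions for illustration, not verbatim from Browbeat.

```sh
# Enable result shipping in the Browbeat config (key names approximate
# the real browbeat-config.yaml; host/port are placeholders).
cat >> browbeat-config.yaml <<'EOF'
elasticsearch:
  enabled: true
  host: elk.example.com
  port: 9200
EOF

# Then, for example, pull back only the runs where Neutron had 32 API
# workers (index pattern and field name are hypothetical).
curl -s 'http://elk.example.com:9200/browbeat-rally-*/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"metadata.neutron.api_workers": 32}}}'
```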
So continuing on with dashboarding: we don't want you to have to create your own dashboards, so within Browbeat we ship dashboards for you, and we have an Ansible playbook that will actually install the dashboards for you as well. We have a performance CI dashboard, and we have a generic dashboard where you can choose specific scenarios and Rally-specific concurrencies and times and narrow in on the data that you want. If you're specifically looking at Neutron, you can just filter on that.

And as I alluded to, we have a performance CI that runs on the nightly TripleO builds, and this is publicly available to you. Unfortunately, I did not put a link in here, but we'll add the link before we send the slides to the OpenStack Foundation. Anybody can go to this Kibana dashboard; you have to click through a captcha to get in, but then you can start looking at how different versions of OpenStack are performing. Specifically, if I wanted to point out something here: we found an issue with Neutron where a commit got in and all of a sudden the router create and router interface add times spiked, about two weeks ago. So now we're investigating that. We would never have caught that previously, just running Rally statically; we saw this from trending data. Thank you. And does anybody have questions, since we have time?

So Kibana is going to be used for our result data: everything from Rally, Shaker, and PerfKit ships its results to Elasticsearch, and we visualize with Kibana. Grafana, collectd, and Graphite are for system data. We're probably not using it for logging data today, not yet. So to further clarify, we use Elasticsearch and Kibana for result-driven data: the result data that we get out of Rally, PerfKit, and Shaker goes right into Elasticsearch, and we visualize it with Kibana dashboards. With the Grafana dashboards, we're visualizing the data that we're collecting with collectd, and the majority of it is just system performance metrics, though there are some application-level metrics, like the Apache response times, or request times, you saw in there.

So the question was, is this connected to any kind of VNFs or workload data? Today, no. We are working with Pbench: we do have a commit for Pbench with pbench-uperf, and there are pbench-trex and pbench-moongen coming out, where we could orchestrate the building of something like a packet forwarder, capture that metric, and send it to Elasticsearch, and then be able to compare if a kernel changes or if an OVS-DPDK version changes, since we capture that already in our metadata. So today, no, but eventually, yes. And if you have a workload you would like to integrate, we have a model for plugins in Browbeat. So like we heard today with SIPp, if you wanted to create a server-client SIPp workload, you could add it. Oh, totally open source. Any other questions? Thank you.