Hi, today I'm going to tell you about storing continuous benchmarking data in Prometheus. So when I ask you, how do you benchmark your systems today, what are some answers you would give? Well, when we ask people, they usually talk about either what they benchmark, what kind of data they use for the benchmarking, what kind of load they generate, where they benchmark, whether on a local laptop or on AWS or somewhere else, and how, meaning what kind of automation they use: a CI/CD system, Kubernetes, something else.

But today I'm going to talk to you about the question of why. Why are the benchmarking results you get the way they are? We actually have some non-benchmarking tools to answer this question. In the developer world, we have profiles, whether they be heap profiles or CPU profiles. For operational systems, we have tools like eBPF or observability data. But there are two problems with all of these tools. The first is that they work on one component at a time, and that doesn't work well in a cloud-native environment, where you have a lot of microservices working together and the performance problems actually arise in the interaction between components. You will never see that if you look at one component at a time. The second problem is that these tools are not integrated into your benchmarking flow. So what often happens is that developers get into a cycle where they make a change, run a benchmark, see that something is slow, and then the next thing they do is try to reproduce it locally on their own machine in order to use these other tools to get an idea of why the performance is the way it is. Then they analyze this data, make code changes, and the cycle starts again.

So the question is, can we reduce this friction? Instead of having to run benchmarks, then download everything locally, and then run benchmarks again, can we do something smarter? Well, I wouldn't be giving this talk if the answer was no. So the answer is yes, and the way you do it is to integrate Prometheus into your benchmarking setup.

Why am I talking to you about this? Well, I work on a system called Promscale. It's a remote-write system for Prometheus, so it's a place where you can store your Prometheus data, and we benchmark this system using the techniques I will tell you about now.

So what does the architecture I am proposing look like? You have your benchmarking setup where you are benchmarking a single system. You have a load generator that generates load on the system, and you send the data to Prometheus, which you can then visualize with Grafana. If you have multiple systems, you just have either the same or different load generators apply load to your two systems, and they both send data to Prometheus. To get a little more complicated, if you are running these systems on two nodes, that's also fine, and you can get additional node metrics using the node exporter, also sending data to Prometheus.

So now you can see that in your benchmarking setup, in addition to the traditional benchmarking results you would get from your load generator or from a measurement tool, you also get application metrics for the application under load, application metrics for other applications in your setup, as well as node metrics. So now you can start asking much more complex queries.
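To make that concrete, here is a minimal sketch of what the load-generator side of such a setup could look like, using the Go client library (client_golang). The metric names, the port, and the target URL are made up for illustration; the point is simply that the load generator's own measurements become scrapeable Prometheus metrics like everything else in the setup:

```go
// loadgen.go: a toy load generator that exposes its own measurements as
// Prometheus metrics, so the long-lived Prometheus server can scrape them
// alongside application and node metrics.
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Requests sent to the system under test, labelled by outcome,
	// so throughput and error rate can be queried later.
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "loadgen_requests_total",
		Help: "Requests sent to the system under test.",
	}, []string{"outcome"})

	// End-to-end latency as observed by the load generator.
	requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "loadgen_request_duration_seconds",
		Help:    "Request latency observed by the load generator.",
		Buckets: prometheus.DefBuckets,
	})
)

func main() {
	// Expose the load generator's own metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	go http.ListenAndServe(":2112", nil)

	for {
		start := time.Now()
		resp, err := http.Get("http://system-under-test:8080/query") // hypothetical endpoint
		requestDuration.Observe(time.Since(start).Seconds())
		if err != nil {
			requestsTotal.WithLabelValues("error").Inc()
			continue
		}
		resp.Body.Close()
		requestsTotal.WithLabelValues("success").Inc()
	}
}
```

With something like this in place, a query such as rate(loadgen_requests_total[1m]) gives you throughput, and you can put it on the same Grafana dashboard as CPU or memory metrics from the system under test and from the node exporter.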
You could ask end-to-end questions about how the system performs overall: the throughput or latency of the system. Or you could ask infrastructure questions, like how much CPU or RAM the system uses while running the benchmark. This allows you to ask and answer questions like: given a certain amount of load, how many resources do I need?

The second advantage of this setup is that you can store your results. You do this by splitting the setup into two environments. One environment is restarted for every test; that contains your system under test and the load. The other environment you just keep running as a long-lived thing. Over time, the results for new runs keep flowing into the same Prometheus server. This allows you to track all the things I previously talked about over time, and also to store information about each benchmarking run. You can record the benchmarking parameters, that is, the parameters you provided to your load generator, as well as application parameters. Now, if you've done benchmarking for long enough, you'll know that coming back to a benchmark after a month, you've already forgotten exactly what CLI parameters or other kinds of parameters you provided to it, and you've forgotten which exact version of the system you were benchmarking. With this setup, if you expose this information through simple Prometheus metrics, it is stored side by side with the results, and the two can no longer get out of sync (there's a small code sketch of this a little further down). In addition, you can do easy historical analysis, not only of how fast your benchmark ran and how much load it could handle, but also of things like resource usage or other kinds of metrics. For example, you could track the cache hit rate of your system over time as the system evolves. You can also do this with a Grafana panel, so you have nice graphs of your system as you continue developing it.

Another advantage of this kind of setup is that you reuse your existing observability setup. If you think about it, the question of why is fundamentally a question of figuring out what's happening inside a black box. You have a system that is running, and you have to figure out why it is running as fast or as slow as it is. The insight here is that you need to answer this question of why both when you are benchmarking the system and when you are operating it. The same kind of information that gives you the answer in an operational setting will also be useful in a benchmarking setting. So if you've already built an observability setup for operating your system, you can reuse it to make inferences about your benchmark results. You can reuse a few components here: the metrics you instrument your system with, and also the Grafana panels and the methods of analysis you have already developed.

And so you get this kind of dichotomy. If you already have good observability signals, those signals will give you a lot of insight into what's going on in a benchmarking run. If you don't, this method will force you, via benchmarking, to add additional observability instrumentation to your application, because you need it to figure out what's going on. This creates a nice cycle where improving your performance also improves your observability: a positive feedback loop, if you will, where metrics improve performance and performance work improves metrics.
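Here is the small sketch of the run-metadata idea I mentioned: again using the Go client library, you can expose the run's parameters and the version of the system under test as a constant "info"-style gauge. The label names and values below are hypothetical; the pattern is that the gauge is always 1 and the labels carry the metadata, so it lands in Prometheus right next to the results:

```go
// runinfo.go: expose the parameters and version of a benchmarking run as a
// constant "info"-style gauge, so they are stored in Prometheus side by side
// with the benchmark results and cannot drift out of sync with them.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The value is always 1; the interesting data lives in the labels.
	runInfo := promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "benchmark_run_info",
		Help: "Parameters and version of the current benchmarking run.",
	}, []string{"run_id", "system_version", "concurrency", "dataset"})

	// In a real setup these values would come from CLI flags or environment
	// variables; here they are hard-coded, hypothetical examples.
	runInfo.WithLabelValues("2024-06-01-a", "v0.7.1", "64", "synthetic-1m").Set(1)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2113", nil)
}
```

A month later, a Grafana table panel over benchmark_run_info, or a PromQL join against it, tells you exactly which version and which parameters produced the results you are looking at.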
And so in summary, this approach has three main advantages. One is that you get historical analysis and storage for free: you don't need to deal, as you often would, with Excel spreadsheets where you record your benchmarking results. Another is that you get a view of multiple components at the same time. And it will also force you to improve your observability signals, or the signals you already have will give you insight into what's going on. In the end, you get a nice end-to-end view of the system as well as an answer to the question of why.

So how do you get started with this kind of technique? Well, it turns out that Prometheus uses the same technique itself. What I would recommend is looking at a tool called Prombench, which spins up an entire Prometheus cluster with some load for Prometheus: it generates the load and monitors the cluster. This tool is already integrated with CI/CD on GitHub, and it will show you how to do this in your own setups.

But I believe this technique is really just the beginning, because observability is more than just metrics, right? It involves logs, it involves distributed tracing. If you take this idea and add other observability signals in, it will improve your understanding of your benchmarking results even more. This is actually the vision of Promscale, where you can put all these signals into the same system. Right now it supports Prometheus metrics and OpenTelemetry traces, but more signals will come in the future.

And that's about it for the talk. We have a booth outside and we are hiring, so if you are interested in working on and developing these kinds of systems, let me know. Awesome. Thank you very, very much.

And by now, you all know the drill. Any questions? Raise your hand or even stand up. No? Yes.

With benchmarking, do you have a preferred tool for generating reports or artifacts that communicate the inputs and outputs of the load testing you've been doing?

Well, the great part is that all of the inputs and outputs just become Prometheus metrics. For inputs, say, all you have to do is instrument your load generator to generate Prometheus metrics. Those metrics get stored in Prometheus, and you just create a Grafana panel, or whatever kind of panel, to take a look at them. For outputs, and I think this is the kind of thing that's not always entirely clear: if you think of old-style benchmarking, there is one result. But in actuality, if you look at these complex microservices-based systems, there is no single output; the whole behavior of the system is the output. So yes, you could measure something like request rate or latency, but it can be a much richer set of outputs, including how much RAM and CPU was used, et cetera.

All right, thank you very much. Any other questions? Björn, can you check Slack for me? Nothing on Slack. Any more questions here in the room? Then we were very quick, and that means we get an unplanned coffee break for 15 minutes. We reconvene at four. At five to four, I would ask Metal Matze to get his microphone.