All right, are you ready? We're gonna do some fighting here, so I guess you have to pick and choose your best fighter, because welcome to this session: the telemetry showdown, Fluent Bit versus the OpenTelemetry Collector.

Before we start, let me introduce myself. My name is Henrik Rexed, I'm a Cloud Native Advocate at Dynatrace. I joined Dynatrace three years ago, and when I joined I had the chance to join this fantastic community, the observability community. I'm pretty much an observability geek, which is why I started a YouTube channel called Is It Observable?, so it's out there, check it out. Otherwise, most of my career I spent doing performance engineering: testing, breaking, tuning, having fun. Performance is still in my heart, so I also produce content for performance engineers on the YouTube channel called PerfBytes.

All right, enough about myself. Before we start, I have a small disclaimer. First of all, to prepare this talk, no birds or telescopes have been harmed; I was really gentle with both of them, so keep that in mind. Second, this talk is not aimed at blaming any project at all. The idea is to give you some tips and some numbers so that it will be easier for you to pick and choose which agent to select. And most of all, I love the Fluent community and I love the OpenTelemetry community, so this is more about helping both communities provide better agents, better support, and so on.

All right, so let's select the fighters. In the left corner we have Ryu, if you remember that game, and he comes from the Fluent dojo. Fluentd has been out there for a long time, and Fluent Bit, when it was released, became very popular in the logging space, so it has proven itself. Recently, with Fluent Bit 2.0 and 2.x, it started to support metrics and traces, so very interesting. And on the right side of the ring we have Ken, who comes from the OpenTelemetry dojo. Of course it's the open standard, you all know about it, and they provide a component called the Collector; I'm pretty sure all of you are aware of the Collector. At first it supported ingesting traces, then metrics support was added, and recently it added support for logs.

So the big question is: if both support metrics, traces, and logs, which agent should I choose? This talk is here to help you with that. We're gonna have several rounds in this fight and compare a few things. First we're gonna compare the design experience and the plugins that are available, then we're gonna jump into the details of the log pipelines and the features we have on both sides, then metrics, and then traces. And last, I did a lot of tests, so I'm gonna share a few numbers; the idea is to figure out which one is lighter in terms of resource consumption. At the end we will share the conclusion. And again, Fluent Bit 3.0 was released today and now has more features, so keep in mind that this benchmark was done with Fluent Bit 2.x; with 3.0 the numbers will probably be different, and maybe the conclusion will be too.

All right, so round one: design. Both agents have a configuration file; once you start the agent, it loads this configuration file, which is the pipeline. And for each pipeline it's gonna be the same thing: you have to define what you're gonna receive.
That's called an input in the Fluent Bit wording and a receiver in the Collector. Then we're gonna process; processing means modifying the data, enriching the data, dropping data, whatever you want. In the Collector those are named processors, and in the Fluent Bit world they are named parsers and filters. And last, once we have done our job, we wanna send it to an observability backend, so we're gonna use an exporter in the Collector and an output in the Fluent Bit world.

One important aspect is that since Fluent Bit 2.0 they have introduced this notion of processor, which is also part of release 3.0: when you receive something, you can start filtering in the same thread that is receiving, so you can already drop or rename data very efficiently, with a multi-threaded approach. And you can do the same thing on the output, so just before sending out your data you can make some last-minute changes for the backend you are targeting.

Now, when it comes to the pipeline itself, the Collector of course supports all the signals, so you can do whatever you want with traces, metrics, and logs, and we should expect the same transformation abilities once continuous profiling is out there. On the Fluent Bit side, at least in 2.x, for logs you can receive, parse, filter, and then export; but when it comes to metrics and traces, it's more like a proxy or a gateway, where you receive something, you're not able to modify anything, and you just send it out. Keep that in mind, because you will see that a lot of the constraints we run into are related to this point.

Now, the way you design a pipeline in Fluent Bit is quite convenient, in my opinion, because when you receive something you can tag it: oh, this is Kubernetes, so I'm gonna tag it kubernetes. Later on, when you design your pipeline flow, you apply plugins, filters, or parsers, and you say: this plugin will only be applied to the tags matching kubernetes.*, or whatever. So in the end you have a very lightweight pipeline, and just by playing with those tags you can build very complicated pipelines with a lot of complicated transformations. It's very convenient, and the pipeline file ends up much shorter in terms of steps compared to the Collector.

On the Collector side you don't have this notion of tags, but they have introduced the notion of connectors, you're probably aware of that, and specifically the routing connector. The routing connector acts like a switch: based on the resource attributes, it triggers this pipeline, otherwise it triggers that pipeline. So you can build similarly complex pipelines in the Collector, it's just that you will have more pipelines to design, and the pipeline file will be much bigger than the one from Fluent Bit.

The classic Fluent Bit pipeline file used to look like the sketch below: you have a section, say INPUT, and the Name is the plugin, for example tail; then a FILTER section where you see Name kubernetes, that's a plugin too; and then you configure them. And remember, there's a sequence: you start with parsing and then filtering, so it's basically a sequential approach, and when you design your pipeline file you're already designing the flow of the pipeline.
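Just to make that concrete, here is a minimal sketch of the classic format; the paths, tags, and output are placeholders, not the exact pipeline from the benchmark:

    [INPUT]
        Name   tail
        Path   /var/log/containers/*.log
        Tag    kube.*

    [FILTER]
        Name   kubernetes
        Match  kube.*

    [OUTPUT]
        Name   stdout
        Match  *

The Tag on the input and the Match on the filters and outputs are what give you that routing by tag I just mentioned.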
Now with Fluent Bit 2.x, and 3.0, we have a YAML-structured file, which is much nicer from my perspective: you have the inputs section, the filters section, the parsers section, and the outputs section. And you can also see a processors step, where you can attach, for this log input, the transformations you wanna do; so basically, in the same thread where you receive, boom, you're already applying the modification.

In the Collector the approach is a bit different. In your configuration you first list the plugins: I have some receivers, so you define and configure the list of all the receivers you wanna use; you do the same thing for the processors, you list all the processors you wanna use, and the same for the exporters. And at the end, in the service pipelines section, that is where you define the actual flow: I'm gonna start with this receiver, then the Kubernetes attributes processor, then the batch processor, and so on. So the flow lives in the pipelines section. In the end, in terms of design, we have a similar experience; Fluent Bit has an approach with tags that the Collector doesn't have, but it's pretty much equal on that round.

Now round two: logging. What do you expect from an agent when you are collecting logs? First, the standard protocols need to be supported: UDP, TCP, Forward, which is the protocol from the Fluent community, OpenTelemetry of course, syslog, Kafka, because you probably go through a Kafka queue, and of course reading from log files. The good news is both agents have those plugins. I mean, the Collector obviously has more plugins, that's true, but for the main use cases these plugins will cover most of our needs, so it's pretty much equal in terms of plugin coverage.

In terms of processing on the logs, what we wanna do is of course enrich the logs, maybe add Kubernetes metadata if you're running in a Kubernetes environment, parse the logs, and also do some batching before sending them out. Both have the same plugins, not named the same way of course. I would say that from the parsing perspective Fluent Bit has lots of features: the regex, the JSON, the fact that you can build your own Lua scripts to parse, and there's a Wasm plugin as well. So for parsing, even if the Collector has OTTL, the OpenTelemetry Transformation Language, it's easier to design your parsing pipelines in Fluent Bit, for sure. So I would say for parsing logs Fluent Bit has a slight advantage, but both have the same features overall.

Now let's cover the metrics. For metrics, of course, what we wanna do is collect them with the most common protocols: collectd, StatsD of course, Prometheus, so having the ability to scrape a Prometheus endpoint, and also collect metrics from the host. Both have the plugins, that's perfect. I would say that the one disadvantage with Fluent Bit is that there's no support for scrape_config: you can collect the data, but there is no way of doing relabeling, no way of doing metric relabeling. From my perspective that was a big disadvantage on the Fluent Bit side. Then, in terms of processing, we expect the same thing: I wanna enrich the metrics, drop metrics, add some metadata. On the Collector you have plenty of options, all the options that you need: you wanna reduce cardinality, you can do this; you wanna rename, you wanna transform, you can do that; roughly like the sketch below.
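For example, a rough sketch of a Collector metrics pipeline doing that kind of processing; the endpoint, job name, and metric names are made up, the filter and cumulativetodelta processors come from the contrib distribution, and you should check the exact filter syntax for your Collector version:

    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: my-app
              static_configs:
                - targets: ["my-app:9100"]

    processors:
      filter/drop-noisy:
        metrics:
          exclude:
            match_type: regexp
            metric_names:
              - "go_.*"           # drop runtime metrics we don't need
      cumulativetodelta: {}        # convert cumulative counters to delta
      batch: {}

    exporters:
      otlphttp:
        endpoint: https://my-backend.example.com

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [filter/drop-noisy, cumulativetodelta, batch]
          exporters: [otlphttp]

You can also see the structure I described: you declare the receivers, processors, and exporters first, and the actual flow only exists in the service pipelines section.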
But in Fluent Bit 2.x, again, like I said, there is no way of modifying metric data. This morning it was announced that with version 3 we should be able to do that, which is fantastic. But with 2.x the big disadvantage is that there is no way of reducing cardinality, and this is gonna be quite painful, especially if you wanna reduce the cost of your metrics; having that option is really, really important. The other option is conversion: if I'm collecting Prometheus metrics and I wanna convert to delta, depending on the storage, I have no way to do that in Fluent Bit as of now. So on the metrics side, I would say the Collector is clearly the winner.

Now on to traces. For traces, of course, you wanna collect OpenTelemetry, OpenCensus, Zipkin, and probably also from Kafka. On the traces side Fluent Bit supports only OpenTelemetry, and one big important note: it only supports OpenTelemetry over the OTLP/HTTP protocol. So if you have gRPC used in your environment, you have to figure out how to translate the protocol. The Collector of course doesn't have that pain, because it was designed for traces initially. When it comes to processing traces, what do you wanna do? Of course, you wanna enrich the traces by adding metadata, drop traces, and make sampling decisions. And as expected, the Collector has everything, and Fluent Bit, like I said, is just a gateway, like a proxy: the traces come in and go out of Fluent Bit. So for traces, the Collector is clearly the winner.

Now round five: performance. I did various tests, and as we saw, we obviously have different features supported on each agent. So I wanted to run a pipeline with only logs, a pipeline with logs and traces, and then a pipeline with logs, traces, and metrics, with Fluent Bit, and then do the same thing with the Collector. And with the Collector you can do the processing in two different ways: you can do it at the source, with the filelog receiver, or you can do it afterwards in a processor, with the transform processor for example. So I thought, maybe I can test both ways and figure out which is the best solution for us.

In terms of pipeline design, because Fluent Bit obviously doesn't have processing support for metrics and traces, I decided to do most of the complex tasks on the log pipeline, and on the metrics and traces side it's basically receiving, maybe adding a few attributes, and that's it: I receive the metrics, I'm not able to modify them in Fluent Bit, I just send them on to the backend. I also enabled, on both sides of course, the telemetry data to observe how those agents behave, so that gives some insights.

For the tests, first of all, in the QR code there is a repo with all the tests I did, so if you wanna run them in your own environment, please feel free to do that. How did I run those tests? I picked two demo applications, the Hipster Shop and the OpenTelemetry demo, and then I deployed several Prometheus exporters: Kepler is one of them, then kube-state-metrics, the node exporter, and I have Istio, so all the Envoy proxies are producing metrics as well. And like I said before, Fluent Bit does not support OTLP over gRPC, so to be able to have a comparison, I injected a sidecar Collector into the OpenTelemetry demo pods, to switch protocols and send the data to Fluent Bit, roughly like the sketch below.
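The sidecar configuration is roughly this; it's only a sketch, and the fluent-bit host name and ports are placeholders for my setup:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317    # the app sends OTLP over gRPC here

    exporters:
      otlphttp:
        endpoint: http://fluent-bit:4318    # Fluent Bit only speaks OTLP over HTTP

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]

On the Fluent Bit side, the opentelemetry input then picks up that HTTP traffic.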
So that's the major difference between both environments: in the Collector setup you don't have that sidecar Collector running in the different pods.

Now, let's look at the tests. I did ramp-up tests and then soak tests. For the ramp-up, I add 50 users to each application every 30 minutes, so at the end of the test we have 200 users on the Hipster Shop and 200 users on the OpenTelemetry demo. By applying load, of course, the apps produce more logs and more traces, and that's what I'm measuring. On the top left you can see the number of logs received during the tests. When you look at those numbers, the blue graph is Fluent Bit and the black one is the Collector. In terms of CPU, we are a bit better on the Collector side, but it's not a big revolution; we have the same kind of utilization pattern. But when you look at the memory usage, Fluent Bit is around 15 megs and the Collector is about 95 megs. So in terms of resources, there is a clear advantage for Fluent Bit over the Collector.

Now if we introduce the traces, on the top we can see the number of spans coming in, which is aligned with the load. In terms of CPU utilization we have about the same behavior as before, not a big change, and in terms of memory the difference is smaller, but the Collector is still consuming a bit more.

Now what's interesting is the next slide: two-hour tests, now introducing metrics. You can see we have about 500K metrics coming in during the tests, and on the Collector you can see the CPU keeps increasing compared to the previous tests, so we are a bit higher than Fluent Bit. And if you look at the memory usage, in the two-hour test the Collector jumped to around 1.5 gigs. So, okay, let's do a soak test to see if there is an issue with the memory. Now I run a constant load, 50 users during 24 hours on both applications, and if you look at the CPU usage, you can see that after a couple of hours we reach two cores of consumption on the Collector. And in terms of memory, it jumps to about 9.5 gigs, it crashes, it restarts, and I say, hmm, that sounds like a memory leak. I recently opened an issue about that, so let's hope it's gonna be resolved.

Then I asked myself what is causing this: is it metrics, traces, logs? So I ran a soak test with only logs, and then a soak test with only logs and traces, and there the resource usage of the Collector is super stable. It's from the moment I introduce the Prometheus receiver that the memory starts to go through the roof, and the same for the CPU utilization.

So how can I delegate the metric collection? I wanted to go further. For those who follow the OpenTelemetry project, there is an amazing feature, to be honest I love this feature, called the target allocator. It's only available with the OpenTelemetry Operator.
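Concretely, you enable it on the Operator's OpenTelemetryCollector resource, roughly like this; this is just a sketch using the v1beta1 schema with a made-up scrape job and backend, so double-check the fields against the Operator docs for your version:

    apiVersion: opentelemetry.io/v1beta1
    kind: OpenTelemetryCollector
    metadata:
      name: metrics
    spec:
      mode: statefulset
      replicas: 3
      targetAllocator:
        enabled: true
      config:
        receivers:
          prometheus:
            config:
              scrape_configs:
                - job_name: kubelet
                  kubernetes_sd_configs:
                    - role: node
        exporters:
          otlphttp:
            endpoint: https://my-backend.example.com
        service:
          pipelines:
            metrics:
              receivers: [prometheus]
              exporters: [otlphttp]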
What it does is this: when you deploy the Collector through that CRD as a StatefulSet and enable the target allocator, it takes the scrape config you've defined in your pipeline and pushes it to the target allocator, so the discovery of the scrape targets is done by the target allocator. And once it has discovered the targets, it says: okay, we have three replicas, so I'm gonna split the jobs. Each Collector then just gets its jobs from the target allocator, so the Collector doesn't have to do the discovery itself, it just scrapes. So I said, okay, maybe this is gonna solve our problem. For this I had two Collectors: one for the logs as a DaemonSet, and one just for the metrics as a StatefulSet with the target allocator.

Let's look at the results of the same 24-hour test. You can see that the CPU utilization is about 200 millicores, similar to what we had before, and in terms of memory we max out at around 800 megs. And the target allocator itself is a couple of millicores and a little bit of memory. So in the end it's cheaper to run the allocator than to let the Collector discover and scrape your metrics all by itself. Here, no memory leak, no problems, super stable.

The last piece I wanted to cover is the processing. Where should I do the processing? Of course the industry says: always do it upfront. But I wanted to look at the numbers. If you do it in the filelog receiver compared to the transform processor, what I saw is that the memory usage with the transform processor is actually lower, which surprised me. And for the processing time, that's the graph at the bottom: I measured the time to receive, the time to process, and the overall time, and the time is the same, so no change in processing time. So no strong conclusion on that side; filtering upfront is the recommendation of the market, so please do that, then you will have less processing to do downstream.

As a conclusion, I would say: if you only have to deal with logs, Fluent Bit is amazing. But if you have metrics and traces, and obviously you're gonna have them, then the Collector, with the Prometheus receiver and the fact that you can reduce cardinality and convert the data, is obviously the best player. That's my conclusion. In terms of resource utilization, I would say Fluent Bit has the advantage as of now, but again, it really depends on the use case: if you do only logs, pick Fluent Bit, that would be my recommendation.

A small teaser for Is It Observable, that's the YouTube channel: I have plenty of episodes, and the details of this benchmark will be released on the channel in a few weeks, so check it out if you want more details on this benchmark.

All right, if you have any questions, I will be more than excited to answer all of them. No questions? I guess it was the brutal presentation slot after lunch, I'm sorry for that. That's a good point. So the question was: do I plan to extend this study to other agents? If it runs in Kubernetes, yes, for sure, I will try to do that. And what I'm more excited about is to run the same benchmark with Fluent Bit 3, because now we have the actual processing available, to figure out whether, if I start to do some heavy processing on the metrics and the traces, I see the same behavior that we observed with the Collector. But yeah, that's a good point, there are other agents out there
and they deserve to be included in the benchmark as well. Any other questions?

Hi, thanks for this, this is very useful. I wanna ask if you have any hypothesis on why the difference in memory consumption. Is it just because the OTel Collector is not written in C and Fluent Bit is written in C? Yeah, so clearly the fact that you have an agent built in C is an advantage for sure. The processing time I measured is a couple of nanoseconds compared to milliseconds in the Collector, so Fluent Bit, when it parses, parses super fast. So yeah, I think that's clearly an advantage for sure.

Do you think there's space for another collector written in Rust, maybe, or some other memory-safe but performant language? I think that could be a good point. But I also think the major difference is that the Prometheus receiver heavily impacts the behavior of the Collector. The way I'm collecting, I'm scraping every 30 seconds, so I have a lot of metrics and a couple of relabeling rules, and the problem is that, in general, when you collect those metrics, all of them come into the Collector first, say you have one million series, and only then do you start relabeling. That is heavy for the Collector, and I guess once Fluent Bit introduces the scrape_config approach, it will have the same type of constraints, for sure. One other aspect is that I used the OTel Contrib distribution, which has all the plugins, so in terms of memory I was already at about 95 megs when I started. If I built a distribution with fewer plugins, I guess I could shave off a few megs, but again, there would still be a difference in memory usage.

Oh, it's over. So thank you for your time, I hope you enjoyed the session.