Well, my name is Eduardo Silva. Today we have this quick presentation called Connecting with the Prometheus World: Logs and Metrics. I know most of you might be lucky enough that you are not messing around with logs, right? But I'm coming from the logging space, and I think we are kind of the small cousin in this ecosystem. And as a community, we have seen a lot of requirements around how users and companies can have a more unified experience between logs and metrics, not just from the data analysis perspective once the data hits the backend, but before that, on the left side of the pipeline, where the data collection happens. My name is Eduardo Silva; I'm the creator of a project called Fluent Bit and one of its maintainers. I also founded a company called Calyptia, which is a first-mile observability company. We founded it with Anurag Gupta, who is the Fluentd product manager. I've been around the CNCF for a while, and I used to be at Treasure Data with the team that created Fluentd.

OK, before talking about all this unification process and this story, we need to talk about how logging works, right? Maybe you are familiar with it, but it's always good to make some analogies about how things interact. Basically, in logging, you have one application that ships one message. That message can be sent to standard output, to some kind of stream, or it can hit the file system, and most of the time it's just a raw text message. But when you have this message and you want to do data analysis, you will notice there are some differences compared to metrics. Here, we don't have a fixed schema. Most of the time, we don't have a structure. Even if it's in JSON, it's just an array of bytes, and we need to do some computing in order to handle those messages properly. And when the application generates more messages, even in a container environment, they will be trapped by the container engine, which groups them into a JSON file that you end up consuming for your own purposes.

And this is the fun part. For example, most of the time the messages just hit the file system, and you need some kind of engine that takes these files, or listens for these messages over the network, starts processing them, enriching them, dropping the ones that don't match some criteria, to then be able to send the data out to your own backend, because in the end you just care about data analysis. But if you want to do data analysis, you have to collect the data, process it, and send it out. In all this process, which can be summarized as an input and an output, there are a couple of things happening. If we are reading files, we have to deal with log rotation and copy-truncate. If a file was rotated, we have to keep monitoring it for a couple of seconds and then make sure we close all the file descriptors so everything keeps working smoothly. Then, when you're trying to send the data out, you have to deal with whatever happens: network outages, power outages, a bunch of things. And some services, even on-prem, sometimes are not responsive enough. The norm is that we get more and more data from applications, and the problem is that backends and storage engines are hard to scale, unless you're using a hosted solution.
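As a rough illustration of that collect, process, and ship flow, a minimal Fluent Bit pipeline (the project discussed next) might look like the sketch below; the plugin names come from the Fluent Bit documentation, while the paths, patterns, and values are only illustrative assumptions.

```
[INPUT]
    # tail files and follow them; the plugin deals with rotation and copy-truncate
    name        tail
    path        /var/log/containers/*.log
    # keep watching a rotated file for a few seconds before closing its descriptor
    rotate_wait 5

[FILTER]
    # drop records that don't match a criterion; here, keep only records whose
    # "log" field contains the word "error"
    name   grep
    match  *
    regex  log error

[OUTPUT]
    # ship whatever remains to the backend of your choice; stdout for illustration
    name   stdout
    match  *
```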
But the concept here is that we created these projects, Fluentd and Fluent Bit, to solve this problem: how to collect data from multiple sources of information, and how to send it out to any kind of backend or database, thinking that this is the right, vendor-neutral way to work in the logging space. And Fluentd and Fluent Bit are used widely. Most of this traction started to grow when AWS, Microsoft, and Google started using them in their own infrastructure. Nowadays, just counting public repos, Fluentd and Fluent Bit are deployed more than two million times a day. We don't have stats from private registries, but it should be 10x more.

Okay, so now let's look at metrics and talk about this unified experience. We have seen that in the market there are different kinds of tools trying to accomplish this, and there's some fatigue around people having multiple agents for different things. Here, we're not trying to propose something that replaces everything. From the Fluent perspective as a project, we have always had the mindset that we should listen for messages, data, anything, from every source, and be able to send that data to any destination, even if that destination is a competitor. Avoid vendor lock-in. So that unified experience is what users want. For years they told us, hey, is your agent able to handle metrics too? And yeah, we kind of did it, in our own way. Is it a proper way? Maybe not. But from a maintenance perspective, yeah, we have some experience with metrics. So why not experiment and create a solution that works for both scenarios, logs and metrics?

So now, a quick intro to our journey over these years into the metrics work and the kind of value we're trying to bring by connecting with the Prometheus ecosystem. As I said before, the Fluent Bit project specifically has had metrics collectors since the beginning, and I'm talking about five, six years ago. Fluent Bit was originally created for embedded Linux, so we created a CPU collector and a memory collector, but all those metrics were handled as logs. What does that mean? No fixed schema, just a simple structure. In our world we don't use JSON internally; when we consume the data we use MessagePack, which is like a binary version of JSON, but there's no enforcement of a fixed structure. So if you have a couple of key-value pairs, that's fine. For example, if I want to gather CPU metrics, I just gather them, but again, they are shipped as JSON-style log records, not as a metrics payload. That's how we support network, thermal, and Docker metrics, and Fluent Bit's own metrics, of course.

Now, if you compare both spaces, logs and metrics, there are some differences. On one side, logs, we have unstructured and structured messages. In metrics, you always have a fixed data model, which would give us a lot of happiness, at least coming from the logging world, where we have a lot of pain precisely because we don't have that structure. On the other side, in logs, users care about filtering, data reduction, data enrichment, Kubernetes metadata, that kind of thing, and also about being able to do aggregation: take all the data, buffer it, process it, and then send it out. In metrics, I would say, from the collection standpoint, aggregation is quite optional.
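To picture that fixed-data-model difference, here is the same hypothetical CPU sample twice: first as the kind of schemaless, log-style record Fluent Bit used to emit, then as a typed metric in the Prometheus exposition format (the metric and field names are made up for illustration).

```
# As a schemaless log record: arbitrary keys, no declared type
{"cpu_p": 12.5, "user_p": 8.0, "system_p": 4.5}

# As a typed Prometheus metric: declared type, name, help text, and labels
# HELP node_cpu_usage_ratio CPU usage ratio per mode
# TYPE node_cpu_usage_ratio gauge
node_cpu_usage_ratio{mode="user"} 0.080
node_cpu_usage_ratio{mode="system"} 0.045
```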
Also, in logs we don't have a predictable size for the data; we can get as much data as the applications produce. If one of the developers has a lucky day, enables debugging in their application, and just goes home, on Monday we're going to find out that we have millions of extra messages that we didn't need, and that we've been paying for in our Splunk database, or whichever one we use. So it's hard to predict, it's hard to control. In metrics, yeah, it's a fixed data model; you can predict it to a certain point. And, well, in logs we have maps, booleans, integers, floats, pretty much like JSON. In metrics, of course, you have more defined types: counters, gauges, histograms, and I think for our use case that is quite good.

Now, when thinking about how to bring value from a Fluent perspective to the metrics space, we had to ask: what do we want to improve in the market? Do we want to rewrite everything from scratch? Do we want to implement our own data model on top of what the market is using? From our mindset, the answer is no. We did our research, and personally I think that if you want to start adding value to users and companies, you have to integrate with the standards you have in the market and not reinvent the wheel. And if you are going to reinvent the wheel, at least make it compatible with what the market already has. So we decided to stick with Prometheus and OpenMetrics. We said, the whole industry is using Prometheus; if we're going to be good citizens with Prometheus, we have to use the same protocols and the same style of metrics. There are years of experience there, so why not build on that? For us, the Prometheus ecosystem is quite interesting because you have a metrics spec, you have well-known concepts like collectors and exporters, and you can also ship the metrics over the network. Usually that is a problem, right? If you don't have the right data serialization format, shipping data over the network can generate issues. And this is how, from a Fluent Bit perspective, we said, okay, we're going to integrate with Prometheus. But what does that mean? Because I'm not planning to ship logs from Fluent Bit to Prometheus; I care more about what kind of enhancements we can bring to the ecosystem.

So before writing any integration, we realized, hey, we have our whole stack to manage logs, but we're missing the piece to manage metrics in Fluent Bit. Fluent Bit is written in C, so it's not like we can just go ahead and consume the Golang SDKs; that would be a tremendous overhead for the project, from our perspective. So we started this project called CMetrics, under the Apache license, pretty much the same as everything else. It's written in C, and we started by supporting the basic stuff: counters, gauges, the types we needed. Histograms are currently under development. We had to make it labels-aware, and it must support atomic operations. Actually, I can say it's pretty much a copy of the Prometheus Go client: we took a look at it and said, yes, we need atomic operations, we need labels. If you're going to start doing metrics, you should take the best practices from Prometheus. The CMetrics project itself is about the content, not the transport. The moment you separate both concepts, you get a lot of advantages. For example, for a project like Fluent Bit that aims to be vendor neutral and aims to handle metrics: okay, we are going to be good citizens with Prometheus, right?
But what about the user who says, hey, I'm using InfluxDB, or I'm using some other endpoint, I'm using Chronosphere with M3, I need remote write? How do we play with all these different kinds of payloads, connectivity, and security concerns? So we said, okay, CMetrics will be all about how we create metrics, how we manage their values, and how we can take a metrics context and convert it into any kind of payload we want. We abstract the whole problem. And we also care about supporting the same things Prometheus has: namespaces, subsystems, names, descriptions for help, and labels. Maybe some things will go away with OpenMetrics changes, but that's fine.

Now, let's take a simple C example. What we're doing here is just creating a CMetrics context, getting the current timestamp, and creating a metric with two labels. We increment the counter and, optionally, just get its value. I'm not doing anything fancy here; this is just defining a metric, incrementing a counter, retrieving its value, and printing that value to standard output. Another thing we can do here: for InfluxDB, for example, we can use a simple API call to encode my context into an InfluxDB payload and go to that specific output. This is the way we work in Fluent Bit: we try to have an agnostic core where we take data, unify it into one format, and convert it to others. CMetrics does the same thing. We did the same thing for the Prometheus exporter format: I have a CMetrics context and I want to expose this information. But here is the difference. Remember that I said we don't care about the transport, we just care about the content? Here we are not using any kind of HTTP protocol or anything; it's just the payload, just content. We can also do remote write. That was more extensive work, but we accomplished it using the same Prometheus protocol buffer files, so with the same specs we are now able to generate Prometheus remote write payloads.

Okay, and now we get back to Fluent Bit. We have a small library that handles metrics and converts payloads. Now we jump into the metrics collection part: how do we use it? The first project, which might be a bit controversial here, is that our users asked us to re-implement Node Exporter, but in Fluent Bit. What does re-implement mean? Users usually run Fluent Bit for logging and Node Exporter for metrics, so they asked, since Fluent Bit has a pluggable architecture, whether we could re-implement the same metrics collection inside Fluent Bit to avoid having an extra agent. This is not about pitting one project against the other; it's about adding more flexibility for users who want this kind of unified experience. Of course, Node Exporter has a lot of collectors, a hundred or more. We tried to focus on just a subset for this year, Linux only: CPU, CPU frequency, net, netstat, uname, vmstat, and so on. And actually we got a really good reception, but that is the collection side. So in Fluent Bit we created an input plugin that gathers all this information from the proc filesystem, and now it's time to do something with it. We have the data in the input; now we have to expose that information somehow.
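For reference, a rough sketch of the CMetrics calls described above, creating a counter with two labels, incrementing it, reading its value, and then encoding the whole context as Prometheus text, following the project's public examples; the exact function names and signatures here are assumptions and may differ between library versions.

```c
#include <stdio.h>
#include <cmetrics/cmetrics.h>
#include <cmetrics/cmt_counter.h>
#include <cmetrics/cmt_encode_prometheus.h>

int main()
{
    struct cmt *cmt;
    struct cmt_counter *c;
    uint64_t ts;
    double val;
    cmt_sds_t text;

    /* create the metrics context and grab the current timestamp */
    cmt = cmt_create();
    ts  = cmt_time_now();

    /* define a counter with two labels: hostname and app */
    c = cmt_counter_create(cmt, "fluentbit", "demo", "events_total",
                           "Total number of events",
                           2, (char *[]) {"hostname", "app"});

    /* increment the counter for one label set, then read the value back */
    cmt_counter_inc(c, ts, 2, (char *[]) {"localhost", "cmetrics"});
    cmt_counter_get_val(c, 2, (char *[]) {"localhost", "cmetrics"}, &val);
    printf("value = %f\n", val);

    /* content, not transport: encode the same context as Prometheus text format */
    text = cmt_encode_prometheus_create(cmt, CMT_TRUE);
    printf("%s", text);
    cmt_encode_prometheus_destroy(text);

    cmt_destroy(cmt);
    return 0;
}
```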
So, using the same engine that we showed a few slides ago, we can now use our own built-in HTTP server and expose the payload of this CMetrics context there. But here's a difference. With an exporter, most of the time the data is gathered in real time: when the request arrives, at that moment, you collect the data. In Fluent Bit we use a different approach. The input plugin runs on its own interval, scraping metrics and pushing them to the output side, and on the output side, for example the Prometheus exporter output plugin, we keep a kind of cache that is refreshed, say, every two seconds. So every time a client comes to scrape these metrics, like Prometheus, it gets not on-demand data but the last data that was generated by the input plugin. And from a Fluent Bit configuration perspective, this is just a few lines. You can have multiple inputs, filters, and outputs; in the input we just say, use the node exporter metrics plugin, assign a tag to it, and set the scrape interval to two seconds, and on the output side, use the Prometheus exporter output plugin. And you can get the information without any problem. Sadly, we have to skip the demo because we had a Linux problem today, so I don't have the environment to show it to you, but we will try to post something online so you can watch how it works.

Now, as a summary of our metrics journey, what is supported today in Fluent Bit: node exporter metrics for Linux, and our Fluent Bit internal metrics are now handled natively using CMetrics. On the output side, we can send metrics to InfluxDB, a Prometheus exporter endpoint, and Prometheus remote write as supported outputs, plus Forward. Forward is the Fluentd and Fluent Bit protocol to send data over the network, so if you have another remote Fluent Bit instance that speaks Forward, it is also able to receive those metrics and reassemble the whole metrics context there. Then you can do whatever you want: add labels or modify the data.

Now, what is the current ongoing work? Something I never expected when we started this metrics project is that people started asking for node exporter metrics for Windows. And this is crazy, right? You always think that everybody's running Linux, but when you go to the market, everybody's using Windows too. Windows Server, of course. So we are writing Windows exporter metrics. We are rewriting the NGINX metrics exporter as an input plugin for Fluent Bit. We already support collectd and StatsD, but as logs; now we are implementing them as native metrics receivers. So you can have your own nodes sending StatsD or collectd metrics, receive them in Fluent Bit, convert them to metrics automatically, and expose them through the Prometheus exporter, or just send them out to your preferred backend using remote write. Another piece of ongoing work is the ability to convert logs to metrics. You might think, okay, when is this needed? Well, for example, if you're reading, I don't know, NGINX logs and you want to count the number of 200 responses you're getting, or trap the 500 ones, you will be able to create these kinds of metrics by parsing the logs and generating the updated values. And right now we are working on native metrics support on the output side for Datadog, Splunk, and Amazon CloudWatch.
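Going back to the node exporter scraping setup described above, here is a minimal sketch of that configuration, using the plugin and property names from the Fluent Bit documentation; the tag, port, and interval values are illustrative assumptions.

```
[INPUT]
    # collect host metrics from the proc and sys filesystems, node-exporter style
    name            node_exporter_metrics
    tag             node_metrics
    scrape_interval 2

[OUTPUT]
    # expose the collected metrics on a built-in HTTP endpoint for Prometheus to scrape
    name  prometheus_exporter
    match node_metrics
    host  0.0.0.0
    port  2021
```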
In Fluent Bit, we did not stop at just taking data from one side and sending it to the other; we added many things in the middle, like filters and Kubernetes support, but we also added a stream processor. We wrote our own SQL parser, and you can run your own SQL queries on top of the data that is flowing through memory. There's no database, no indexing; it's a kind of stream processing, very similar to KSQL. That allows us to do fancy things like: for all the data that matches a pattern, take it, create a new stream, and send it out to a different destination. Now we are going to do the same thing with metrics. We want to create this kind of metrics processor that allows us to take the metrics, do some fancy processing, aggregate them or whatever we want, and maybe ship that data to a different endpoint. This will also open the opportunity to build alerting or other kinds of features on top.

Now, the common question is about the future: OpenMetrics, OpenTelemetry. As I said at the beginning, as a project we try to be vendor agnostic and spec agnostic, but we implement right away whatever the industry is using, and right now that is Prometheus and OpenMetrics. Once OpenTelemetry metrics goes GA, which I think will be soon, while logs look a few years away, we are going to integrate with OpenTelemetry as well, for environments where the applications ship native metrics in the OpenTelemetry format. So that is the current status of the project. I'm glad to have the opportunity to share this here, and I would like to know if you have any questions.

Oh, thank you. If you don't like logging, that's fine, because it's boring. I like logging. Big logging fan over here. Thanks. Are you exposing Fluent Bit metrics, the telemetry it generates about itself, via CMetrics? Yeah. Awesome. So now you can ship Fluent Bit metrics to a remote write endpoint or send them to, I don't know, InfluxDB. Yeah.

Hey, Eduardo, thanks for the talk. Super interesting. I was just wondering if you had any plans for creating some way to extract metrics from the actual logs, like user logs flowing through Fluentd itself as it's collecting application logs from end-user systems, and your thoughts there and what's going to go on in that world. Let me check if I got the question right: whether we are planning to extract metrics from logs? Yeah, yeah. It's super interesting to see support for plugins like Node Exporter and the Fluent Bit internals, and also other collectors, like you mentioned, StatsD and collectd. I was just wondering if there was any likelihood of development around saying, I have logs that tell me every time I have an error, so the error log rate, and exporting that as a counter or something like that as a metric. Just those kinds of use cases. Yeah, actually, that's one of the pieces of ongoing work. It's similar to the example I gave with NGINX logs: when you want to count, I don't know, 200 responses, error rates, and all of that. That's the collection standpoint, but you also want this kind of metrics processor that can help us take some action based on the results. So the answer is yes. Yeah. Awesome. All right. Thank you very much. Okay. Thank you very much.