Hello everyone, my name is Eduardo Silva, and welcome to this session: High Throughput Plus Low Resource Usage, Our Logging Journey. As you can see, my email is on the first slide, so feel free to reach out anytime with questions or any kind of follow-up. As I said, my name is Eduardo; I'm mostly on Twitter and GitHub, and the main link is Twitter. I'm the founder of Calyptia, a company that provides support and products on top of the Fluentd and Fluent Bit ecosystem, so feel free to reach out through the website too. I'm also the creator and maintainer of the Fluent Bit project, which is a CNCF subproject under Fluentd.

So in general, everybody wants performance, right? If you want to achieve high throughput, you think about performance. But everything has a cost. When people say performance, they often think about the number of records or events they can process per unit of time. And after that, you start realizing that maybe the setup you have, or the strategy your tool is using, doesn't match your expectations, or maybe it's not following best practices from a configuration perspective. If you think "I want to consume low CPU" or "I don't want to exceed this amount of memory," it's really hard to come up with an ideal solution. But I can talk about how we solve this problem in Fluent Bit, and where we are going with it.

Our journey started years ago with this project called Fluentd. Fluentd is really good for services; it's really good at aggregating information. We have a huge ecosystem, where people and companies have contributed more than a thousand plugins, and it's really great. I'm saying this because I would guess 80% of you are familiar with Fluentd, which is a CNCF project. But here we're going to explain the journey of its subproject, which is part of the same ecosystem.

When talking about logging, there are a couple of things we need to clarify. Logging by itself is not cheap. Logging in the past used to be pretty simple: an application just ships a message, or maybe uses syslogd, rsyslog, or some similar service to handle the message for it. Most of these messages are text-based. But the end goal of these messages is to let you perform some data analysis. To do that, you need to take all these messages and centralize them in a database, so you can create some kind of schema or run queries on top of them. It doesn't matter whether you store them in a SQL, NoSQL, or document-oriented database; the pattern is pretty much the same. You want to have some structure on top of the text message that was generated at the beginning.

One of the challenges, if you think about throughput and performance, is that we keep getting more data because we have more applications. Everybody is now deploying microservices, decoupling everything, and it's really hard to keep control of the logging rate of each application and what kind of messages it sends. Sometimes you don't just care about specific messages like info, warnings, and errors; maybe you don't want debug messages at all. You can raise your hand if you have managed a system where a developer by mistake enabled debug mode for one application.
And you start to see this increase in the load of log messages in your pipeline, which is quite normal. But that affects performance, right? Because that is our goal here. As I said earlier, if we think about the workflow, we have an application that generates a simple log message, and sometimes many of them. As an example, let's take this simple message that comes from an NGINX access log. What it's doing is just writing a message that specifies the IP address, a timestamp of when the request was generated, plus information about the method, the URI, the protocol version, the response size from the server, and so on.

But this log message is not unique; it could be different. Sometimes applications generate messages in different formats, even coming from the same application. Sometimes you have multi-line messages; think about stack traces. So you can see this thing is a bit complex. And when an application generates a message, I would say 80% of the time it's just a raw text message. It doesn't have any structure; it doesn't have any schema. Nowadays we see people and companies trying to log in JSON, or trying to unify around one specific format. But in reality, that effort has been around for years, and it's really hard to say the industry is going to align on one format. It hasn't happened in the last 20 years. I would say that nowadays, because of parsing capabilities, JSON is the middle point between all the options available. I'm not saying it's the best; parsing it is actually slow. But at least it's something we can start with.

When we generate this message, it gets handled by syslogd, rsyslog, or whatever similar service is running in the system, and we can have many of them. Since our goal is to perform data analysis, we end up with all these files with all these records, maybe from different applications, maybe from the same one. So we need an engine that is able to listen for these messages, or realize that they are there, take them, and send them out to some destination like Amazon S3, Elasticsearch, Stackdriver, Splunk, or any similar service. Nowadays there are multiple options in the market; this is just an example of one of them, with no preference from my perspective. Every user might choose their own vendor or vendors.

On the engine side, there are a couple of steps. It's not just taking the data in and sending the data out. If that were all, throughput would be easy and fast; there's not much work in taking data in and sending it out. A real log processor needs to deal with different kinds of things. For example: collection of logs from different sources, not just files, but maybe from systemd, TCP, or UDP. Taking this data and optionally applying some parsing to convert an unstructured message into a structure, maybe JSON or some binary format we can deal with internally. And most of these messages also need some metadata on them: if you are receiving messages over TCP, you would likely want to record the IP address the messages came in from. Now, we're at KubeCon, right?
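For reference, here is a minimal sketch of how a raw access-log line like that can be given structure with a Fluent Bit regex parser. The parser name and the captured field names are illustrative, not the stock nginx parser shipped with Fluent Bit:

```
[PARSER]
    Name        nginx_access
    Format      regex
    # Named capture groups become keys in the structured record
    Regex       ^(?<remote>[^ ]*) [^ ]* [^ ]* \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*) (?<protocol>\S*)" (?<code>[^ ]*) (?<size>[^ ]*)
    # Promote the bracketed timestamp to the record's timestamp
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
```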
So if you think about messages coming from your pods, you would like to have your labels inside the same record, so you can group them and run your queries without losing context. There are also cases where you want to perform some data reduction. If you're ingesting a lot of messages, and some of them are, as I said earlier, info or debug messages, maybe you don't want them. You want to make sure that everything that comes in as a debug message just gets dropped. If you think about a storage backend that bills you for the amount of data you ingest, you may be interested in performing data reduction and not sending everything.

Also part of the role of the log processor is buffering. Buffering is the capability to take this data and optionally store it in a persistent way on disk, because you may need to restart the service, or the service might crash, or you could have a hardware failure or a network outage. You want to make sure you don't lose any data. And finally, the processor must be able to send this information to different destinations. These destinations matter: some of them just handle the information as binary blobs, because they only want to store what they're getting. But in the case of Elasticsearch, for example, you care about how to query your documents. Ideally you want all your documents in Elasticsearch with a good structure, as JSON maps, so you can query them and have some kind of index over them. The same thing happens with Stackdriver, and the same with Splunk and others.

In general, you can think of it as: we have inputs with different sources, we have this engine in the middle, and we have the outputs. Explaining these concepts is really important for understanding performance, because we need to realize how this works behind the scenes. Simplified, everything is an input and an output, but we need to explain what happens inside. Inside the input, you have a lot of I/O: maybe you are opening a file from disk, or listening for a new network connection, or collecting metrics locally from the proc filesystem, or just receiving them from a third-party service. The engine has a lot of work: as I explained earlier, you have to parse information; you have to offer options to filter this data, to enrich it or discard it; and also to serialize this information, because if you get the data as raw or binary data and you're going to do all this parsing and filtering, you need a unified model for your data, and having a binary format is ideal for any kind of data processing. Then there's buffering, to store the data so you don't lose it; and routing logic, to decide how to send information coming from a specific source to different destinations based on a pattern, how to schedule it, and so on. The output side means: how do I send this information that I just got in my engine to a third-party service? I need to deal with network setup and payload formatting, because every output destination expects a different payload format. Most of them work with JSON nowadays, but one exception is Kafka.
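As a concrete sketch of that input, filter, and output flow, a minimal Fluent Bit configuration could look like the one below. The paths, hostname, and the `level` field name are placeholders for illustration:

```
[INPUT]
    Name   tail
    Path   /var/log/app/*.log
    Tag    app.logs

[FILTER]
    # Data reduction: drop every record whose "level" field is "debug"
    Name     grep
    Match    app.*
    Exclude  level debug

[OUTPUT]
    Name   es
    Match  app.*
    Host   elasticsearch.example.com
    Port   9200
```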
Kafka has its own wire format, but most of the others, Elasticsearch, Azure Log Analytics, even Splunk, use JSON. The output side needs to deal with delivering content in different formats, and also be able to check the result: if I'm sending this information, I need to expect a specific return code, because if something fails and my server says "you know what, I could not process this information," we need to retry. The engine needs to be able to retry through a scheduler. So something that started as just input and output actually has a lot of tasks working internally.

And now I'm going to introduce Fluent Bit. I know I've just been talking about Fluentd, and I'm sure you are really familiar with it. Fluentd and Fluent Bit are from the same family in the CNCF: Fluent Bit is a CNCF subproject under the umbrella of Fluentd. The good thing about Fluent Bit is that it was always designed for high performance. Fluentd is great, but it is written in Ruby, and being a scripting language has some downsides: it's really easy to extend and really easy to scale, but if you want to optimize resource usage, it's really hard. And I would say it's not just Ruby: in any language, once a system-level application starts pulling in a bunch of dependencies, you end up focusing mostly on usability and extensibility rather than optimizing every single component for performance. Fluent Bit is written in C, and we try to reduce dependencies as much as possible, build our own architecture, and deal with most of the stuff on our own. From a community perspective, we have gotten really good feedback and really good traction. Nowadays the majority of users use Fluentd, others use Fluent Bit, and there is a third group that uses both Fluent Bit and Fluentd together.

So let's talk more about the pipeline internals. We call everything the data flows through a pipeline; we're talking about logging, so think of it as that kind of pipes. Understanding how the data flows is really interesting. If we talk about system calls, and we analyze in general all the input interfaces that collect data from I/O, metrics, and so on, you end up using system calls like open, read, socket, and bind, plus functions to perform memory allocation and memory management in general. That part is fine; it's not a big problem. I would say the most critical part is how you manage memory while consuming the data, while processing it, and while sending it out.

As part of the engine, which we can call a processor, here we're splitting the screen in two parts. The engine needs to be able to do parsing, because we mentioned that we want to convert unstructured information into a structured format. So we could take JSON and parse it, or apply a regular expression on top of the data, because maybe my data is quite custom but I know how to group it: as in the NGINX example, we know the first field is an IP address, the second one is a timestamp, and so on, or it's some other known log format, CSV, etc. Also, as part of filtering, you might want to do some kind of data enrichment.
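To make the retry-through-a-scheduler idea concrete, here is a hedged sketch of the knobs Fluent Bit exposes for it; the endpoint is a placeholder and the exact defaults vary by version:

```
[SERVICE]
    # Base and cap, in seconds, for the exponential backoff
    # the scheduler applies between retries
    scheduler.base  5
    scheduler.cap   60

[OUTPUT]
    Name         http
    Match        *
    Host         collector.example.com
    Port         443
    # Give up on a chunk after 5 failed delivery attempts
    Retry_Limit  5
```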
In Kubernetes, of course, you want to append all your labels, and sometimes your annotations, to every single log record. But you also want to do some data exclusion based on patterns. Internally, in our case, we serialize all our information: every single event, whether it's a metric or a log record, uses MessagePack, which is similar to JSON but binary, and quite a bit more performant. For the buffering mechanism, we offer memory and disk, which we are going to explain in a few minutes.

Also, as part of this processing work, we need to be able to route this data. We have routing logic, meaning that for data coming from a specific source, you might want to send it to, for example, Elasticsearch, but also archive all of it to Amazon S3. That's a quite common example. We have also found cases where enterprise companies use Splunk heavily and decide to send Splunk not 100% of the data but maybe 50%, the more critical information they are going to query, and route all the rest to archival systems. And as part of this processing engine, we have scheduling: we have our own scheduler, because if you're collecting data, you need a lot of timers to say when to collect data, from which place, with which collector; but also the logic that if some output destination failed and asked us to retry, we can handle that task and perform the retry.

On the output side of the pipeline, we mentioned that a lot of it is about network setup. We need to perform DNS lookups, connect to an endpoint, and nowadays, of course, you want TLS. As part of that process you have to handle the TLS handshake, and ideally you want to maintain the connection alive with keep-alive. If you close the connection and open a new one every single time, that is really expensive, because the TLS handshake is costly due to the round trips. We also mentioned that internally we keep all this data in a binary format, which for us is MessagePack. But when you're talking to Elasticsearch, Elasticsearch does not understand MessagePack; it understands JSON objects. So what we do in our Elasticsearch connector is take the MessagePack binary message, convert it back to JSON, and then send it over the network. So we need to do the payload delivery: write all the data, make sure all of it was delivered, check the return status, and report back to the engine the final status of the whole process.

Now, about optimizing all this data processing and I/O in general: it's really interesting to understand how things work at a deeper level. As I said, to manage messages internally we use a binary representation of the data using MessagePack. And for handling this information, which is serialized internally at runtime, we have a really good buffer management implementation: a hybrid mechanism with memory and filesystem. We group all the records in chunks, which is something I'm going to explain in the next slide. About data serialization and MessagePack: here on the screen we have a table that compares how many bytes a value takes in JSON versus MessagePack.
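A sketch of that routing idea, one source fanned out to Elasticsearch for querying and to S3 for archival, with TLS and keep-alive enabled on the Elasticsearch side; hosts, bucket, and region are placeholders:

```
[INPUT]
    Name  tail
    Path  /var/log/app/*.log
    Tag   app.logs

[OUTPUT]
    # Queryable copy: structured documents in Elasticsearch
    Name           es
    Match          app.*
    Host           elasticsearch.example.com
    Port           9200
    tls            On
    # Reuse established connections instead of paying the
    # TLS handshake on every flush
    net.keepalive  on

[OUTPUT]
    # Archival copy of the same stream
    Name    s3
    Match   app.*
    bucket  my-log-archive
    region  us-east-1
```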
For example, a null value in JSON takes four bytes; in MessagePack it's just one. If we go to a more interesting example with a map, in the last line, with a key "40" and a value null, JSON uses 11 bytes; in MessagePack, that's five bytes. This is just a comparison of how one format works against the other. Of course a MessagePack payload is quite a bit smaller than a JSON payload, but that is only one of the advantages. The other advantage is that with MessagePack you don't need to parse byte by byte. In JSON you do: you have to build some kind of index across all your maps before you know where each piece of data starts and ends. In MessagePack you know beforehand that you have an array with X items, so you can jump between them. When you're doing data processing, removing a key, adding more keys, or doing any kind of modification, it's much more performant and easier.

Internally, when we get this data, we do the data serialization. Imagine we got a text message as a JSON map. What we do is convert this JSON to MessagePack, and then we have this notion of a tag. A tag is just like a label: all information that comes from a specific source with the same tag is grouped together under the same chunk. A chunk is just a buffer of bytes that stores many records. Usually we handle chunks of around two megabytes, which has proven really efficient for most use cases. Consider the chunk the most granular unit of data we have in the Fluent Bit pipeline.

Now, how does this operate in memory? When the agent starts getting data, we create just one chunk in memory and start appending events to that chunk until it reaches two megabytes. Then, if we get more information, we simply create a new chunk and follow the same procedure. At some point we have a bunch of chunks linked in memory, which is pretty efficient; it's pretty fast. Memory buffering is the fastest implementation. But there's one downside: it's not persistent. What happens if the service is restarted? What happens if the service crashes? Yeah, you're going to lose that data. There's one more critical part: if you're running in containers, your container will likely have a limit on how much memory the process, in this case Fluent Bit, can consume. If you go over that limit, the kernel is going to kill your pod or your container. So there are special situations where you just want to run in memory, but I would say 90% of the time you want the second option, which is the filesystem.

With the filesystem option, instead of keeping things only in memory, we start storing the chunks in the filesystem. This, of course, is more I/O intensive. But here we have a couple of optimizations on how we find the chunks and how we store them. Every time we get more information, we take a chunk and just append the information to it, until at some point we have many chunks in the filesystem. But there's a curious thing here.
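If you want to verify those numbers yourself, here is a small sketch in C using the msgpack-c library (assuming it is installed) that encodes the map from the last row of the table and prints both sizes:

```c
/* Compare JSON vs MessagePack encoding size for {"40": null}.
 * Build sketch, assuming msgpack-c: gcc demo.c -lmsgpackc */
#include <stdio.h>
#include <string.h>
#include <msgpack.h>

int main(void)
{
    const char *json = "{\"40\":null}";               /* 11 bytes of JSON text */

    msgpack_sbuffer sbuf;
    msgpack_packer pk;

    msgpack_sbuffer_init(&sbuf);
    msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write);

    msgpack_pack_map(&pk, 1);                         /* fixmap header: 1 byte */
    msgpack_pack_str(&pk, 2);                         /* fixstr header: 1 byte */
    msgpack_pack_str_body(&pk, "40", 2);              /* key bytes:     2 bytes */
    msgpack_pack_nil(&pk);                            /* nil value:     1 byte  */

    printf("JSON:        %zu bytes\n", strlen(json)); /* prints 11 */
    printf("MessagePack: %zu bytes\n", sbuf.size);    /* prints 5  */

    msgpack_sbuffer_destroy(&sbuf);
    return 0;
}
```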
Here is a simple example where we have five files, five chunks. But think about having 2,000 or 30,000 of them, because maybe you are not able to send the data out since the destination is down. You might wonder: am I going to have all these chunks open, using one file descriptor for each? And here we get into database concepts. We don't use any external database; we implemented our own layer for this specific use case. We think databases are really good and battle-proven, but they are general-purpose. Here we are dealing with log information that we need to group in a particular way, and we need to deal with the filesystem and memory based on our pipeline logic, not a generic database logic. I'm not saying databases are bad; I'm saying that for our purposes and for the performance gains, the solution we implemented works quite well.

Now let's jump into something more elaborate, which is what people really use in production. It's not just memory or filesystem: what we have is a hybrid mechanism where all the data we create goes to memory first, but per input source, for example if you're tailing log files or listening for messages, you can say how much to keep in memory. We have the concept of up and down. On the screen you can see different colors: gray for down and light green for up. Everything that is up in memory is ready to be used, or it's a chunk you can still append data to. But we protect the memory by saying: hey, you can have up to, for example, a hundred chunks in memory. If each chunk is two megabytes, you can do the math. After that, we start storing all the chunks in the filesystem.

How do we correlate the chunks that are in memory with the ones in the filesystem? We use memory-mapped files, which is a really common implementation technique in databases: we map a file that exists in the filesystem as if it were in memory. That is really, really performant, because we are not using common system calls like read and write, and we avoid copying the data between kernel space and user space. So using memory-mapped files is quite fast, and we keep control of how the data flows.

Imagine the output side is down and we keep getting more data in: you face this concept of backpressure. If you don't have a filesystem, you're going to store all your data in memory, and at some point your process is going to crash. What we do instead is store in memory as much as we can, based on our own configured limits, and after that, continue writing to files. If you look at the image, you'll see that in the filesystem part, on disk, some blocks are gray and others are light green. The gray ones are chunks that are not up in memory; they exist only in the filesystem. And when the memory part starts delivering its chunks, so we get more room to work in the pipeline, we just take those chunks and load them up in memory. This has been implemented for almost two years.
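In configuration terms, the hybrid setup described here maps onto options like the ones below; the storage path is a placeholder, and the 128-chunk limit mirrors the "hundred chunks in memory" example from the talk:

```
[SERVICE]
    # Where chunks live when they are "down" (filesystem only)
    storage.path           /var/log/flb-storage/
    # Maximum number of chunks kept "up" in memory at once
    storage.max_chunks_up  128

[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    # Buffer this input's chunks through the filesystem
    # instead of memory only
    storage.type  filesystem
```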
And there are deployments where we can handle, I don't know, 200k messages per second without any problem, and you can increase that based on configuration; there are many variables in your environment. One thing about all this data processing: with the binary representation we can keep CPU consumption low, all the I/O to disk and the memory handling are quite performant, and it's a really scalable design. And this is battle-proven: Fluent Bit is used heavily by most major cloud providers, like Google, AWS, and Microsoft Azure. As a team, we have these companies working together; we have working meetings with most of them, where they also contribute to the project. They are part of Fluent Bit, and they're trying to improve the whole Fluent ecosystem.

Nowadays this is a scalable design; I would say it's the most scalable and cheapest implementation in the market right now. You might find other solutions, but you will find that one thing is data rate and another is resource consumption. If you want to send the data fast enough, yeah, just avoid the filesystem. If you always want low CPU and low memory consumption, maybe you slow down the ingestion. Also, a good thing to mention: don't just trust what I'm saying. In general, when running these kinds of tools, you should run your own benchmarks, and don't trust benchmarks published on a website. Every project has its own baseline, but we encourage every single user to run their own benchmark, because every configuration is different and every use case is different.

Now, as the last part, we're going to do a quick demo of how Fluent Bit operates, and with the time we have, I'll explain a little bit of the configuration and how we manage to send some conservative numbers, like 100,000 records per second, which is suitable for most cases. That configuration uses filesystem buffering together with this hybrid mechanism. So I'm going to switch back to the terminal.

In my left pane here, I have my Fluent Bit configuration. This is a bare-metal server in a different location. What we're going to do is basically just tail one log file and send the data in JSON format to a remote HTTP endpoint. The remote HTTP endpoint is here on the right: a basic HTTP server over TLS that understands the JSON records and also exposes Prometheus metrics. In the third pane, I'm going to run the load generator, which is a script that generates around a hundred thousand lines per second, each one of one kilobyte.

To get started, I'm going to run Fluent Bit with the configuration file. But let me check that I did a full cleanup of everything first. Okay, we're good to go. Now I'm going to start Prometheus, because we're interested in collecting the metrics, and run the HTTP server. As soon as I start the load generator here, we'll see the count of records start increasing on the right. So let's start. You'll also see in the Fluent Bit pane right here a bunch of status messages showing that the data was processed. And you can see on the right that the number of records keeps increasing.
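The demo configuration itself isn't shown in this transcript, but based on the description, a minimal sketch of it could look like this; the endpoint address and file path are placeholders:

```
[SERVICE]
    storage.path  /data/flb-storage/

[INPUT]
    Name          tail
    Path          /data/demo.log
    storage.type  filesystem

[OUTPUT]
    Name    http
    Match   *
    Host    203.0.113.10
    Port    443
    tls     On
    # Deliver the records as a JSON payload
    Format  json
```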
And it's increasing quite steadily toward a hundred thousand records per second. So let's verify how everything is working on the Prometheus side. I'm going to my web browser and typing the address of the Prometheus endpoint. Okay, I'm here. I'm going to write my PromQL query, which is fluent_records_total, and we're going to use a five-minute range and graph this information. We'll narrow it down so we see just the new data we are getting now, the latest minute. Okay, so this is the information we are processing at the moment; it's quite constant. This was the start of the process. If we keep running the query, we'll see that we are constantly around a hundred thousand messages per second.

In this scenario we don't see any backpressure, we don't see any problems. We might sometimes see the rate go down for a few seconds, which could be networking, but it usually recovers without problems. In a normal scenario, like a Kubernetes cluster, the data rate will be different: you might have, I don't know, a couple of nodes sending data to your own database, maybe Elasticsearch or Splunk, and there's no generic setup to achieve a perfect rate for each one. It's quite complex to determine the ideal requirements from a hardware perspective. But we can see that this way, we can process data without any major problem.

Okay, so let's get back to the terminal, and we see that everything keeps running without any major problem. I think we are good to go with this, and now it's time for some Q&A, some questions. Please feel free to write your questions right now in the chat, or ask live during this session. Thank you so much.
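For reference, the per-second throughput view described here corresponds to a PromQL rate query along these lines; the metric name follows what was dictated in the talk, and may differ by Fluent Bit version (newer releases expose names such as fluentbit_output_proc_records_total):

```
# Per-second record throughput, averaged over the last five minutes
rate(fluent_records_total[5m])
```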