Hello everyone, my name is Eduardo Silva and I'm here to talk about logging in Kubernetes: how to do it right, or at least how to try to make it right. This session is for November 19th, QCon. As I said, my name is Eduardo, I'm a maintainer of a project called Fluent Bit, which is part of the Fluentd ecosystem, and I work as a principal engineer at Treasure Data.

Understanding the logging workflow is really important. If you understand how data is collected, how it's processed, and then shipped, it's easier in general to troubleshoot and to understand where your data is and how you can process it. So let's take a quick overview of how this works in the Kubernetes world. If you think about a cluster, the cluster has a master API server, and then you have the machines that run the containers, which are called nodes. Your application is deployed in the cluster as a unit called a pod, and a pod can have one or many containers. When you deploy your pod, you can also decide how many replicas you want for your application. Sometimes these replicas do not fit on just one node; they are distributed across different nodes. This kind of distributed environment is quite flexible, but it also generates many challenges when we try to solve logging. If you think about this diagram, how do we collect the information from different pods on different nodes and correlate it with metadata? It's a complex task.

Imagine that one application generates a simple message like this Apache access log line, where we have an IP address, a timestamp, and then the HTTP request. This simple message generated by a pod needs to be collected and processed. As I said, an application runs as a pod, and a pod has a definition, like a YAML file. So imagine that on the left side we have the definition for this pod, and this pod is generating random messages. That is fine. All these messages, by default, go to the file system, or maybe the Kubernetes environment is configured to use systemd. So it's somewhat flexible where your data ends up, but in the end we aim to solve the same problem. This information, when it's generated by the pod, goes to a stream: it can be standard output or standard error. And depending on the configuration, if we think about the file system, all this data goes to a path like /var/log/containers/, with the right pod name, namespace, and more information encoded in the file name. This message is not just the text generated by the application, because some metadata comes with it: for example, which stream the data came from, and at what time it was generated. So you can see that a simple message on any node starts accumulating more and more data.

After that, we also need to correlate all the information. Maybe I would like to know the pod name, or which node this data was generated on. And maybe when I deployed my pod, I set special labels to say: this is a production environment, or, I don't know, label color equals blue. Because in the end, when you centralize all your data in storage, what you want to do is data analysis. And to do data analysis, if your application was distributed and the data was generated in different places, you need these special keys, I don't want to say indexes, but special keys to look things up by.
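To make this concrete, here is a rough sketch of what that looks like on disk with the Docker runtime: the kubelet exposes each container's log under /var/log/containers with the pod name and namespace encoded in the file name, and the json-file log driver wraps each application line with the stream and time. The pod and container names below are made up for illustration:

```
# /var/log/containers/<pod-name>_<namespace>_<container-name>-<container-id>.log
/var/log/containers/apache-logs-7f9c4_default_apache-3b1d9e...log

# One line inside that file: the application's message, plus the
# stream and timestamp metadata added by the container runtime.
{"log":"192.168.2.20 - - [19/Nov/2020:10:27:10 -0300] \"GET /index.html HTTP/1.1\" 200 3395\n","stream":"stdout","time":"2020-11-19T13:27:10.192839081Z"}
```

This is the raw record a logging agent finds on the node: the application text plus stream and time, but still no pod name, namespace, or labels as queryable keys, which is exactly why correlation matters.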
I want to search for all my records where a certain key, for example the job, equals some value, or the label color equals blue. So how do we correlate all this information? Because all this context of labels and annotations in Kubernetes is not on the same node where the data was generated; by spec, this data lives in the master API server. And here we're going to introduce one of the logging agents that we have in the ecosystem. There are others, but we're going to talk about the Fluent ecosystem, which Fluent Bit is part of.

Fluent Bit is a project that was started around 2015, originally for embedded Linux, but it quickly evolved into the cloud space. It's under an Apache license, pretty much like Fluentd, and it's written in pure C. That decision was difficult years ago, but we wanted a really optimized logging agent that consumes as few resources as possible and optimizes memory usage as much as possible. Being written in C doesn't mean that it's limited or restricted, quite the contrary: we have a pluggable architecture with more than 60 plugins, I would say close to 70 as of today, and we provide built-in security and different kinds of rules, so it's very flexible to configure as a pipeline. A pipeline is just the concept that, from an agent perspective, we collect the data on one side, from the input, and parse it; the data then flows through filters and buffering, and finally it's routed to the destination. And the destinations can be many: any of the connectors that we ship with the Fluent Bit distribution, for example Elasticsearch, Loki, InfluxDB, Amazon S3, and more. It's just a matter of taking a look at the documentation.

As I said, correlation is really important. So if we look at this data, we have a log message, a key with the stream, and a timestamp. How do we correlate all of this in Kubernetes? Our data will become something like this, where it will likely be processed: we go from an unstructured format to a structured format. This is just an example. We get all the Kubernetes metadata, where we can see the pod name, namespace, pod ID, labels, and so on. Correlating this information is really important, as I said before, because without it, it would be really hard to find the right information from a specific moment and a specific place.

And how can this be deployed? As an agent, Fluentd and Fluent Bit can be deployed as a sidecar container or as a DaemonSet. A DaemonSet is a pod that runs on every node of your cluster. So considering the basic use case, we just deploy Fluent Bit as a DaemonSet on every node of the cluster, which, of course, in its configuration mounts the volumes to access the container logs that are on the node. As explained before, the pods write to the file system through the Kubernetes machinery, and then we have the DaemonSet consuming all the container information. But after reading that information, we need to start correlating: the agent also goes to the master API server and retrieves all those labels and annotations. And of course, that's an expensive task; I would say that it's not that easy. But in the end, what you as a user want is to deploy the agent with a basic configuration and make sure that it works in a straightforward way. And Fluentd and Fluent Bit do just that.
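In practice, the piece of the DaemonSet configuration that does this is fairly small. Here is a minimal sketch in Fluent Bit's classic configuration format; the option names come from the Fluent Bit docs, and the paths and values are illustrative: tail the container log files, then let the kubernetes filter enrich each record from the API server:

```
[INPUT]
    Name       tail
    Tag        kube.*
    Path       /var/log/containers/*.log
    Parser     docker

[FILTER]
    # Talks to the master API server and attaches pod name, namespace,
    # labels, and annotations to every matching record
    Name       kubernetes
    Match      kube.*
    Merge_Log  On
```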
Once you correlate all the information, you're ready for the next step, which is to ship your data to your preferred database or storage engine. Recently, in the latest version, we added native support for PostgreSQL, AWS S3, Loki, Azure Blob, Elasticsearch, Kafka; we have the main connectors, pretty much everything that is used in real production environments.

When you think about Fluent Bit, you have to think that Fluent Bit equals high performance at low cost. You can deploy any kind of tool to ship your data out of the box, but you always have to consider that as your data grows, there is more data you need to process. And processing data is not cheap, right? It costs CPU cycles. If you're running a cloud environment with hundreds of nodes, or thousands of pods, it's really important that your logging agent is optimized and configured for the best case possible: low CPU consumption and low memory consumption, without lowering overall performance or throughput, which is really important.

One concept that exists in many areas of engineering, applied here to logging, is the concept of back pressure. Remember that we just mentioned the data pipeline: we have the input, we have the filters, parsers, buffers, and we send the data out. That sounds like it makes sense. But one thing is theory, and the other is when your things are deployed in production. In production, sometimes you have network outages, you have services that don't respond quickly. But on the input side, you're still consuming more and more data. So where do you store that data? What usually happens with it? I want to explain the back pressure concept with a couple of pipes and water. Think about the incoming data on the left side and the outgoing data on the right side. Your pipe has a capacity, right? If you put more water into it, it won't go through faster. It will start flowing, but then we will get problems, because if we cannot process the data fast enough on the right side, and we're getting more data from the input on the left side, we are going to have a really complex scenario.

So how do we fix that? By default, most logging agents, including Fluent Bit, just store the data in memory, because we operate on data in memory. But when we are sending this data out, if we cannot ship it fast enough and we keep ingesting more, this data starts accumulating in memory. And in Kubernetes, or any containerized environment, you end up seeing that the kernel has killed the container running the logging agent. That makes sense, because the agent was storing as much data as possible in memory. That's why we need to offer mechanisms to deal with this, which we call back pressure. In order to avoid back pressure in logging, at least in the Kubernetes use cases, it's always good to implement mechanisms like this. For example, can the agent ask: am I OK to ingest more data? Because if you think about the beginning of this presentation, we said that all our data is already in the file system, and in most cases the data will stay there for a while before it gets rotated. So it's OK if we sometimes pause the data ingestion until we can flush all the pending data; at least in Fluent Bit, yes, that is possible.
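In Fluent Bit, that pause mechanism is exposed as a memory limit on the input plugin. A minimal sketch; the option name is from the Fluent Bit docs, the value is illustrative:

```
[INPUT]
    Name          tail
    Tag           kube.*
    Path          /var/log/containers/*.log
    # When this input buffers more than 5MB in memory, Fluent Bit
    # pauses ingestion; since the logs stay on disk until rotation,
    # nothing is lost, and reading resumes once chunks are flushed.
    Mem_Buf_Limit 5MB
```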
And by default, for production environments, we always suggest enabling the file system buffering mechanism. File system buffering means that your data goes through memory, but it's always backed up in the file system. And if you put a limit on how much memory Fluent Bit can use in the Fluent Bit configuration, Fluent Bit will say: hey, you are ingesting so much data that I could face back pressure; OK, all the new data that comes in will go to secondary storage on the file system. So you always keep your memory under control, and you avoid these back pressure scenarios. Just imagine what would happen if Loki or Elasticsearch is down on the other end while you're still receiving data, and you don't want to lose data; you have to enable these mechanisms. I would say that Fluent Bit is pretty much bulletproof at dealing with back pressure, and we are pretty happy about that; we have gotten really interesting feedback from different use cases. I think that for general use cases, the default configuration with file system buffering enabled is enough.

Now, pretty quickly, we're going to do a demo of how to deploy Fluent Bit and make a simple integration with Loki, the log storage made by Grafana. So we're going to switch the camera right now. As part of this demo, we're going to deploy a simple pod that will generate some random messages, and then we're going to send all these messages out of the box into a Loki instance. Here on my computer, I'm running minikube, a single-node Kubernetes cluster. As I said, we're going to deploy the first pod, which is just a dummy pod that generates Apache log messages. We can see it's already running, so we can take a look at the logs, and you will see that we're generating one simple message per line, with the IP address, timestamp, and the other components of the HTTP request. That is fine, but right now this is all running locally. As the next step, we're going to start up Loki with Grafana, the whole stack, so we can access it and try to connect the dots between the minikube instance and Grafana. Since we are just starting, we can explore our data, but it should be almost empty.

Getting back to minikube, we're going to deploy Fluent Bit. The first step is to deploy the ConfigMap, which has all the relevant configuration; we're going to take a look at it pretty quickly. The relevant part here is how we are collecting the logs and how we are shipping the logs to the Loki instance that is running locally. It's pretty simple: we have the auto Kubernetes labels option on, so everything will be fine. Now we deploy the DaemonSet; Fluent Bit will be running as a DaemonSet, so it will be able to take all the data from the node and ship it to Loki. Back in Grafana, we're going to refresh, and we're going to do a simple query using the job label fluent-bit. Now we can see in Grafana all these logs coming from the minikube node. If we click here, we can see the job label, and this is a log record that came in from minikube, with the stream, the timestamp, and the nanoseconds.
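For reference, the relevant parts of a ConfigMap like the one in the demo look roughly like this. This is a sketch: the option names follow the Fluent Bit 1.6 docs, the Loki host is a hypothetical address, and job=fluent-bit matches the label queried in Grafana:

```
[SERVICE]
    # Back up buffered chunks on disk, as recommended above
    storage.path            /var/log/flb-storage/

[INPUT]
    Name                    tail
    Tag                     kube.*
    Path                    /var/log/containers/*.log
    storage.type            filesystem

[OUTPUT]
    Name                    loki
    Match                   *
    Host                    loki.default.svc   # hypothetical service address
    Port                    3100
    Labels                  job=fluent-bit
    # Turn pod labels into Loki labels automatically
    Auto_Kubernetes_Labels  on
```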
Okay, so after this demo, we're going to continue with the next part, which is called stream processing. I know you have heard this term a lot recently, maybe not in logging; if you are familiar with Apache Flink or Kafka, this concept will be pretty familiar to you. Stream processing, as a concept, is just the ability to process the data while the data is still in motion. In the general case, you store your data and then you do analysis, right? That's a common pattern. But what if, while you ingest your data, you create a window in memory and you process the data in chunks before shipping it out? Then you can actually do many good things, like distributing the data processing, or getting into this new market which is called edge computing, or edge processing. So this concept is really interesting.

In general, stream processing works with events, and every event can be a record. If you take the notion that every record has at least a timestamp and a message, like this example, we can do many things. Take a look at this: this is mostly a JSON map. I'm not saying that this only works with JSON; this is just a human-readable representation. We have a timestamp, and a temperature as a key with values. So what if we could do analysis over the different keys and values flowing through the logging agent? But also, if you think carefully, you will say: hey, my data doesn't always have the same structure. And that is fine, because stream processing capabilities do not enforce a fixed schema on the data. The data has a structure, but it's schemaless. So forget about the notion of a database table where you have fixed columns and all that; here, data is quite dynamic. The good thing about stream processing is that you can process the data no matter what the schema is: as long as it has some keys and values, you can do whatever you want. With stream processing you can accomplish many things, like fast data processing, because you don't need tables or indexing, and you can process data before sending it out to a database or a cloud provider. I'm not saying that stream processing is a replacement; I'm saying that stream processing is a new way to optimize how we look at our data and gather insights.

So how does this work in our scenario? Initially, this is the common model that we have in the market: you have hardware and software, each of them generating events that go through the pipeline; then you have a central stream processor, you process the data, and then you send it to the database, or sometimes you process it after it's stored in the database. But when you have this stream processor, you have to think that if you're going to create some processing rules, you need a query language to do key selection, key filtering, aggregation functions, and so on. What's more important is to be able to do it in memory, because if you're hitting the disk, maybe you are not optimizing as much as you want. But every use case is different; I'm talking just about generic use cases.
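To make that query language idea concrete with the temperature example from before, here is a minimal sketch of a Fluent Bit stream processor task. The syntax follows the Fluent Bit stream processing docs; the file names, tags, and key names are illustrative:

```
# fluent-bit.conf
[SERVICE]
    # Load stream processor tasks from a separate definitions file
    Streams_File  stream_processor.conf

# stream_processor.conf
[STREAM_TASK]
    Name  avg_temperature
    # Every 5 seconds, emit the average of the 'temperature' key seen on
    # records tagged 'sensors.*', as new records tagged 'temp.avg'
    Exec  CREATE STREAM avg_temp WITH (tag='temp.avg') AS SELECT AVG(temperature) FROM TAG:'sensors.*' WINDOW TUMBLING (5 SECOND);
```

The results re-enter the pipeline as regular records, so they can be routed to any output like any other log.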
So think about this: on the left side we have the edge, and the edge can be a Kubernetes node; on the right side, we have the cloud environment. What we usually do in the common model is, as explained before: we have the logging agent on the edge, it processes the logs and sends them out to the stream processor or the database provider, and then we get all our interesting results there. So we do a kind of extract-and-load. But what if we make the stream processing capability one more component of the logging pipeline: stream processing on the edge? Taking this model, what we propose as a project, and we have received really good feedback about this, is to move the stream processing capabilities into the same agent. You can query all your data on the edge. But this is quite optional. So the stream processor is quite powerful, and we're going to have more online talks in the community where we'll share more details about stream processing.

Now we're going to jump into what is new in 1.6. Fluent Bit 1.6 was released two weeks ago, and there is really exciting news about it. The first one is the new enterprise connectors. Enterprise connectors are connectors that we create for enterprise services, sometimes in partnership with those companies, in this case with Microsoft and with AWS, among others. Actually, many companies are contributing back to Fluent Bit nowadays, and the project is growing a lot. I think that if you're watching this session at this conference, it's because you're interested in this too. As part of the filters, and as part also of the stream processing, we were thinking about how we can innovate, what we can do differently, how we can improve data processing. And one contribution that came back is the ability to deploy TensorFlow models on Fluent Bit. This is quite powerful. This is not about training machine learning models; it's about doing inference: you deploy a model, and while the data is flowing, you can trigger some action depending on whether the flowing data matches the deployed model or not. So there are many use cases that we're going to explore in the future with machine learning, without getting too intensive into, for example, training; this is just about deploying models.

As of last month, we are hitting more than 170 million deployments this year, and the year has not finished yet. This really shows how our adoption is growing. I'm not saying that as a community we have 170 million users; of course, a fraction of these are unique environments that are continuously using Fluent Bit, but the number of nodes in those clusters is also growing. So we have seen great adoption since the project started. On the enterprise side, we are also really happy to have the top three cloud providers using and contributing back to Fluent Bit, and I'm talking about Amazon, Google Cloud, and Microsoft. Also, DigitalOcean is supporting Fluent Bit in one of their new offerings for applications, and that service is using Fluent Bit behind the scenes; and many customers of the companies you see here are using Fluent Bit too. In general, the synergy between end users, companies, and developers as a community is creating great value for everybody. Fluent Bit is also in the loop with OpenTelemetry.
As maintainers, we are in continuous conversation with the OpenTelemetry team to see how Fluent Bit can fit into this new observability space. I think it's very good right now at metrics and traces, but the missing part is still logs, and that's where we are working today, to try to bring a solution that solves most of the problems from the observability agent perspective. Well, the presentation has just finished, but now we have some minutes for questions. We would be really happy if you could ask some questions and tell us whether you are using Fluent Bit or not. We have the chat available, so please make sure to write your question; we will be really happy to answer all your questions. Well, thanks so much for coming, and I hope you enjoy the conference.