My name is Eduardo Silva. I work for a company called Treasure Data, and at Treasure Data we provide a cloud service where you can store data. I'm not going to give a sales pitch, but it's relevant for context: if you are selling a service that stores data in the cloud, before you can store the data you need to collect it. So in order to collect the data, we created a project called Fluentd about five or six years ago, and nowadays Fluentd is part of the CNCF. We always kept Fluentd open source, because most of the company's background is open source, so there was no business reason to keep it closed. Why not make it open? What happened later is that many contributors started using Fluentd for their own purposes, and that's why it has grown. Then we donated Fluentd to the CNCF and it continued to grow.

What do I do at Treasure Data? I'm a software engineer, I work with the Fluentd team, and I'm a maintainer of the Fluentd project, which we mentioned earlier this morning. It's all related to Kubernetes, cloud native, performance, connectivity.

In this session we're going to talk about logging. As I said this morning, if we want to come up with a really good solution for logging, we need to understand the concepts behind it. It's not as easy as "I'll just install a component, a product, and it will work magically." It's not like that, because if you have an environment that is scaling like crazy, you need to scale logging too. You need to understand how things work internally so you can adapt your own configuration, improve your performance, and make it more reliable.

Monitoring exists because we want to monitor something or because we want to troubleshoot something. But logging is not just for troubleshooting: there are many applications in production that generate logging information for statistics, meaning, okay, this is what my user is doing in my mobile game application, the user is moving to this phase, is purchasing this kind of item, and on the server side they use logging for that purpose. So it's monitoring as well as troubleshooting.

If you think about a standalone application, how does it work? Basically, it writes to a log file, or writes to a logging service like syslog, or writes to a remote service, shipping the log messages from point A to point B. But from an operational perspective, things can get complex: one or two applications is fine, but if you have multiple applications on multiple hosts, logging becomes more complex. And when we talk about multiple hosts, multiple nodes, and those kinds of things, we start talking about distributed systems, and that is very relevant to Kubernetes and to how we design the architecture of our applications. If your application is composed of multiple services, sometimes you decouple the application into different small applications, and these will run on different nodes; by node I mean a host or a virtual machine. And these applications and services, as I said before, might scale. So from a logging perspective we can have many sources of information. Also, each application, I'm sure of that, has its own way of doing logging. Think about that.
You're going to have your production application; you hire one person to develop one microservice and another team to develop a different service. And what is common is that they are not going to log the same way, and I mean formats, log formats. Maybe everybody is writing to the file system, which is fine, it's pretty stable, but having different formats is a pain. And getting a whole company to agree that, please, all the development teams are going to use the same format, is hard. People will complain and it does not happen. That's the truth, at least not yet.

So we have different applications, and we want to do some data analysis; we want to understand that some logs come from application A, from host B, from a different service or a different cluster. So we need to add the notion of identity, and that is related to metadata.

If we jump into the cloud native space, cloud native is all about scaling and resiliency, but if we talk about cloud native logging, it is even more complex. A distributed application is one thing; running a cluster of a hundred nodes or more is even more complex. So we understand that we have a problem. We don't know yet how to solve it, but an application that scales also requires logging that scales.

If we think about how to solve the problem in the cloud native space, we need to think about how to consume the logs, the information from the applications, from different sources. Also, besides the format, this information might have a structure or it might not; if it doesn't, that means we need to do some extra work. We need to enrich the logs with metadata, where these logs are coming from, and we need to be able to deliver the logs to a central place for analysis. Because logging is not an end in itself, right? Logging is just a tool we can use to take information from one side and centralize it on another side to do our own analysis. As I said, many people say that logging is boring, and it is. It's boring because it happens behind the scenes. It's not like a dashboard. Okay, you can do Kibana stuff, log visualization, but that comes after the storage, not while you are collecting the data. And that's why logging becomes complex, because you don't have much visibility from that perspective.

So, who's familiar with web server logs? Please raise your hand. Okay. For those who are not familiar: if you have a browser like Firefox or Chrome, that's the client side, but you also have the server side, which is the web server. A web server registers to the file system, or to any kind of logging mechanism, who is accessing, at what time, and what kind of resource they are trying to access. And of course, accessing a resource could be successful, or it could fail because the resource is not there or the client does not have the right permissions. If we want to do log analysis for that one particular case, that's fine, but if we have multiple applications in different places, it becomes hard. For example, on host A we have three different web services: imagine one of them is PHP, one is Python, and the other maybe is Apache. But we also have a host B, which is running MySQL with a different format for its logs. If we want to unify this information and do data analysis, it becomes a lot more complex. So how do we approach this?
Different sources, different applications, different log formats, different nodes, different hosts. If you were here in the morning, in the keynote, we described the logging pipeline. For any kind of problem that you have, even in your life or in software engineering, you need to come up with the right solution. If you understand the problem, you can implement the solution. And that is what software is: a component that tries to solve a problem. But before writing the software, sometimes we need to come up with a design.

So the logging pipeline says: if you are going to work with data, or any kind of logging information, what you need are different phases, where you have an input and an output, but in the middle you have extra phases to do some extra steps. The input side of the logging pipeline is about how I collect the data, from where I collect the information. From an implementation perspective that means: I'm going to tail log files, I'm going to listen for TCP messages, I'm going to connect to a remote server to pull information, anything.

Once you collect the information, you need to parse the data. Parsing means giving a sense, a structure, to this information. As I said, data can come in different formats, but if you are going to have a logging pipeline, you need to take this information and unify it into a simple format which you can understand. And then of course you can encode that information into the destination format: Elasticsearch needs a specific format, InfluxDB a different one. We can unify this because we can parse the data.

Once you parse the data, you need to filter the data. Filtering can mean dropping messages, because sometimes you don't want to keep everything. Who's using Splunk here? One, two, three, four, five. Okay, and that's fine, Splunk is really good. The problem is the commercial strategy, right? Splunk is a database, a proprietary database, and they also implement part of the pipeline, but they skip the filtering, which is funny. Why? Because when you have the Splunk database server and you run the Splunk forwarders, which ship the logs to Splunk, they ship the whole data. And there's a reason for that: you pay based on data ingestion. That's really smart for them, but not for us. So that specific case skips the filtering part. Filtering can mean dropping messages, letting only specific messages pass, or appending some metadata.

Buffering is just as important. Think about this: if I'm going to ship my logs to a separate place, which could be a local service in my network, in my cluster, or maybe a SaaS or some remote service in the cloud, what happens if I have a network outage? I cannot ship the logs. So I have to make some decisions. Do I want to lose data? No. Some people say, yeah, that's fine, I can lose data; they have terabytes, so losing five megabytes is nothing, and that's okay. But 90% say no, we need buffering. Why? Because if I'm going to route the data to multiple destinations and something happens, I want to be able to recover from that failure state and retry. The thing is, everything can fail, and that is fine, things fail. But you need to be able to recover from that state. So the question is: what is your default behavior when things go wrong?
For the logging pipeline, you solve that with good buffering, whether in the file system or in memory. And of course, there is our end goal: to be able to centralize all the logs in one place, and this can be Elasticsearch, Splunk, Apache Kafka, or any kind of storage.

So, how's it going? It's the last talk, I know everybody's tired, but let's try. There's a concept I mentioned before called unstructured data. When applications generate logs, they usually don't have a structure, okay? If you look at that red line, you will see: oh, that is a web server log line. And then you say, okay, that has a structure. For you it does, because you know that that is an IP address, you know there is a timestamp for when the message was generated, the HTTP method, the resource it is trying to access, the URI, the protocol that was used at that moment, the returned status code, and how many bytes were returned to the client. So you understand that it has a structure, but only in your mind. If you give that string to a computer, it will say: okay, this is a byte string, that's it, there's no structure, unless you add some kind of logic on top of it.

So if we think about this, wouldn't it be better to have a structure? Because your goal is to do data analysis, and if your goal is to do data analysis, you want to say: please return all the records that have status 200. To accomplish that, it's better to have a structure. And here I'm not just talking about JSON; I'm talking about structure in general, where you can have a map, a sense of key-values. With that information, any storage engine can work better, and of course you can optimize your computing time when you're searching for something. So structured logs make your life easier, because you can also do filtering, you can say: please drop all the messages that have status 500, we don't care that the server was down for some reason. And you can also perform any kind of analytics.
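To make that concrete, here is a small sketch: a raw Apache-style access log line, and one possible structured representation of it after parsing. The field names are only illustrative; every parser and storage backend can name them differently.

```
# A raw, unstructured web server access log line:
192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /index.html HTTP/1.0" 200 3395

# One possible structured record after parsing (field names are illustrative):
{
  "remote": "192.168.2.20",
  "time":   "28/Jul/2006:10:27:10 -0300",
  "method": "GET",
  "path":   "/index.html",
  "proto":  "HTTP/1.0",
  "status": 200,
  "bytes":  3395
}
```

With that kind of record, "return everything with status 200" or "drop everything with status 500" becomes a simple query or filter instead of string matching.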
So, from the cloud native perspective, we have a logging pipeline, but we also need to be aware of distributed systems, we need to have strategies for our applications, and we need to keep in mind that we want to centralize the logs. And that's what Fluentd is. Fluentd is an open source project under the Apache license, and now in the CNCF's hands, which allows you to implement a full logging pipeline in your architecture. It does buffering and it does filtering, which is really good. As we said, there are more than 700 plugins available, we have reliability, security, and it's written in Ruby and C. We get some feedback sometimes that, okay, it's written in Ruby, it's not that fast, but it depends on the use case. And Fluentd is being used by more than 2,000 companies, just the open source version, imagine that. Nintendo is using Fluentd. And did you play Pokemon Go? Yeah. Well, the good thing is that it was deployed on Kubernetes, one of the biggest Kubernetes clusters, and it was running Fluentd too. We didn't know until we read the news; it's not like we were working with Google on it, no, it was not like that. You know, they are pretty quiet.

But as we said, Fluentd is more than a project; we consider Fluentd a full logging ecosystem. And you get feedback. You go to conferences, you go to meetups, you talk to our users, you go to Slack, and sometimes they say: you know what, we need something lightweight, I have this problem, the performance is good, but can we make it better? And I think that's the goal, right? Making things better every day. And always: I wish we had feature A, B, C, and D.

So, from that perspective, about two years ago we decided to start a new project, to fill some gaps in Fluentd, because six years ago we didn't have many of these cloud native concepts. Fluentd works really well, but we also think we can make it better, both as Fluentd and as an ecosystem. So, for the new ecosystem, we came up with the Fluent Bit project. Fluent Bit was started almost two years ago with the same philosophy: implement a logging pipeline, but also make it compatible with Fluentd, meaning that Fluent Bit can talk to Fluentd. You know that Fluentd can talk over the network with different services, but also with other Fluentd instances. So Fluent Bit can take the logs from your services and talk to a remote Fluentd, or to any kind of supported destination, of course.

The highlights of Fluent Bit: it is written in the C language. Most people say, hey, why didn't you write it in Go? Well, the first answer is that we started the project two years ago, and we wanted a very reliable solution for the ARM architecture, and two years ago Go did not play very well on ARM. And also, I think that if you want to build something that really scales, you have to know the language, because you can make C slow too. It's not just the language; if you know how to do it right, you can increase performance, you can optimize from many angles. So that's why we chose C. It has a pluggable architecture, we support more than 35 plugins at the moment, it's asynchronous, event driven, it supports TLS for security, and it also has monitoring capabilities.

Some of the plugins we have available: we can tail log files, we can listen for messages from systemd, we can parse syslog messages, we can listen for messages over the network. We can also gather metrics; we were talking about this today with some guys, one of the first Fluent Bit plugins gathered information from the CPU, to measure the CPU usage of the host where Fluent Bit is running. Then we added memory, disk, network, and more. Also, as part of the pipeline, we implemented filters, like the Kubernetes filter. If you're working in a Kubernetes cluster, as I said, you want some notion of where the logs are coming from, or what they belong to, because you're not going to query raw logs; you're going to say, please show me all the logs that belong to the pod name XB, and you need that kind of metadata for that. Filters allow you to parse or modify the records, which means adding your own custom information. And as output destinations we support Elasticsearch, InfluxDB, Kafka REST, NATS, and HTTP, and we're going to support native Kafka now.

The primary focus of Fluent Bit is Kubernetes and Docker, of course, the cloud native space. It can work everywhere, but if someone says "I'm using my Raspberry Pi and I have a bug" and another person says "I have a bug in Kubernetes," you know which one we are going to solve first. We always get some pushback about that, but we try to prioritize the issues. So, let's jump in. We've talked about logging in general, right?
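Before getting to Kubernetes specifically, here is a minimal sketch of how those pieces, a tail input, the Kubernetes filter, and an Elasticsearch output, line up in a Fluent Bit configuration. The paths, tag, and host values are illustrative assumptions, and the exact keys can vary between Fluent Bit versions.

```
[SERVICE]
    Flush    1

# Tail the container log files that the container engine writes on each node
[INPUT]
    Name     tail
    Path     /var/log/containers/*.log
    Parser   docker
    Tag      kube.*

# Enrich every record with pod metadata (name, namespace, labels, annotations)
[FILTER]
    Name     kubernetes
    Match    kube.*

# Ship the structured, enriched records to Elasticsearch
[OUTPUT]
    Name     es
    Match    *
    Host     127.0.0.1
    Port     9200
```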
Now let's talk about Kubernetes and logging, and for that I need to give you some context, very general context, about how things work in Kubernetes. When you deploy an application into a cluster, that application basically runs in a container, right? But a container in Kubernetes belongs to a pod, and a pod can have many containers. So here you start to see that logging becomes complex, because you can also have many pods inside a node, and a node can be a host, a bare metal machine or a virtual machine. And a cluster grows: sometimes you spin up more nodes, drop nodes, deploy applications, scale applications. So how do we implement the logging pipeline at this level, and do it right?

In Kubernetes there is a concept called a DaemonSet, and a DaemonSet is a pod that runs on every node of the cluster, okay? So if you want to solve logging, the first thing you need to do is, of course, read the logs, right? Using Fluent Bit or Fluentd as a DaemonSet allows us, with the right configuration of course, to read the application log files written by the container engine, in this case Docker. Once you read the information, you go to the API server, because you want to get some extra information about that container. Remember that the application does not know that it's running in a container; it doesn't care whether it's running on Kubernetes or on Swarm, it doesn't matter. But from a logging perspective, what we want to solve is to attach the notion of where these logs belong. So what we do is, every time we are parsing the logs, for every log message we go to the API server, if we don't already have the information, and we append the labels and annotations to that specific record. We're going to see that now in a demo, okay? Once we collect all the information, we centralize the output in our database, or we can also send the data to multiple places; that is part of the flexibility of Fluentd and Fluent Bit.

Okay, let's do a quick demo. Who's familiar with Kubernetes here? Raise your hand again. Okay. Can you see my screen? Oh, it's too small, okay. kubectl get nodes, make sure the cluster is up. Okay. I'm running Minikube, which is a single-node Kubernetes cluster running on this computer, because I had a problem this morning with my real cluster. So what we're going to do is something very basic. As you saw, we don't have any pods running here, but I'm going to deploy an application that will just write a dummy log message to the standard output. I'm going to run it as a deployment, we are going to call it json, and this is the image; you can take it if you want. Oh, json is running. So my application pod is running here. Let's look at the logs. What this application is doing is just writing, every second, a dummy message to the standard output, and it's a JSON map. So I deployed an application that writes a message, and that's it; I'm querying the information with kubectl. Okay, that is one thing. But of course, in a real scenario, to see the logs of your application you're probably not going to use kubectl logs. You're going to ship the logs somewhere and then do something, some black magic, okay?
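The exact image used in the demo doesn't matter; a minimal, hypothetical equivalent of that deployment, a container that prints one JSON line to stdout every second, could look like this (the name, image, and message body are made up for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: json
spec:
  replicas: 1
  selector:
    matchLabels:
      app: json
  template:
    metadata:
      labels:
        app: json
    spec:
      containers:
      - name: json
        image: busybox
        # Write a dummy JSON record to stdout once per second; the container
        # engine persists it to the node's log files for collectors to read.
        command: ["sh", "-c",
                  "while true; do echo '{\"message\": \"dummy\", \"status\": 200}'; sleep 1; done"]
```

After something like "kubectl apply -f json.yaml", running "kubectl logs deployment/json" shows the raw JSON lines, which is essentially what the kubectl query in the demo is doing.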
So the application is running, and what I want to do right now is centralize these logs in one place, and for that purpose we are going to use Elasticsearch. Okay, so this is a watch -n 2: every couple of seconds I'm querying the Elasticsearch HTTP endpoint to see if we have some index, some documents, some information there, okay? There is nothing. So what I'm going to do now is deploy Fluent Bit, which will allow me, with the default configuration I just adjusted, of course, to take the logs of all the pods and applications that are running and ingest that information into Elasticsearch. If you want to see the DaemonSet file: it's a DaemonSet, here we have the image, right? Very basic. We are talking to Elasticsearch, which is on my local host, because it's Minikube, so that's the magic IP, plus the TCP port, and that's it. And there are the volumes that we need in order to read the application logs, okay?

So, kubectl create. Okay, kubectl get pods. Okay, Fluent Bit is already running, and if everything is right, I'm going to see some logs here as it refreshes. Okay, every second or every two seconds, depending on the delay of the buffer, you see that the number of documents is increasing. That means that something is ingesting data into my Elasticsearch. Well, we assume it's Fluent Bit, right? It is; that's why I showed you before that there was nothing there.

Okay, so, if my memory, I mean the computer's memory, allows me, I'm going to try to run Kibana, which is a log visualization tool. I'm not an expert in Kibana, but I want to show you how the logs look in Kibana. Okay, start the Kibana service and let's open the browser. Kibana, if you're not familiar with it: I just ingested the data into Elasticsearch, which is the storage, the database, but the ecosystem has another component called Kibana, which allows you to generate some graphics or query the information in the database. I'm going to create this, it's just fresh, okay, create. The default button, okay, and I'm going to switch this to auto-refresh every five seconds. Okay, here some logs are coming in, okay? But what I want to show you is the structure of the logs. Can you still see it?

So here I'm going to do a query: kubernetes dot, and I'm going to pause. Okay, what I did is tell Kibana: please show me all the logs that come from Kubernetes and have a pod name that starts with json. Do you remember that we started a pod called json at the beginning? Okay, that's it. So here we have many records; I'm going to open this one, and I'm going to open the JSON version. Okay, what I want to show you is that every single line that was generated, very simple at the beginning in the terminal, now carries a bunch of information that also adds the notion of where this log comes from and what it means. If you look here, we have the Kubernetes metadata, and check this: the pod name, the namespace, the pod ID, the labels, all that information, and that is really important. So if you're going to do logging for Kubernetes, your logging agent, and it doesn't matter if you use Fluentd or Fluent Bit or not, needs to take care of metadata, because that is what you need in order to query the information. And if you look carefully, this is the message that was being printed to the screen.
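Stepping back to the DaemonSet for a moment: the exact file from the demo isn't reproduced here, but a minimal sketch in the same spirit could look like the one below. The image tag, the Elasticsearch address (the Minikube magic IP), the environment variable names, and the host paths are assumptions based on the commonly published Fluent Bit Kubernetes setup, not the file shown on stage.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit
        env:
        # Where the Elasticsearch output should ship the records
        # (here, the host as seen from inside the Minikube VM).
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "10.0.2.2"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        # The container engine writes pod logs under these host paths,
        # so the DaemonSet mounts them in order to tail them.
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

And a simple way to watch documents arriving, like the terminal in the demo, is something along the lines of: watch -n 2 'curl -s http://127.0.0.1:9200/_cat/indices?v'.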
And going back to that record: the logging tool was also configured to decompose that message, extract the fields, and expose them separately. So for example, I'm looking at the record which printed the number 304 and also has that word. Okay, so that is the goal of any kind of logging solution: be able to take the logs, implement a full logging pipeline, work reliably, and also append metadata. And for Kubernetes this is a must; you cannot ship logs without this.

So, that was the demo. And what's next? From the Fluent Bit perspective, we are releasing version 0.13 at the end of the year, actually on December 20th, before Christmas. And it comes with big news: it has native support for Prometheus monitoring, so you can monitor the logging pipeline of Fluent Bit, how many records are parsed, how many bytes I'm consuming, how much I'm parsing, dropping, and sending out. And also full integration with Apache Kafka. All of those are functional, but we are still doing some extra checks, extra tests. And, well, thank you.