My name is Eduardo Silva and I work for a company called Treasure Data, well, now we are called Arm Treasure Data, we were acquired some time ago. This presentation is mostly about logging, the problems that exist in general for cloud environments or distributed systems, and the different approaches that exist to solve those problems. I work as a software engineer at this company and I'm a maintainer of this project called Fluent Bit. Maybe you are familiar with Fluentd, which is a CNCF and Linux Foundation project; Fluent Bit is like the small brother of it. Okay, so let's start talking a little bit about applications and logging. I know that most of you are familiar with syslog and systemd, mostly, but we need to step back a little to understand the problems. Basically, if an application wants to log something, if you're doing logging, it's because you want to do data analysis, and the way to do data analysis is to concentrate, or aggregate, your logs somewhere. So when an application emits a message, it goes to a log file or to a stream: standard output or standard error. That is a common pattern, and in the container space we focus mostly on standard output and standard error. Consider Docker, since Docker is one of the main container engines. You know, everybody talks about containers, but a container by itself doesn't exist; a container is just a set of kernel rules applied to a process. When Docker runs an application and this application emits a message, that message will be trapped by the Docker engine. For example, in this case a simple message that says "hey Berlin" will get some metadata on it, like the timestamp of when it was created, and also the stream the data is coming from. And that is fine.
Of course, not everybody's happy with that; you know, packing a message into JSON, or processing text messages, is quite expensive in terms of computing. Then when the Docker engine gets this message, it says: I'm going to store this message somewhere in the file system for persistency. There are workarounds where you can tell Docker to use journald or a different logging backend, but the common way in Kubernetes is just to write to the file system, because it's the fastest persistent way. Okay, and this gets stored under /var/lib/docker/containers/, then the hash of the container, then a file named with that hash plus -json.log. Each message is appended to the same file. But there's one problem here, from an operational perspective: if you want to do logging, you need to understand where the logs are located. Once they are located, you need to start parsing these log files that were created by the Docker engine, and then you need to start looking up specific key fields inside the JSON maps that Docker generated, and maybe append metadata to them.
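To make that concrete, here is a minimal sketch of what one line of a json-file log looks like and how a log processor might pick it apart; the sample message, timestamp, and container values are invented, but the three keys are what the driver writes:

```python
import json

# One line as stored by Docker's json-file logging driver; the sample
# values here are invented, but the keys (log, stream, time) are the
# metadata the engine wraps around every stdout/stderr line.
line = '{"log":"hey Berlin\\n","stream":"stdout","time":"2018-11-07T12:00:00.000000000Z"}'

record = json.loads(line)
print(record["stream"])        # which stream produced the message
print(record["log"].rstrip())  # the original message text
```

This is exactly the parsing step a log processor has to perform for every appended line before it can do anything smarter with the data.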
Let's split this further. Docker runs inside Kubernetes, or rather, Kubernetes works on top of Docker: Docker as the main container engine is one thing, and Kubernetes on top of that tries to orchestrate and do self-healing of whole containers and applications. The goal here is just to explain how logging works in this scenario and the problems we're going to hit. In Kubernetes, for example, you have your application, and this application runs in a container, but this container is grouped inside a concept called a pod, and a pod can have multiple containers. So here you start realizing that one application can have one logging format, one kind of message, while a different application in a different container does a different thing, but both are grouped in the same pod. And a node in Kubernetes can have many pods; a node can be a virtual machine or a bare-metal instance. And if you have a cluster, logging becomes more complex: that simple log file that we had in the file system is replicated, so we have the same kind of messages in different places, and at some point, if you want to do data analysis, as we said at the beginning, you need to correlate all this information together. Of course, doing SSH or running the journalctl command will not help in this kind of scenario. So the logging context is really relevant here in Kubernetes, mostly because if you have one log message, you know that this log message was generated by some container, but this container comes from a pod, and that pod has an ID and also belongs to a namespace, and also this namespace...
Sorry, that pod was running on a node, and maybe the whole namespace, or the pod, was tagged with some labels, with some annotations, and all this information gives you context. Because at the end, if you have a distributed application and you generated labels and annotations, you would like to correlate all this information back: you want to go to, say, Elasticsearch or any kind of database and ask, please show me all the logs that come from pod X on node B. Right? Maybe your application has many replicas, those replicas went to different nodes, and maybe one of the nodes is failing but not all of them. Or maybe you want to troubleshoot specific stuff from a different node, or a group of pods, because of namespaces. So how does this work? Basically, in Kubernetes you need to gather the whole context to solve logging; that means container name, container ID, pod name, namespace, and so on. But all this information comes from different places. The file system contains what is relevant locally, like the pod name, the namespace, and the container name; this information is appended as metadata by the Docker engine. But from a cluster perspective, in the API server, the master in Kubernetes, we also have extra information like the pod ID, container ID, node name, labels, and annotations. So as you can see, one simple message generated by one simple application in a container carries more information than we can imagine. We need some kind of log processor that can understand how the data has been stored, understand the format, and gather this information from different places. And basically, the big thing is not the log processor; I would say the big thing is to correlate all your information back in a storage service like Elasticsearch, InfluxDB, or Kafka, because your end goal is not log processing, your end goal is data analysis. But to get from end to end you need a log processor, and I understand that nobody's happy with log processors.
Nobody likes them, right? They are not fancy; they are not dashboards and things like that. It is pretty low level, and everything is about performance. People say: oh, my log processor is running slow, it's consuming too much memory, blah blah blah. Yeah, because we need to add filters, we need to connect to different places. But of course the goal here is not to blame the log processor, it's to see how we can make it better and better, from a programming perspective and also from a design perspective. With that said, I would like to introduce Fluent Bit. Fluent Bit is a child project of Fluentd, born in 2015, like three years ago, and this is quite fun, because it was created originally for embedded Linux. At Treasure Data we created Fluentd years ago, we made it open source, and Fluentd is really good, but sometimes people complain that it needs like 40 megabytes of memory to run, because it's a mix of Ruby with C, and it's not quite lightweight if you want to run it on an embedded Linux system. You're not going to run it there, that's true; you're not going to waste that amount of memory and CPU. So people said: why can't we create something lightweight and different? Well, that's how Fluent Bit was started. When we started Fluent Bit for embedded Linux, the embedded Linux people started using it, but they said: okay, it's fine, we have syslog, we have these tools. But people from the cloud space, where Fluentd was running, or people who use Logstash, said: why don't you add this feature to Fluent Bit? This is really great, it's written in C.
It's using like 500 kilobytes of memory, which is awesome. Of course, if you start processing thousands of messages your memory goes up, but anyway, it's quite lightweight. After three years of work, adding filters and many features, nowadays we have like 50,000 deployments a day, just counting pulls from our own Docker Hub, and in a few days we're reaching 10 million since March. That is a huge number, and it was unexpected, because when people started asking for Fluent Bit as a lightweight log processor we said, okay, maybe we're going to hit like 10,000 a month, but now it's growing like crazy. Imagine that somebody has a Kubernetes cluster and this cluster is spinning up a new node; that node is likely running Filebeat, or Logstash, or Fluentd, or Fluent Bit. So: Fluent Bit is written in C, it's tied to a low memory and CPU footprint, and it has a pluggable architecture. I say pluggable architecture because when you have a log processor, it's really important to understand that the goal is not to replace syslog, and the goal is not to replace systemd or journald. As a log processor, your goal is to integrate different sources of information in one place, and to accomplish that you need to be able to talk over TCP and UDP, read logs from the file system, talk to the systemd API, and so on. It has built-in security with TLS, because at the end, when you are shipping your logs outside, you're talking to third-party services, for example Stackdriver, or Elasticsearch, and of course nobody wants to use plain HTTP; you want to use HTTPS. From an internals perspective, Fluent Bit is pretty basic, and we try to keep it very simple. If you start designing something that needs to scale to thousands of nodes, or to 200 cores on one machine, your design will likely get a lot of problems; I would say that personally it's better to optimize when it's needed, and not beforehand.
So basically the design is like this: you have input plugins on one side, which take care of how to collect the data. For example, if you're going to read log files, you need a plugin that reads log files; if you're going to receive messages over TCP, you need an input plugin that listens for messages over TCP. The data comes in from the input, and then we have the filters, because before you ship the logs somewhere, you need to process that data. It all runs in a single event loop; it's one process, we don't support multi-process here. Then we ship out the logs through an output plugin. And what's the relationship between an input and an output? An input takes the data from somewhere and transforms it into the internal representation of Fluent Bit, and then the output plugin takes that information and transforms it into the format required by the third-party service. For example, Elasticsearch has a very specific JSON format, which has a header and then the body; and InfluxDB is a totally different protocol; it works over HTTP, but it's different. In data processing, one thing that's really important is how to deal with unstructured data versus structured data, because, as I said, you want to do data analysis. Think about Apache logs: you know an Apache log has a timestamp, a host, a user, a status code, a method, and a lot of information. You understand that because you are familiar with it, but for a computer, that is just bytes. Let me show you one example, I think I have it here... where's that file? Yeah, yeah, I know, thank you.
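The input, filter, output pipeline just described can be sketched as a Fluent Bit configuration in the classic syntax; the file path, the grep pattern, and the Elasticsearch host below are illustrative values, not recommendations:

```ini
[SERVICE]
    Flush        1

[INPUT]
    Name         tail
    Path         /var/log/containers/*.log
    Tag          kube.*

[FILTER]
    Name         grep
    Match        kube.*
    Regex        log error

[OUTPUT]
    Name         es
    Match        kube.*
    Host         127.0.0.1
    Port         9200
```

Here the tail input collects the data, the grep filter processes it (keeping only records whose log field contains "error"), and the es output converts each record into the format Elasticsearch expects.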
Oh, here it is. Okay, so: if at some point in your life you have ever dealt with a web server, you know that this is an IP address, a timestamp, and so on, but for a computer, that is just bytes. So if you want to do data analysis, and you want to ask, please give me all the access log entries for a given path where the status code is 200 or 204, you have two ways. If you handle this as unstructured data, it will be quite expensive, because you are parsing every single byte. But if you convert this to a structured representation, you're going to query only the information you care about, which would be this position and this position. To accomplish that, you need a log processor. For that, we have many different backends to process the data. One of them is regular expressions: you say, please, for Apache log files, use this regular expression, which will create some kind of structure. Or you can say that the information coming in is a JSON map, or maybe some LTSV format. And of course, each message can also have its own timestamp, because the message was generated at some point. The internal representation of the data is like this: the input plugin gets the data from somewhere, but internally it needs to emit this record in the internal representation. That internal representation, consider that it's not JSON, but it's similar: it's like an array where you have the timestamp and then the message in a map. Internally, we use MessagePack. Are you familiar with MessagePack?
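Before moving on: the regular-expression parsing just mentioned is declared as a parser section; the sketch below is adapted from the stock Apache parser that ships with Fluent Bit's parsers.conf, with the named capture groups providing the structure:

```ini
[PARSER]
    Name        apache
    Format      regex
    Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
```

Each named group (host, user, time, method, path, code, size) becomes a field in the structured record, and Time_Key/Time_Format tell the parser which field carries the message's own timestamp and how to decode it.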
Some of you, yes. MessagePack is like a binary JSON, made for serialization. Okay, so we always convert the data, no matter what the source format is, to that binary internal representation. Now, an input plugin can generate many records, but if you're going to flush this data to different places, sometimes you would like to group it, because you can have data coming from files, from TCP, from UDP, and maybe you would like some notion of where this data is coming from, or to group records by some name. For that, we use the concept of tags. An input plugin can say: for the records I'm emitting from this Apache host, attach a tag, say apache.host1; and for the records coming from syslog, append a tag called syslog. Then, internally, we have a routing system which says: for every output plugin that is asking for anything whose tag starts with apache, route those records to this place; for everything that exactly matches syslog, send it to a different output plugin. So you can take different kinds of input sources and flush them to different places, or to multiple places. Now, in the output plugins, most of them rely on network I/O, so they need to make a network connection to a host, and you know, when you create a socket and trigger a connect system call, in the normal way, that will block your program. If you're using asynchronous I/O it will return right away, but then you need some mechanism to know when the connection was actually established. So we were thinking about how to solve this problem when creating output plugins, because everybody who creates an output plugin will likely need network I/O, but we don't want something like the callback hell in Node.
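The tag-based routing described above can be sketched in configuration; tag names, the log path, and the syslog listening port here are examples:

```ini
[INPUT]
    Name    tail
    Path    /var/log/apache2/access.log
    Tag     apache.host1

[INPUT]
    Name    syslog
    Mode    tcp
    Listen  0.0.0.0
    Port    5140
    Tag     syslog

# Wildcard match: receives every record whose tag starts with "apache."
[OUTPUT]
    Name    es
    Match   apache.*

# Exact match: receives only records tagged "syslog"
[OUTPUT]
    Name    stdout
    Match   syslog
```

Each record carries its tag through the pipeline, and the Match rules on the output side decide where it gets flushed.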
Yes. When you have an event loop, you create some connection, and you have to create callbacks for when some event happens on that connection. Maybe you connect to a service, and the moment you get the report back, you are connected and you're going to write some information; or maybe that socket was disconnected, or you got some TCP problem. Do you want every developer of every output plugin to handle all of that? That would be a mess. So we need to abstract that, and we also need to reduce the time a plugin blocks, where possible. Is there a way to suspend those output plugins and resume them? If you're familiar with Go and similar languages, this is not a new thing, but in C it's sometimes a bit challenging; still, it's something that can be done. The workflow is like this: when the data is routed to the output plugin, you need to create a TCP connection, convert the internal representation to the output representation, write the data over the network, wait for a response in most cases, and return a status. That is the normal workflow of every output plugin, but if you look carefully, there are places that may block my output plugin. And the things that block are not blocking because my program is doing some computing; the blocking is on the kernel side. So what about suspending my execution when I try to do some network operation, letting the kernel notify me back through my event loop, and then resuming? This is where we introduce coroutines, which are totally hidden from the output plugins. So, at flush time...
We create a coroutine for the output plugin. All the network operations are abstracted by our own API, so if you want to connect to a service, you just use a specific API call, and that API will internally handle all the errors and all the suspend and resume, every time it's required. It can also be used with plain TCP or TLS, so if you're running an output plugin and you need TLS, you just use the internal API, pass the certificates, and that's all; and from a network perspective, you can suspend and return control back any time you want. Imagine this is a very simple sample of code coming from the Elasticsearch output plugin. I want you to focus here: the two things with the red arrows are the sections that might block. The upstream connection call says to Fluent Bit: please perform a connection to a server, and once connected, return me a context that I can use later to flush data over that channel. There are no threads here; this is just one main process. But, for example, here, convert format, this part is just blocking because it's computing at that moment: it's converting the data from one format to the other. So here we get the connection, we convert the original data that came in as MessagePack to the JSON format that Elasticsearch needs, then we use the built-in HTTP client with the connection we got in the step at line eight, and then I do my own request. But as you know, the request can also fail, and can take some time, depending on the service, and we don't want to block. So basically, what this API does is flush the whole request, suspend, and continue working. It's pretty much like a scheduler in the kernel: you have the bottom half and the top half, you have the data flowing from one space to the other, nothing is blocking; you just suspend when you have nothing to do at the moment, and then resume. Here at this point... oops, that's not the one.
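The suspend-on-I/O idea is the same one that async runtimes expose directly. As a rough analogy, and explicitly not Fluent Bit's actual C code, a flush that yields to the event loop during network operations can be sketched in Python with asyncio:

```python
import asyncio

# Analogy of a non-blocking output plugin flush: network operations
# suspend the coroutine and the event loop resumes it when the kernel
# reports the socket is ready; CPU work in between runs normally.
async def flush(records, host, port):
    # Connect: the coroutine suspends here until the socket is ready.
    reader, writer = await asyncio.open_connection(host, port)

    # Format conversion is pure CPU work, so it does not suspend.
    payload = ("\n".join(records) + "\n").encode()

    writer.write(payload)
    await writer.drain()        # suspend again until the data is written out
    writer.close()
    await writer.wait_closed()
    return "ok"
```

Fluent Bit implements the same pattern in C, with the coroutine machinery hidden behind its networking API so plugin authors never see the suspend and resume points.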
I don't want to say exit... let me check. So we have some return values, but of course, we use a specific API to return them, because these are our coroutines; otherwise, we would mess up the context of the stack. Every output plugin can return three values. OK means: I was able to flush my data. Retry means: I was trying to flush my data and I got some problem, please retry; and Fluent Bit has its own retry logic, because you don't want to lose data. And if there's something you cannot deal with, Error means that data will not be retried. From an internal perspective, we have many helpers in the API for output plugins: upstream connections, an HTTP client, authentication, timers, crypto; we support Lua scripting, and I'm going to explain that a little bit; and there are many more. Just a few of the plugins we have implemented in the project: on the input side we have tail, to read log files; we have kmsg, to read kernel messages; as you can see, this was created initially for embedded Linux, because there you want to troubleshoot messages from the kernel and listen for messages from a serial interface; and we have plugins for CPU, memory, disk, and so on, and we can get messages from systemd and syslog. We can filter the data, which means processing it or filtering out specific information. And we have output plugins to flush the same records to multiple places: Kafka, Stackdriver, Azure, Splunk, and many more. Sometimes companies complain about this; for example, in the case of Splunk, Splunk has its own log forwarder, and customers ask: why is Fluent Bit promoting this when there's this other option? Well, Fluent Bit offers filtering, which the Splunk forwarder doesn't, and so on. I just have a few minutes left, so let me explain a little how Fluent Bit plays a role in Kubernetes. Basically, when you
have a cluster, you have nodes, and you have pods. The way to deal with logging is that you deploy Fluent Bit as a DaemonSet, which is a pod that runs on every node, and you make that Fluent Bit pod read the log files from its node; of course, it's going to read all the containers' information. Then we need to gather the context: Fluent Bit, or Fluentd, can also talk to the API server to gather the labels and all the metadata, and then it can enrich each record, so a simple message that started plain becomes something with more context. With this information, with a structure, you can query all the data in a database in a much better way: please show me all the records from this Kubernetes namespace where the pod name is such-and-such. Also, in Kubernetes we have a special feature where we allow the pods to suggest a parser. For example, if you created your own application and it has a specific log format which is not JSON or whatever, you can add a specific annotation and say: please use the parser called apache. You can configure many parsers in a config map. We are also discussing how to expand this to support different parsers for different streams, but that is an ongoing conversation. We also support a way to gather metrics from the log processor, using Prometheus or just curl over HTTP. And if you want to filter your data and you don't want to write your own plugin in C, you can use the Lua filter and write a simple function in Lua, referenced from the configuration file, that filters the data for you. Well, that was the talk. Thank you.
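The Kubernetes pieces mentioned at the end, metadata enrichment, the parser annotation, and the Lua filter, can be sketched as follows; the URL is the in-cluster default, while the script name, function name, and the hostname field added in the Lua function are invented for illustration:

```ini
[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc:443
    Merge_Log On
```

A pod suggests its own parser through an annotation:

```yaml
metadata:
  annotations:
    fluentbit.io/parser: apache
```

And a Lua filter is a config section pointing at a script:

```ini
[FILTER]
    Name    lua
    Match   *
    script  filter.lua
    call    cb_filter
```

```lua
-- filter.lua: enrich every record with an extra field.
-- Return 1 with the (possibly modified) timestamp and record;
-- 0 keeps the record untouched, -1 drops it.
function cb_filter(tag, timestamp, record)
    record["hostname"] = "node-1"  -- example enrichment
    return 1, timestamp, record
end
```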