Well, hello again and welcome everybody to yet another OpenShift Commons briefing. I think this is number 72, we're on a roll here, and we are going to go to Wednesdays and Thursdays, so we'll double up soon after Red Hat Summit, which is happening in a couple of weeks. And I'm really pleased today to have the folks from Treasure Data on the call. Eduardo Silva is going to give us a presentation on Fluentd, a bit of an overview of Fluentd and logging with Fluentd. I thought that would set a good baseline for everybody, because it's one of the projects under the Cloud Native Computing Foundation's umbrella now. And we are trying to make sure that we give equal time to everything from Kubernetes to Prometheus to Fluentd, and we'll go through most of the projects that are under the CNCF umbrella in future briefings as well. So without further ado, I'm going to let Eduardo introduce himself and talk about Fluentd. You can ask questions in the chat. We're going to try and let him get through his presentation, so if you have a burning question that really has you confused, make sure you let me know. Otherwise, I'm going to save your questions for after his presentation, and then there'll be an open Q&A. So Eduardo, take it away, and thank you very much.

OK, thank you again for the invitation to participate in this session. My name is Eduardo Silva. I'm from the Fluentd team at Treasure Data, and we work very closely with the core developers of Fluentd, so we are always trying to see how we can improve logging across different areas. A few years ago that was mostly on stand-alone services; then we started integrating logging with containers, and then we joined the cloud-native space with orchestrators like Kubernetes and other systems.

So when we talk about Fluentd, we need to understand the concept of logging. In our experience, most people do not understand very well what logging is and why it's important, so that's why I want to introduce it a little bit. The first thing is that when we have applications, our main goal is always to analyze how these applications are behaving. But to accomplish that, we need some mechanism that allows us to collect information from the application; otherwise, analysis will be quite difficult. The normal way is that, from the beginning, the application generates log files or some kind of log information, either in the file system or over the network, and then we try to centralize that information into some kind of common storage so we can do the analysis. But getting to that point is quite hard: making an application write logs is easy, but centralizing the logs is complex.

So how do we accomplish that? We have a concept called the logging pipeline, which says that if you want to take logs from one point to a storage, you need to go through different phases: collecting the data, parsing the data, filtering the data, buffering the data, and sending the data out to a destination. So when we talk about the logging pipeline, we're referring to this whole scope from the input to the output, with many phases in the middle. And this is not so easy to handle, because there are many problems. For example, if you think about the different inputs you can have, syslog messages are quite different from Apache web server log messages.
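To make those phases concrete, here is a minimal sketch of a Fluentd configuration covering collect, parse, filter, buffer, and output, using current Fluentd v1 syntax. The file paths, tag, and downstream host are hypothetical placeholders, not from the talk:

```
# INPUT: collect an Apache access log and parse each line into fields
<source>
  @type tail
  path /var/log/apache2/access.log
  pos_file /var/log/fluentd/apache2.access.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

# FILTER: enrich every matching record with the host name
<filter apache.access>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

# BUFFER + OUTPUT: buffer to disk, then forward to a downstream Fluentd
<match apache.access>
  @type forward
  <server>
    host aggregator.example.com   # hypothetical destination
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/fluentd/buffer/apache
  </buffer>
</match>
```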
So the input side needs to be quite flexible in order to understand all of this, and you have to implement your own parsers, your own filters, and your own buffering mechanism to have a reliable delivery mechanism that sends the logs to the right destination. A few years ago this was quite complex, because if you had many kinds of inputs, you would create your own scripts to parse the data, filter the data, and try to put that information into some database or cloud service. But the problem is that for every new input source, you have to create a new script or a new program, or update your cron jobs, and it's complex. That is a solution that does not scale in current architectures.

And when we talk about logging and then start talking about microservices, things become more complex, because, as you know, everything now mostly runs in containers. Applications run in containers, each container has its own application logs, and if we take this to the cluster level, it's even more complex. We cannot have small scripts gathering data when there are billions of containers running in a week. So the question is how we can solve this problem.

After a few years of work with different companies, with different users of containers and cloud solutions, we can say that Fluentd is a reliable solution that fixes this whole logging problem. Fluentd was created initially by Treasure Data, and we are the primary sponsor for the project, but there are also other companies contributing to it. And we donated the project to the Cloud Native Computing Foundation late last year, in November. So right now, Treasure Data is the primary sponsor, but the project is hosted with the CNCF.

We were talking about the logging pipeline and how to fix it, and the whole logging solution now is through Fluentd. Fluentd allows you to take logs from different inputs and centralize that information on any kind of storage solution, and things become easier. But even though Fluentd is a really good solution for the problem, we also need to understand how things work internally; otherwise, you cannot scale the solution very well. If you want to fix a problem, you need to understand what the real pains are.

About Fluentd, we can say that it's quite big right now. While our company and maybe a few other companies maintain around 20 plugins, there are more than 600 plugins available, more than 500 of them made by the community. That is a really huge win for anyone who needs a reliable solution. Fluentd also has a pluggable architecture, which means that if you want to create your own filters or your own parsers, you can create them as simple plugins. It has built-in reliability: as much as possible, it will try to collect the data, deliver the data, and try not to lose data. And we have native integration, of course, with Docker and Kubernetes. Fluentd is written in a mix of Ruby and C; the most critical parts, mostly around performance, data serialization, and latency, are in C. And the ecosystem is quite big right now, so with Fluentd you can deal with most solutions and most backends. For example, you can send to Elasticsearch, and you can integrate with Splunk, Loggly, Google Cloud services, AWS, and many others.
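As one concrete backend example, here is a hedged sketch of what a match section using the community fluent-plugin-elasticsearch output can look like, with a file buffer so queued data survives restarts; the host and paths are placeholders:

```
<match **>
  @type elasticsearch
  host elasticsearch.example.com   # hypothetical endpoint
  port 9200
  logstash_format true             # write Logstash-style time-based indices
  <buffer>
    @type file                     # persist the buffer to disk
    path /var/log/fluentd/buffer/es
    flush_interval 5s
  </buffer>
</match>
```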
So over the years, whenever a new cloud service or a new backend becomes available, there's always someone writing a plugin for Fluentd to solve their log centralization problem. And Fluentd has a modular architecture. Remember the logging pipeline from the beginning: the architecture is pretty much aligned with that in order to solve the problem. As you can see on the left, we have the input plugins. The input plugins take care of collecting the data. Then we have the filter plugins. Sometimes, if you are collecting, for example, one terabyte of data, or 2,000 logs, maybe you don't want to process all of them; you want to discard some of them. The filter plugins allow you to filter data: to discard specific records, to match specific ones, or also to modify the data. For example, if you're collecting data from syslog messages, you might want to append the host name or some specific detail about the machine where this syslog service is running. So a filter plugin allows you to enrich your logs, discard data, or match specific logs.

Then, before you send this already-filtered data to some destination, to some output, you also want to buffer this data. There's some timing associated with it, a flush time. Buffering the data means that you're going to store it temporarily, either in memory or in the file system. Most real enterprise deployments do this in the file system: you don't want to have everything in memory, because if for some reason the process crashes or the container cannot continue working, you can lose data. So most big deployments, big enterprise customers, use the buffer system with the file system.

Every record that goes into the buffer has at least three specific pieces of data associated with it. One is a timestamp. Another is the tag: a tag is a kind of label or name which is used to route the data. For example, I can take everything that's coming from syslog and apply a tag like my.syslog, and then I can create a rule so the router can decide where to send this information based on that tag, for example to Elasticsearch, or maybe to Google Cloud Platform or Amazon S3 (see the routing sketch below). And then we have the record itself, which is the data, and it has a structure.

Then we have the output plugins. Output plugins take care of taking the buffered data and transforming it for the output destination. For example, if we want to send information to Elasticsearch: Elasticsearch lets you ingest data over HTTP using a specific JSON format, so the output plugin takes the internal record representation from Fluentd and converts it to the format the destination expects.

When we talk about structured logging, we mean that Fluentd takes a specific log message and tries to give it some meaning, because a raw message on its own is not very useful. Think about Apache web server logs: each line looks like a raw message, but it also has a structure. Internally, Fluentd uses a JSON-like structure. It's not exactly JSON, because it's a binary version in MessagePack format, but it has different kinds of sections, so the data becomes very easy to read, and you can understand that a message has different metadata. And Fluentd and structured logging are being adopted in a lot of spaces.
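Here is a minimal sketch of that tag-based routing, assuming a syslog input. Note that Fluentd's syslog input appends facility and priority to the tag, hence the wildcard match; the tag prefix, bucket, and endpoints are illustrative placeholders:

```
# INPUT: listen for syslog messages; events get tagged my.syslog.<facility>.<priority>
<source>
  @type syslog
  port 5140
  tag my.syslog
</source>

# ROUTE: everything tagged my.syslog.** goes to Elasticsearch...
<match my.syslog.**>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
</match>

# ...and anything else falls through to Amazon S3 (hypothetical bucket)
<match **>
  @type s3
  s3_bucket my-log-archive
  s3_region us-east-1
  path logs/
</match>
```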
We started by integrating a native logging driver for Docker. Fluentd is fully integrated with Kubernetes right now: it's the main aggregator for Elasticsearch and Google Cloud Platform in Kubernetes deployments. You can of course deploy other logging agents, but Fluentd is the default for Elasticsearch and Google Cloud Platform. On Google Cloud Platform there's the Stackdriver team, and the Stackdriver service uses Fluentd as its main agent. And of course OpenShift uses Fluentd as its main aggregator. Which is great: Fluentd has been integrated with structured logging across different sections and different components.

So now I'd like to talk a little bit about how Fluentd integrates with Kubernetes. For those familiar with OpenShift, you know that OpenShift uses Kubernetes to manage containers and orchestrate everything. The way we install Fluentd inside a Kubernetes cluster is this: as you know, we have the API server, and we have the nodes. Each node has different pods, and pods have different containers. Fluentd is deployed as a DaemonSet, and a DaemonSet is just a pod that runs on every node in your cluster. Because every node in your system has different pods, and every pod generates different logs, the goal is to have Fluentd deployed as a DaemonSet, and this Fluentd pod has a mounted volume with all the logs and starts collecting them.

But when we start collecting the logs, we also need to go beyond that; it's not just reading log files, because, as you will see in the next slide, you need to enrich the logs with some kind of metadata. When the DaemonSet is deployed, Fluentd has access to all the logs from the node, because it's a shared volume, and it starts reading the log files from each container. For each log file it can determine which container ID and which container name the logs belong to. But it also does a lookup against the API server, because you'd also like to know what kind of metadata, for example which labels and annotations, are associated with the pod where these containers are running.

This is possible because a Red Hat team wrote the fluent-plugin-kubernetes_metadata_filter. So when Fluentd is installed in your Kubernetes cluster, it has this Red Hat plugin. What it does is, for example, take the namespace and the container name, go back to the API server, discover which labels and annotations are associated with them, and then append that data to each record it's reading. At the end, when you're going to store the logs, Fluentd already has an association for each log: container name, container ID, labels, annotations. And you can visualize that information later in your storage, for example in Elasticsearch.

If you look at a bare Docker container log, it's quite simple. The log message is in the log field; it has a stream field, which says whether it was written to standard output or standard error; and of course you have a timestamp with nanoseconds. But when you push this into Elasticsearch, you get something more complex: you get the same log, the same stream, and you get some metadata.
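A hedged sketch of how that enrichment is typically wired up: tail the per-container JSON logs on the node and run them through the Red Hat metadata filter. The paths and tag prefix follow common convention rather than anything stated in the talk:

```
# INPUT: tail the JSON log files the node's Docker engine writes per container
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json   # each line looks like {"log":"...","stream":"stdout","time":"..."}
  </parse>
</source>

# FILTER: look up pod metadata (namespace, labels, annotations) from the API
# server and append it to every record; lookups are cached locally
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>
```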
And this metadata was appended by the Kubernetes filter: for example, the Docker ID, the Kubernetes host, the pod name, the pod ID, the container name, the namespace ID, and labels and annotations if they exist. So we can say that Fluentd is quite flexible as an architecture, and the filter plugin does a good job. Of course, we're going to read each log file, but we're not going to talk to the API server every time; we have a local cache for all the information we're getting.

Also, when Fluentd is trying to deliver the data to our storage, it doesn't matter if the storage service is running inside the cluster or outside: sometimes we can face issues, a network outage, a connection problem, or maybe a DNS problem. But if you recall one of the initial diagrams, we have a buffer. When we're going to send the data, Fluentd is always reading from the buffer, and of course you want to have your buffer in the file system. If something happens and it cannot deliver the data, you can configure Fluentd to perform a certain number of retries, or take a different approach for this kind of situation. As you know, in every cluster you need to be prepared for any kind of failure, because you don't want to lose data and you always want a reliable system. Processes crash, applications crash; the question is what you can do to deal with that. So in this case, if Fluentd cannot talk to Elasticsearch because of some external problem, by default it will keep retrying until it succeeds. You can specify your own retry interval, an exponential interval for example, and the documentation covers more options for dealing with that specific situation. Also, if you cannot talk to Elasticsearch, you can implement your own fallback mode, or do some load balancing so that you don't put the whole load on just one service.

When Fluentd is deployed as a DaemonSet, you know that a DaemonSet for Kubernetes is always just a YAML file which specifies how Fluentd needs to be deployed. In these slides, which you will have access to later, you can see a specific open source repository which has templates for the Fluentd Kubernetes DaemonSet. Kubernetes already comes with Fluentd in its source code, so you have a DaemonSet by default, but we also try to maintain agnostic Fluentd DaemonSet YAML files with different rules to play with different backends, for example for Loggly, for Amazon S3, for Elasticsearch. So we maintain more configurations than you can find in the vanilla Kubernetes source code (a minimal sketch of such a manifest follows below).

And now, who's using Fluentd? I'd say basically everybody in this session: if you're using OpenShift, you're using Fluentd in the backend. One of the biggest users is Microsoft. As you know, Microsoft has a system called Operations Management Suite, OMS, and they monitor each node and each service running on their customers' platforms. Fluentd is the default agent to collect data from every service and every application running on those nodes; the OMS logging agent is Fluentd plus some specific additional plugins. And we find this model in different places: Google is using Fluentd as a main agent, Microsoft is, and now OpenShift as well.
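Coming back to those DaemonSet manifests: here is a minimal, hedged sketch of what one can look like, mounting the node's log directories into the Fluentd pod in the spirit of the fluentd-kubernetes-daemonset repository. The image tag and names are illustrative:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        # illustrative tag; pick the image variant for your backend
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      # share the node's log paths with the pod
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers
```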
Fluentd is a full ecosystem. I'd say it's not just a standalone service or program to collect data and send it on. I didn't mention this before, but there's another way for applications to send logs to Fluentd: we have bindings for different languages. For example, if you're writing your own application in Golang, we have a Golang package for Fluentd, so your application can talk directly to Fluentd over the network instead of letting Fluentd consume the logs from the file system (a minimal sketch follows at the end of this section). What I want to say with this is that Fluentd has many components, because we don't just try to consume log files; we try to fix the logging pipeline problem in a complete way.

Fluentd is extensible, and in logging you mostly have two modes: one is the log forwarder, and the other is the log aggregator. A log forwarder takes care of picking up the logs from some point and sending them to an aggregator. An aggregator is nothing more than a full solution with really strong buffering capabilities. Fluentd can work both as an aggregator and as a forwarder; if you use just Fluentd, you're using both modes at the same time.

But there are some problems. If you have many nodes, this has a cost. As you know, handling logging and parsing logs is not cheap: parsing a string or applying a regular expression is quite expensive. It takes computing time, and of course you see that reflected at the end of your bill. If your cluster is quite big and you're running on AWS or Google Cloud Platform, you have to pay some money to sustain that. It depends on how you configure things and how many resources you're going to consume. But what happens when you have many nodes and they start to grow? At the beginning you start with five or ten nodes, but at some point you can have fifty. If Fluentd requires, for example, a minimum of 40 megabytes to run, and the average is around 200 megabytes per node in a Kubernetes cluster, then deploying it across a few hundred nodes will be quite expensive.

So from the Fluentd team perspective, we're trying to see how to help our end users reduce the cost of their deployments. One solution we came up with is to separate the log forwarder from the log aggregator. Fluentd works as both, but what would happen if we created a separate, lightweight log forwarder to make things cheaper? So I want to quickly introduce the project called Fluent Bit, which is like a child project of Fluentd, written from scratch to solve this problem. Fluent Bit is part of the Fluentd ecosystem, and Fluent Bit is also under the CNCF. It's a solution completely written in C, and it tries to stay very aligned with Fluentd's architecture: it supports plugins, it has built-in reliability, and it's fully event-driven, which means it does a lot of asynchronous I/O operations over the network. So what is Fluent Bit as a forwarder? The good thing is that Fluent Bit can also support many kinds of inputs, can filter the data, and supports different destinations. It has built-in parsing support, which means you can take, for example, unstructured text messages and give them a structure. And at a minimum, it requires no more than about 500 kilobytes.
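Picking up the Golang binding mentioned above: here is a minimal sketch of sending an event with the fluent-logger-golang package. The tag and record fields are hypothetical, and it assumes a Fluentd forward input listening on the default port 24224:

```go
package main

import (
	"log"

	"github.com/fluent/fluent-logger-golang/fluent"
)

func main() {
	// Connect to a Fluentd forward input (default: 127.0.0.1:24224).
	logger, err := fluent.New(fluent.Config{
		FluentHost: "127.0.0.1",
		FluentPort: 24224,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer logger.Close()

	// Each event carries a tag plus a structured record, matching
	// Fluentd's internal (tag, time, record) model.
	record := map[string]string{
		"service": "payments", // hypothetical fields
		"message": "order processed",
	}
	if err := logger.Post("myapp.events", record); err != nil {
		log.Fatal(err)
	}
}
```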
Of course, if you're doing a lot of parsing and a lot of things, that memory requirement will increase. But it's a huge win if you have a big deployment with a lot of nodes; it's many times better. So the approach we're testing right now with Fluent Bit, and we already have some early adopters, is to put Fluent Bit on the nodes that are most critical for performance, keep its memory usage quite low, and make Fluent Bit talk directly to Fluentd. In this case, Fluent Bit works as a forwarder and Fluentd is just an aggregator, and Fluentd takes care of storing the logs in a reliable way in the backends. With this model, we can have cheap forwarding of logs (see the configuration sketch after this section). All of this is mostly about strategy: there's no fixed solution for every use case, but we find that different people have different problems, and different problems need to be addressed in different ways. Fluent Bit aims to solve the problem of high memory consumption and to reduce the cost of a cluster deployment.

One of the complex things about dealing with logging in a cloud-native setting is what's called backpressure. Backpressure, in this specific context, happens when you get a lot of data from your input but you cannot flush the data out at the same rate; you cannot send the data out as fast as you would like. Think of a funnel: if water comes in faster than it can go out, you get backpressure, because the output cannot deal with that rate of ingestion. This happens with different kinds of backends and databases, and mostly with cloud services: the rate at which the service you're sending your logs to can accept them is often quite a bit slower than the rate at which you produce them. Dealing with backpressure is quite easy if you implement it the right way. Fluentd and Fluent Bit implement a backpressure solution in which they will not ingest more data until the data already taken in can be delivered. Of course, this has pros and cons, but from our Kubernetes context point of view it's quite simple: since we're just reading log files, the log files become our buffers. We're not going to load more data from the log files until we can flush the data, and once we can flush, we can go back and ingest more data from the log files and send it to the backends. With this mechanism, we can handle backpressure. Different systems from different users are different, and sometimes you cannot have one default configuration for everybody; you always need to review how things are going, monitor things, and maybe make your own configuration adjustments.

So Fluent Bit has the backpressure handling, and it has built-in security features. It also has its own filter to gather Kubernetes metadata. We are now at version 0.11, and Fluent Bit will turn two years old in July. The next version, 0.12, will support timestamps with fractional seconds at nanosecond granularity. Sometimes you have multiple logs generated within the same second, and you'd like some granularity beyond that timestamp; Fluent Bit will support that, and Fluentd does too.
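Here is that forwarder-to-aggregator pattern as a hedged Fluent Bit configuration sketch: Fluent Bit tails the container logs, enriches them with its Kubernetes filter, and forwards everything to a Fluentd aggregator over the forward protocol. The host name and paths are illustrative:

```
[SERVICE]
    Flush        5
    Daemon       off

[INPUT]
    Name         tail
    Path         /var/log/containers/*.log
    Tag          kube.*

[FILTER]
    Name         kubernetes        # attach pod metadata to each record
    Match        kube.*

[OUTPUT]
    Name         forward           # speak Fluentd's forward protocol
    Match        *
    Host         fluentd-aggregator.example.com
    Port         24224
```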
And well, one extra thing about Fluent Bit is that networking and coroutines are implemented through different interfaces that let you build a fully non-blocking service. From the plugin's perspective, you can do a lot of network I/O or perform TLS communication without blocking the main process; everything is done through a main event loop with coroutines and interfaces that help the plugins. At the moment we don't have full documentation on how to write plugins for Fluent Bit, but if you look at the examples we have, it's quite straightforward. And if you want to deploy Fluent Bit as a Kubernetes DaemonSet, just to test how it behaves compared with Fluentd, we have a full repository and a full Docker image with pretty much the same configuration as the Fluentd DaemonSet. By default, Fluent Bit can also talk directly to Elasticsearch, so you can test how Fluent Bit works with Elasticsearch; it's not mandatory to make Fluent Bit talk to Fluentd directly. Optionally, you can make it talk directly to Elasticsearch. If you want more information about Fluent Bit, the project site is fluentbit.io. It's fully open source, Apache licensed, it's under the CNCF as I said, and the whole Fluent Bit community is on our main Slack channel.

And now I'd like to hand over to Anurag Gupta, our product manager for Fluentd Enterprise, who's going to give a really short presentation about the enterprise features that are coming for Fluentd.

Perfect, let's see, is he on? Colin, is he muted? Let's see, maybe I've got him muted. Hey guys, I'm here, can you hear me? Yep, I can hear you now, thanks. Hey, perfect. My name is Anurag Gupta, as Eduardo said; I also work at Treasure Data, and I'm the product manager for Fluentd Enterprise. Just as an overview of the messages Eduardo covered: you really need logging as a unified layer. We need something reliable, something modular, vendor-agnostic in the backend, and flexible, and Fluentd really addresses a lot of those issues. But what we've noticed, from both Treasure Data and my background at Microsoft, is that a lot of large-scale deployments require some additional features and security. They require support and deployment help. They need some best-practice configurations and SLAs associated with those backends. And that's where Fluentd Enterprise comes in. You can see here from the picture that it's built on that same open-core Fluentd platform, but with some additional output plugins for enterprise-ready backends like Splunk, and you have things like our own Treasure Data service. And we're really making sure we add features that are rich for the enterprise and security space. If you want to click next, Eduardo; perfect. A pretty blanket slide here, but just to go over it: end-to-end security. In Fluentd Enterprise there's a really powerful buffering mechanism, and we've added encryption to it, so that when data comes in from, say, a firewall, not everyone can read it. Today in open-source Fluentd, the buffer is just stored in MessagePack format, so if you can take a look at the buffer file, you have access to all the data flowing through.
Then there are the certified enterprise plugins; these are both sources and outputs. For example, for Splunk we do a big benchmark with our Fluentd Enterprise bits: we make sure we can run at 1,000, 10,000, 100,000 messages per second, and give you performance numbers and configuration guidance around that, as well as the CPU usage that goes with it. And world-class support: the makers of Fluentd are with us. We have folks who have written MessagePack, we have Eduardo, who created Fluent Bit, and we have someone who sits on the Ruby security team. So we have all the layers covered, from the eventing framework and the protocol to the actual application itself, and whenever we make a fix, or do security scans and find vulnerabilities, we make sure that the whole stack works appropriately in the Fluentd Enterprise bits. And then the last slide. Cool.

Perfect timing. Thanks very much, guys, for doing this. There's one question that someone's just asked: Dave is asking if there are pieces for correlating entries from various sources to identify event storms, do root cause analysis, et cetera, and whether those are enterprise features or typically provided by third-party projects or products.

Right. I would put that more in the analytics backend. Fluentd is great at unifying the log streams for correlating entries: you can take your events from your web server, your application server, your stack traces, and throw them all into one analytics backend, and hopefully that backend can do some of the things you mentioned around event storms and root cause analysis. In terms of event storms, there have been some cases where we think we might be able to do a little more pre-processing on the enterprise side, but I think that's a little further off than something available in the near term.

And since this is a project under the CNCF and it is open source, where is it better to go, fluentd.org or jumping on the Slack channel, if people have questions? Where's the best place to reach out?

Actually, two years ago it used to be mostly the mailing list. The mailing list, I would say, still has a lot of traffic, and the Fluentd team tries to respond as much as possible, but in the last year we have also seen huge growth in our Slack channel. In the Slack channel we have around 600 members, of whom maybe a hundred are active. So right now you can go either to Slack, or, if you have more complicated things and you want more people to look at your problem, the mailing list is also a good resource. You can find it on the fluentd.org site, on the support page in the community section: there's the mailing list, which is a Google group, and more information about other channels, like Twitter. But most of the big issues are handled first by the mailing list and second by Slack.

And we're finding the same thing with OpenShift Commons. We have a mailing list, and people put stuff in, mostly announcements and release notices and things like that, announcements of events like this, but the Slack channel is where people are really connecting these days. That's been true for OpenShift Commons too; our Slack is pretty active these days. So I think that's probably a good place to pause and stop, and I'll put the recording of this up on YouTube in a day or so. And if you could send me the PDF version of this slide deck, I'll post that as well, with a blog post on blog.openshift.com.
What I think might be interesting, too, is to do a follow-on and maybe a demo using Fluentd Enterprise, and actually do some live demo showing it off. That might be a great thing to do as a follow-on in a month or so, post Red Hat Summit, post the next OpenShift Commons Gathering, which is May 1st in Boston; we're co-locating that. Are any of the Fluentd and Treasure Data folks registered yet for that? I haven't seen your names on the list, and there are about 300 attendees already. So if you haven't, let me know and I'll get you in if you're coming to Red Hat Summit. Oh yeah, definitely. Yeah, let's try to coordinate that offline. Perfect. All right, everyone on the call: there's a bunch of you there being very quiet, lurking. Are there any other questions? If not: going, going, gone. Thank you again for a wonderful overview of Fluentd. I hadn't heard about the Fluent Bit part, so that was actually really cool for me, and I hope everybody else enjoyed it. Reach out and contribute to Fluentd, and ask questions of Anurag and Eduardo if you have them. So thank you very much, guys. Thanks, Diane. Thanks, community.