In this video, I want to give you a short introduction to FluentD, an open-source log data collector. To understand what that actually means, I will first explain why we need logs in the first place, then the challenges of collecting and consuming application logs, how FluentD works and how it solves those challenges, and finally how you configure FluentD as a user.

Let's say we have a microservices application deployed in a Kubernetes cluster: two applications in Node.js, a couple of Python applications, maybe databases, a message broker and other services. All these applications talk to each other and produce log data, so each of these services is logging information about what the application is doing. Now, what kind of information are these applications logging, and why do we need this log data? It may be compliance data, for example if your industry requires you to log specific information in order to stay compliant. It could be for application security, for example detecting suspicious requests by logging all access attempts with IP address, user ID and so on, in other words logging who is accessing what and when. And an obvious use for log data is debugging: when there is an error in your application, you analyze all the application logs to find the cause. These are some of the reasons why log data is so important.

Now, the question is, how do applications log this data? There are a few options. The first one is that applications write to a file, which is a common way of logging. However, as you can imagine, it's difficult to analyze loads of data in raw log files, so they're not really meant for human consumption. And without a user interface or visualization for this data, how do you analyze logs properly, especially across applications? You would have to check each application's log file and try to match up timestamps to correlate events across applications. Also, logs coming from different applications will be in different formats, with different timestamps, log levels and so on. Another option could be to log directly into a log database like Elasticsearch, for example, to then visualize this data. However, in this case each application developer must add an Elasticsearch library, configure it to connect to Elasticsearch and send those logs, and also configure the proper format. So there are some challenges with this option as well.

Now, what about the third-party applications in your cluster, like the databases and the message broker? Also, in Kubernetes, requests go through the NGINX ingress controller, so what if you want to see those logs too? Or what about system logs? You can't control how those components log. So how do you collect logs from all these different data sources? All of these are challenges of collecting and consuming logs in complex applications: you have tons of useful data, but you can't really consume and analyze it because it isn't all in one place, in a unified format, where you could visualize it properly. So a lot of valuable data is essentially wasted.

So what would be a good solution to that challenge? A technology that lets you collect all the data regardless of where it comes from, transform it into a unified format and keep it all in one place, so that you can then use that data, again for compliance, debugging and so on. And that's exactly what FluentD does. FluentD also does that reliably, meaning that a network outage or a data spike shouldn't mess up data collection, so FluentD handles such cases as well.
So how does FluentD work, and how does it do all of this? FluentD gets deployed into the cluster and starts collecting logs from all the applications. That can be your own applications, third-party applications, all of it. Now, the logs that FluentD collects will come in different forms and formats, like JSON format, NGINX format, maybe some custom format, and so on. So FluentD processes them and reformats them into a uniform format. On top of that, you can enrich your data with FluentD, meaning you can add additional information to each log entry, like pod name, namespace, container name and so on. That way you can later group logs of the same pod or logs of the same namespace, for example. You can even modify the data in a log. So now you're streaming your logs from all the applications into one unified format through FluentD.

What happens to these logs after FluentD processes them? Well, in most cases the goal is to visualize them nicely so we can do some analysis on them. FluentD can send these logs to any destination you want: Elasticsearch, MongoDB, S3, Kafka, etc. Now, what if you want your Python application logs to go to MongoDB storage for data analysis and all other application logs to go to Elasticsearch? Or what if you want the Node.js logs to also go to MongoDB in addition to Elasticsearch? You can very easily configure that routing in FluentD, which is a great thing about it, because it gives you a flexibility that many alternative tools don't. You can send any data from any data source to any destination or storage. This flexibility also comes from the fact that FluentD is not tied to any particular backend, so you have a wide choice of destination targets without vendor lock-in.

Now you're probably wondering what you, as a FluentD user, need to configure and how you can actually use FluentD. First, you install FluentD in Kubernetes as a DaemonSet. A DaemonSet is a component that runs on each Kubernetes node, so if you have five nodes, each of them will have a FluentD pod running on it. You configure FluentD using a FluentD configuration file. FluentD configuration may be a bit complex to get started with, but it's very powerful in terms of processing and reformatting your data, and for that you will use FluentD plugins. FluentD has tons of plugins for different use cases.

First of all, you define the data sources: all the applications from which FluentD will collect logs. So first you configure which application logs you want FluentD to start collecting. Second, you configure how these data entries will be processed line by line: you parse each log into individual key-value pairs, so you have log level, message, date, user ID, IP address, etc. You do that in FluentD using parsers. After that, you can enrich the data using record transformers, again to add even more information to the data, or you can even modify it. A great use case for that would be anonymizing personal data in the logs, for data protection for example. And finally, you have the output: where should the logs go? For each such output target there is a plugin, like Elasticsearch, MongoDB, and so on. And as you see here in the example configuration file, FluentD has a concept of tags, which you can use to group logs together or to filter them. Using these tags you can say, I want all logs with the tag myApp to be parsed like that, or logs with the tag myService should go to Elasticsearch, and so on.
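To make that a bit more concrete, here is a minimal sketch of what such a configuration could look like. This is just an illustration: the file paths, tag names and the Elasticsearch host are made-up placeholders, and the Elasticsearch output assumes the fluent-plugin-elasticsearch plugin is installed.

```
# Hypothetical data source: tail a container log file and parse each line as JSON
<source>
  @type tail
  path /var/log/containers/myApp-*.log
  pos_file /var/log/fluentd/myApp.log.pos
  tag myApp
  <parse>
    @type json
  </parse>
</source>

# Enrich every record tagged myApp with additional fields (record_transformer plugin)
<filter myApp>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    environment production
  </record>
</filter>

# Output: send everything tagged myApp to Elasticsearch
# (requires the fluent-plugin-elasticsearch plugin)
<match myApp>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  logstash_format true
</match>
```

The source section defines where the logs come from and assigns the tag, the filter section processes and enriches the records that match that tag, and the match section routes them to the output plugin.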
You can also use these tags to easily filter out any unneeded logs, to save resources for example. And the flexible routing I mentioned before is easy to configure for exactly this reason: using these tags, you can very precisely define which logs should go where. So that's basically how you use FluentD for your logs.

Now, one big advantage of FluentD is its built-in reliability. When FluentD collects and processes the data, it saves it on the hard drive until it has sent the processed data to the configured output destination. This means that if the FluentD pod restarts in the middle of collecting or processing the data, or the whole server restarts, the data will still be there, and when FluentD starts again it can pick up where it left off. It also means you don't have to configure any additional storage for FluentD, like a Redis database and so on. What can also happen is that the backend, the output target, is not accessible: Elasticsearch is down or MongoDB isn't reachable. In that case FluentD handles it by automatically retrying to send the logs until that endpoint becomes available again. And in addition to that, you can also cluster your FluentD setup to make it even more performant and highly available.

I should mention here that logging in Kubernetes is just one of the use cases of FluentD. Logging is a very important topic in IoT applications too, or in non-containerized applications running on bare-metal servers, for example, and many projects are using FluentD for those use cases as well. So FluentD can be used in many different environments. If you're interested in learning more about FluentD, I recommend checking out the online resources and documentation of FluentD.
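As a final illustration of the tag-based routing and the file-based buffering described above, here is one more hedged sketch of an output configuration. Again, the tags, hostnames and buffer paths are assumptions made up for this example, and the outputs assume the fluent-plugin-elasticsearch and fluent-plugin-mongo plugins are installed.

```
# Hypothetical routing: Python logs go to MongoDB, Node.js logs go to both
# Elasticsearch and MongoDB via the built-in copy output plugin.
<match python.**>
  @type mongo
  host mongodb.logging.svc
  port 27017
  database logs
  collection python_apps
</match>

<match nodejs.**>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc
    port 9200
    # File-based buffer: chunks are persisted to disk and retried
    # until Elasticsearch becomes reachable again
    <buffer>
      @type file
      path /var/log/fluentd-buffers/nodejs-es
      flush_interval 10s
      retry_forever true
    </buffer>
  </store>
  <store>
    @type mongo
    host mongodb.logging.svc
    port 27017
    database logs
    collection nodejs_apps
  </store>
</match>
```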