Hello, folks. Thank you for attending my presentation, Logging Operator: the Cloud-Native Fluent Ecosystem. It's not a big secret: we will talk about Fluent Bit, Fluentd, and Kubernetes.

Just a couple of thoughts about me. I was working as an infrastructure engineer at Ustream, which was acquired by IBM, and after that I founded Banzai Cloud with my friends, which was recently acquired by Cisco. I have several years of experience in the observability field. We've been operating Kubernetes clusters since version 1.4, and I personally have more than five years of experience with the Fluent ecosystem. At Banzai Cloud I wanted to pour all that experience into software, and that's what I will talk about today.

So let's start at the beginning, with the Kubernetes part. Kubernetes doesn't have too many options when it comes to logging. The container runtime stores each pod's standard output in a file on the host disk. When a user runs the kubectl logs command, the kubelet goes to this well-known path (typically under /var/log/containers) and reads the logs from that file. Simple as that, and that's it; there is not much you can change about it.

So what are the problems with this? First, the logs are stored locally on the host file system. These logs eventually get rotated, or they will fill the node's disk. Second, we noticed that the kubectl command is not always the way you want to consume your logs; eventually you want to ship them to your preferred location or service. Of course, this solution raised several questions. To access the host file system, you need privileged containers. Moreover, there is no separation: this method doesn't respect Kubernetes boundaries like namespaces or RBAC.

But let's see what this would look like. First, you have a Kubernetes cluster. Then you deploy a daemonset that collects and sends logs individually. If you realize you want to batch your logs for performance or other reasons, you will need an aggregator. Later, you may want to send portions of the logs to different endpoints as well. At the end of the day, you end up with hundreds of rows of configuration, a big bunch of them copy-pasted, not to mention managing secrets and other variables.

This was our first approach as well, and we wanted to simplify the solution. That's how the first version of the logging operator was born. The goal was to automate configuration generation for Fluent Bit and Fluentd and to handle them as Kubernetes resources. With a really simple custom resource, you could save a lot of manual deployment and configuration. It watched for resource changes and pulled credentials from Kubernetes secrets. The first version had some strong restrictions: it used Fluent tags for routing, and it allowed only the Kubernetes pod's app label as the tag. Even with those restrictions, it lifted a tremendous amount of work from operators' shoulders. It was a big success for us, and as more and more of a community gathered around it, it helped us identify the pain points of this solution.

So after validating the use cases, we went back to the planning table and drafted the second version of the logging operator. We had a couple of strong requirements in mind: namespace isolation; Kubernetes label selectors, working the same way as they do for Kubernetes services or the kubectl command; and support for multiple flows and multiple outputs. Roughly a year later, in September 2019, we came out with new custom resources and a new core model that is more Kubernetes-like. Let's see how this works in the logging pipeline.
Note that the following principles are still the foundation of version 3 as well. We collect all logs with Fluent Bit. Our label router plugin identifies log flows based on Kubernetes metadata. This enables the logging operator to route logs based on arbitrary pod labels.

So how is this represented in the custom resources? A Flow defines a logging flow. You have two options: use a Flow definition within a namespace, or a ClusterFlow across namespaces. In a Flow, you can use Kubernetes-like label selectors to select the relevant logs. An empty selector means all pods within the namespace, or within the cluster. After selecting the relevant logs, you can apply Fluentd filters to them. In the custom resource, you can define several individual filters, applied in order. Typical examples are parsers, the Prometheus metrics exporter, or the GeoIP filter. The last section of the Flow custom resource is the output reference. You can define local output references for Output resources in the namespace, or global output references for ClusterOutput resources. It is possible to define more than one output for a flow.

There is one missing piece of the logging pipeline: we need to define the outputs as well. Similarly to Flow and ClusterFlow, there are Output and ClusterOutput resources. When you configure the output parameters, you can inject sensitive data from Kubernetes secrets; you just need to refer to them like you would in a pod definition.

After you apply those custom resources, there is one more step the operator will do: it will create a configuration check pod. To protect the system's integrity, only upon a successful configuration check will the operator swap the active configuration to the new one. This check runs on every change to the custom resources.

We've arrived at the last station of the logging operator. The core concepts remain the same, but we released version 3 in March 2020. We added several community-driven new features, like not just selecting but also excluding logs from a flow, a namespace filter for cluster flows, and a default flow for the leftover data that hasn't matched any other logging flow. I'm happy that the operator has reached more than 600 GitHub stars and has an active Slack community, and if you are using Rancher version 2.5 and above, you're already using the logging operator, as it is the default logging backend there.

Okay, so what does this look like in practice? Let's do a small demo. As a first step, I will install the logging operator using the One Eye CLI tool. One Eye is actually not just a CLI but an operator as well; it helps us manage many observability-related tools on Kubernetes. Moreover, One Eye will visualize the resources we create to manage the logging operator. It's an easy way to keep track of who did what, where, and when in the system.

The first task after the install is to configure the Logging resource. Configuring the Logging resource is pretty easy: we get a good template for it that we can decide to edit or not, and after that we simply apply it to the system.

After creating the Logging resource, we can create our first output as well. Let's create a ClusterOutput with type S3 for the AWS object store, select a bucket name like kubecon-test-bucket, and choose a region like us-east-1 for it. After that, One Eye will notice that I have my AWS credentials set in my environment and create a Kubernetes secret from them. With that secret, it will create my output.
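To give you an idea, a minimal Logging resource looks roughly like this. This is a sketch based on the template; the resource name and the control namespace are my own illustrative choices:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging
spec:
  controlNamespace: logging   # namespace where Fluentd and the generated config live
  fluentd: {}                 # deploy Fluentd with default settings
  fluentbit: {}               # deploy the Fluent Bit daemonset with default settings
```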
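And the S3 ClusterOutput created from my credentials looks something like this. Again a sketch: the bucket, region, path, secret name, and key names are illustrative, but the secretKeyRef-style injection is how the operator pulls sensitive values from Kubernetes secrets:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: s3-output
  namespace: logging                # ClusterOutputs live in the control namespace
spec:
  s3:
    s3_bucket: kubecon-test-bucket
    s3_region: us-east-1
    path: logs/${tag}/%Y/%m/%d/
    aws_key_id:
      valueFrom:
        secretKeyRef:               # inject credentials from a Kubernetes secret,
          name: aws-credentials     # just like in a pod definition
          key: awsAccessKeyId
    aws_sec_key:
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: awsSecretAccessKey
```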
After that, we have to create a ClusterFlow resource. The ClusterFlow needs a ClusterOutput reference: we choose the S3 output we just created, and if we accept the template, it's ready to go. Once we apply it to the cluster, it stores the logs.

Checking the created resources is pretty easy. We can use the kubectl get logging command to print out all the information from the logging resources. Moreover, One Eye provides an ingress connect command to connect to its web UI. On the web UI, we can visualize the flows and outputs we just created and see brief monitoring information as well.

So let's see a more complex example. For this, I will install the One Eye log generator tool, which produces nginx access logs in a controlled way. I will set it to one request per second. After that, we will create an output and a flow to count all the requests by response code and expose them as Prometheus metrics.

To achieve this, first we have to create an output. In this case, we will create a null output. This is a special case because it doesn't actually transport the logs anywhere; it just drops them. But it's really useful if you have a filter that produces output itself and you don't really need the logs. It also means that we don't have to specify any parameters for this output. We just apply it to the cluster, and it's ready to be used by any flow we create later.

Now we have to create the Flow resource. The Flow resource is really similar to the Output resource: we have the same API group, we specify the kind Flow, and we give it a name, since this is our access log flow. After that, we have to specify which filters we want to apply. As we said, this is an nginx access log, and Fluentd has a built-in parser for nginx, so we can use the parser type nginx for this kind of log. We have two more important attributes to add: remove_key_name_field removes the original message from the log, and reserve_data set to true means that we keep the Kubernetes metadata next to our log. After that, we specify the match section to select only the logs matching the label app.kubernetes.io/name: log-generator. This will select only the logs we really need. The last thing we have to specify is the output itself, and we can use the null output we just created. Once we finish creating the resource, we just apply it to the cluster.

So now we have a valid output and a valid flow as well, but that's not enough to expose metrics. To do that, we have to add another filter, called prometheus. In the prometheus filter, we have to define the metrics. A metric should have a name; let's name it http_response_code_total. We have to specify a description for the metric: it's the total number of requests. This is a counter-type metric, meaning we count the requests from zero to infinity. We add a label, code, and take the label's value from the code attribute of the log message. We also add a static label, app: log-generator, to identify our metrics. After that, we just need to update the resource on the cluster.

So we have the configuration in place; let's check whether it's working. We can use the Fluentd metrics service and check in the web browser whether the required metrics are present. As you can see, there are already a couple of response codes available in the metrics. So we have our metrics in place, but it's even more important to have these metrics in Prometheus as well. We can open Grafana, go to the Explore menu, just type in the metric name, hit search, and voila: we have all the metrics. We can use the rate function and check how the log generator produces the random status codes.
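For reference, the null output is about as small as an Output can get. A minimal sketch, with a name of my own choosing:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: null-output
  namespace: default
spec:
  nullout: {}   # drops the records; useful when a filter already did the work
```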
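And the complete access-log Flow, with the nginx parser, the match section, and the prometheus filter, looks roughly like this. This is a sketch: the flow name, namespace, and the record accessor syntax for the code label follow my reading of the Fluentd prometheus plugin, and the local output reference assumes the v3-style field name:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: access-log
  namespace: default
spec:
  filters:
    - parser:
        remove_key_name_field: true   # drop the original raw message field
        reserve_data: true            # keep the Kubernetes metadata on the record
        parse:
          type: nginx                 # Fluentd's built-in nginx access log parser
    - prometheus:
        metrics:
          - name: http_response_code_total
            type: counter                 # counts from zero to infinity
            desc: Total number of requests
            labels:
              code: $.code                # label value taken from the log's code field
        labels:
          app: log-generator              # static label to identify our metrics
  match:
    - select:
        labels:
          app.kubernetes.io/name: log-generator
  localOutputRefs:
    - null-output                         # the null output we created above
```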
We have time for one more example as well. Let's create a namespace called infra, and in it a new resource called HostTailer. HostTailer is a custom resource of the logging extensions, another great tool embedded in One Eye. With it, we can create pods that tail logs from the host file system. Let's call this resource systemd-kubelet and put it in the infra namespace we just created. This resource will be a systemd tailer that tails the kubelet's logs from the systemd journal. We only need to specify a couple of attributes: disabled set to false; maxEntries, let's say 100; the name of this tailer, kubelet, which will be the container name; and a systemd filter that narrows down which logs we want to tail from systemd. After applying the resource, it deploys pods that tail the logs to their standard output for us.

Let's create an output to store all those kubelet logs somewhere. We will name it loki, because it will be a Loki output, and we will put it into the infra namespace. For every output, we can define the buffering mechanism. For this Loki output, we set timekey to ten seconds, meaning we flush the messages every ten seconds; we want to use UTC timekeys in the messages; and we set the timekey_wait parameter to two seconds. Last but not least, we enable configure_kubernetes_labels, which attaches the Kubernetes labels as Loki metadata. The last parameter, the URL, is the Loki service URL we will use inside the cluster. After finishing the output, we apply it to the cluster as well.

So let's move on and create the kubelet flow. It will look familiar: we use the same API group and the kind Flow. We specify the metadata: we name the flow kubelet and put it in the infra namespace as well. Remember that the output and the flow need to be in the same namespace to work together. We don't really need any filters, but we want to match the logs by selecting the label app.kubernetes.io/name: host-tailer. And we specify an output for this flow. But for demonstration purposes, I will intentionally make a typo in this output reference. Let's apply this resource to the cluster as well and see how the logging operator handles the situation.

We can see two outputs: one with a red circle, which is the missing one, the typo; and one with a normal black circle, which is an orphan output that we just didn't connect to anything. When we check the configuration of the flow, we can see that, yes, we made a typo in the output name. But we don't need any fancy UI to check the problem; we can use the command-line tools as well. The kubectl get logging command prints out all the resources, and the resources with Active: false are the problematic ones. But what are those problems exactly? We have to check the resource's status field for that. We can write a simple one-liner, kubectl get flow -n infra kubelet -o jsonpath='{.status.problems}', to check the .status.problems field. It will print out that we didn't specify a proper output.

Just for fun, I will fix this issue on the UI, because the UI and the CLI tool work from the same resources. Here, I just need to fix the typo in the Loki output name, and everything should work as expected. Just to be 100% sure, we can go back to the CLI, print out all the logging resources, and check that everything is fine.
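To recap this last example, here is roughly what the three resources look like. First, the HostTailer; a sketch assuming the logging-extensions API group and the systemd tailer field names as I remember them:

```yaml
apiVersion: logging-extensions.banzaicloud.io/v1alpha1
kind: HostTailer
metadata:
  name: systemd-kubelet
  namespace: infra
spec:
  systemdTailers:
    - name: kubelet                    # also used as the container name
      disabled: false
      maxEntries: 100                  # number of journal entries to read back
      systemdFilter: kubelet.service   # narrow the journal to the kubelet's logs
```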
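Next, the Loki output with the buffer settings I just described; the service URL here is illustrative:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: loki
  namespace: infra
spec:
  loki:
    url: http://loki:3100                # Loki service URL inside the cluster
    configure_kubernetes_labels: true    # attach Kubernetes labels as Loki labels
    buffer:
      timekey: 10s                       # flush messages every ten seconds
      timekey_wait: 2s                   # wait two seconds for late events
      timekey_use_utc: true              # use UTC timekeys in the messages
```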
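And finally the kubelet flow, with the output reference spelled correctly this time:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: kubelet
  namespace: infra       # must match the output's namespace
spec:
  match:
    - select:
        labels:
          app.kubernetes.io/name: host-tailer
  localOutputRefs:
    - loki               # a typo here is what produced the red circle on the UI
```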
I know this sounds too good to be true, and the logging operator is not a silver bullet. It highly depends on the underlying components, and sometimes it can amplify a small problem into a bigger one. For example, buffering: it can change the message order, which is a problem if you want to ingest logs into Loki. Another problem is that it's hard to see what's inside: a faulty plugin can exhaust resources, and it's a huge task to find which plugin is causing the problem. Fluent Bit has a newer, more modern architecture, but it has its own issues as well. It can fail silently, for example when attaching Kubernetes metadata, which will cause the label router to drop messages. And it's a bit harder to prototype with, because it has only a restricted way to apply plugins.

Last but not least, let's talk about the future of the logging operator. We have a couple of things in mind: advanced routing based on richer metadata or log content; applying RBAC rules to different log streams; and providing a logging API for a unified experience, enhanced with authentication and authorization, to make it easier to use the logging operator as a component in a bigger system. And we are watching several interesting projects in the cloud-native landscape: the OpenTelemetry Collector, which speaks the Fluent protocol, and Tremor, which is an interesting approach to collecting and transporting logs as well.

Thank you for listening to my presentation. Have a nice day.