Hello, and thank you for joining me for my Fluent Bit: the Swiss Army tool of observability session. My name is Michael Marshall, and I am a senior SRE with Neiman Marcus. I'm also an avid cloud architect and developer. Before I begin this talk, I would like to give a shout out and a thank you to my management team, who sponsored the project: Alec Nelson and Santosh Modhaker. During the talk, I will cover a little bit of what Fluent Bit is, why I'm so excited about it, how I added log-derived metrics functionality, how I use it as my Swiss Army tool, and then I'll let you see it in action. I came across a challenge after building out my data observability platform with Grafana Loki for logs, Prometheus and Cortex for metrics, and the Grafana UI for dashboarding and ad hoc queries. I set out to find the best tool to use to ingest the data into the platform. It needed to be low complexity and it needed to be inexpensive. What is it? Fluent Bit. By definition, a log processor and forwarder. It's a pipeline made up of plugins for extreme agility and flexibility. A few of those I've listed here: to ingest, to massage, to extract or drop records. As you can see in the diagram below. And it's the best thing since sliced bread. So why am I so excited about it? It's fast. It's developed in C. It's developed to run in an embedded-system environment. It's open source. You can look at how it runs and contribute to it. It's container friendly. It runs great inside of a container; I run it on Fargate. It's tiny. It's very low resource usage. Along with being fast and container friendly, it's very cost effective. I can run multiple instances of Fluent Bit, each doing its own thing, without impacting any of the others, at a very reasonable cost. It has wide cloud provider adoption. I first came across it when using AWS, when I came across FireLens.
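To give a feel for the shape of that pipeline, here's a minimal, hypothetical fluent-bit.conf. The paths, tags, and plugin choices are illustrative, not from my production setup:

```ini
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]
    Name  tail
    Path  /var/log/app/*.log
    Tag   app.logs

[FILTER]
    Name     grep
    Match    app.logs
    Exclude  level DEBUG

[OUTPUT]
    Name   stdout
    Match  app.logs
```

Every section is a stage in the pipeline: records enter through an input, get tagged, flow through filters that match the tag, and exit through one or more outputs.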
FireLens is a sidecar implementation of Fluent Bit that runs in Fargate for sending logs directly to Fluentd, or in this case Fluent Bit, bypassing the need to send them to CloudWatch Logs. I also discovered that Google Cloud has standardized on Fluent Bit for its logging infrastructure. Fluent Bit is extremely agile and flexible, which is what makes it so attractive. It has a very active community for its open source project. It makes releases often, always welcomes additional contribution, and is extremely extensible, as I will show later in my talk. So, checking the boxes: it checks the box for speed, agility, and cost. But it was missing a critical function that I needed. I needed to be able to derive metrics from my logs for Prometheus. So what do you do now? Well, "if it doesn't exist, create it," says Henry Royce. So I did. This is how I added log-derived metrics to Fluent Bit. I used the Go SDK and I built an output Prometheus metrics plugin. This is the plugin architecture that I used. I route logs through Fluent Bit using the tag. Then I route them to the output module, the Prometheus metrics output module. I define the metric type, the metric name, and the job name, and then I push that information to the Prometheus Pushgateway. Prometheus then comes and scrapes the Pushgateway when it's configured to do so. Now I wanted to address a couple of questions. Why Go? Fluent Bit is written in C, and you can do native C plugins as well. I used Go because Pushgateway and Prometheus are both written in Go, which made the handoff easier. Now, anybody that's run Prometheus knows that Pushgateway is frowned upon. I decided to use it so that I didn't impact the speed at which Fluent Bit is running. Instead of holding the metrics waiting to be scraped, I chose to push them to Pushgateway as fire and forget, to allow Prometheus to come pick them up. This model works very well. This is a sample configuration section for my plugin.
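Before getting into that configuration, the fire-and-forget handoff can be made concrete with a small sketch. This is not the actual plugin (which is built on the Fluent Bit Go SDK); it's a hypothetical Go helper showing how a counter might be rendered in the Prometheus text exposition format and addressed to a Pushgateway job. All names here are made up for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// BuildPushRequest renders one counter metric in the Prometheus text
// exposition format and builds the Pushgateway URL for a given job.
// The job and metric names used below are illustrative only.
func BuildPushRequest(gatewayURL, job, metric string, labels map[string]string, value float64) (url, body string) {
	url = fmt.Sprintf("%s/metrics/job/%s", strings.TrimRight(gatewayURL, "/"), job)

	// Sort label names so the rendered body is deterministic.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s=%q`, k, labels[k]))
	}

	body = fmt.Sprintf("# TYPE %s counter\n%s{%s} %g\n",
		metric, metric, strings.Join(pairs, ","), value)
	return url, body
}

func main() {
	url, body := BuildPushRequest("http://pushgateway:9091", "cwlogs",
		"cloudwatch_log_records_total",
		map[string]string{"log_group": "/aws/lambda/orders", "account": "123456789012"}, 42)
	fmt.Println(url)
	fmt.Print(body)
}
```

In the real plugin, that body would be sent with an HTTP PUT or POST to that URL; because nothing waits on Prometheus to scrape, the log pipeline keeps moving at full speed.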
What I wanted to do here was create a counter metric to count the number of Southwest log records processed by this instance of Fluent Bit, and index them by the log group name and the source account. The first few parameters here are standard output parameters: the name of the plugin, the tag you're matching on, and the log level. Then we have the job. The job is the Prometheus job label that will show up when you do queries. The URL shows where to push the metrics, where the Pushgateway is. I have the metric type, in this case counter, and the metric name, following the Prometheus naming convention. And then I have the metric constant labels and variable labels. I want to explain the difference between these. Constant labels are set statically in the config file, at the time you configure Fluent Bit. The variable labels are extracted using regexes configured further up in the pipeline. This is the true key to this plugin: I can create truly log-derived metrics from the log lines passing through the pipeline. The ID allows me to have multiple copies of the Prometheus metrics plugin in use within a single fluent-bit.conf instance; the IDs must be different. Now I'm going to go over a few of my go-to plugins that I use quite a bit in my production installation. An input plugin is how you get data into Fluent Bit. I have a couple of external feeds coming in from upstream vendors. One of them is syslog based and the other is a stream of raw JSON. I'm using the tcp input for the raw JSON and the syslog input for syslog. My on-premises data is using the tail input to tail logs on application servers. There is a special input plugin, which is also an output plugin, called forward. This implements the forwarding protocol used between Fluentd and Fluent Bit, or Fluent Bit and Fluent Bit; it's for passing records between instances. I leverage this by using TLS and pushing my on-premises logs to my cloud-based Fluent Bit. I also use it multi-region, passing data between regions.
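As an illustration of that pattern, here's a hypothetical pair of config snippets: the on-premises sender and the cloud-side receiver. The hostname and port shown are made up:

```ini
# On-premises Fluent Bit: ship everything to the cloud instance over TLS.
[OUTPUT]
    Name        forward
    Match       *
    Host        fluentbit.example.com
    Port        24224
    tls         on
    tls.verify  on

# Cloud-side Fluent Bit: accept forwarded records.
[INPUT]
    Name    forward
    Listen  0.0.0.0
    Port    24224
```

The records arrive on the cloud side with the tags they were assigned on premises, so the downstream pipeline routes them exactly as if they had been ingested locally.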
There's a second use, which I've leveraged and will be demonstrating later in this talk: the programmatic input, using the SDK that I've listed here. Basically, it takes the logs that you want to push to Fluent Bit, packages them into the forward protocol, assigns a tag, and pushes them into Fluent Bit. These are the parser plugins that I use quite often. JSON, so you can pass it a JSON string and it'll return the set of key-value pairs. And regex, which uses Ruby regular expressions to extract content using named captures. You'll notice that I'm using the extracted fields later in the demo in my output plugin. The filter plugins that I use quite often are grep, mainly for discarding records that I don't want ingested into my system, and modify, which I commonly use to drop keys prior to ingestion. There's an additional one for tag manipulation called rewrite_tag. Tagging in Fluent Bit is one of the key uses of the system. You can reroute records through the pipeline just by changing the tags, or you can split a record by using rules to assign a new tag; when you re-emit the tag, the record restarts at the beginning of the pipeline. This allows you to do many different things with great flexibility. Now, these are my most used filter plugins: the parser using regex, and then there's Lua. Lua in Fluent Bit was a game changer. There are many built-in plugins in Fluent Bit, but when you face a situation where you just don't find support, you write a Lua function and off you go. It runs very fast and it's well integrated into the product. Here's a few tips and tricks. I like to say: if I can see it, I can troubleshoot it. In this case I'm creating a tee in the pipeline. You can add a Lua filter between pipeline sections and write a function in Lua that will dump the current record table to stdout. This will display all the tags and keys in play as records pass through that tee. This is very helpful for getting a good feel for what's going on.
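Here's what such a diagnostic function might look like: a hypothetical Lua filter that prints each record's tag and keys to stdout without changing anything:

```lua
-- dump.lua: print every record as it passes this point in the pipeline.
function dump_record(tag, timestamp, record)
    print("tag: " .. tag)
    for key, value in pairs(record) do
        print("  " .. key .. " = " .. tostring(value))
    end
    -- Return code 0 tells Fluent Bit to keep the record unchanged.
    return 0, timestamp, record
end
```

You'd wire it in with a [FILTER] section (Name lua, Match *, Script dump.lua, Call dump_record) dropped between any two pipeline stages to see exactly what's in flight there.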
The second tip is basically for when you're looking at examples of configurations that people have posted in documentation: you'll see fields in play that you don't see a definition for, and therefore they don't make sense. Having been bit by this several times, I will tell you to check the parsers.conf file, where you'll probably find the definition. Let's see it in action. In this demo I'm going to show you the set of configurations for my internal programmatic Fluent forwarding. In production this is implemented as a serverless project. I have a Lambda subscribed to a log group; the Lambda gets fired every time the log group receives a record, and it puts the record into an SQS queue. On the other end I have a Lambda router that takes the message, decodes it, puts it into the Fluent forward protocol, and pushes it to Fluent Bit, which sends it on to Loki as configured. I will show you what it looks like in the Grafana UI. This is the fluent-bit.conf file that I'm using for my CloudWatch logs. Each of these sections is basically a part of the pipeline. In this case it starts with forward. The forward input is receiving the logs from the Lambda which is reading that SQS queue I showed you, and receives them on port 24224. Notice there's no tag matching here: with forward inputs, you always tag the messages prior to pushing them to the forward. Now, I know that they're tagged with the CloudWatch Logs tag. So the first part of my pipeline is a parser, and I'm going to parse the tag and decode it with a regex called cw_tag. What I'm doing is, when I send this data, I am adding the CloudWatch log group and the source account to the tag; then I decode them and use them in Fluent Bit. You can see those names there; those are the extracted labels. The next phase is the parser on the record. If I find JSON in it, I'm going to return it as a named field called nested_json, along with the pre and post data. I'll show you that in a moment.
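The parsers doing that work might look roughly like the following parsers.conf entries. The names and patterns here are reconstructions for illustration, not my exact production regexes:

```ini
# Pull the log group and source account back out of the tag,
# e.g. cwlogs./aws/lambda/orders.123456789012
[PARSER]
    Name    cw_tag
    Format  regex
    Regex   ^cwlogs\.(?<log_group>.+)\.(?<account>\d+)$

# Capture an embedded JSON string plus whatever surrounds it.
[PARSER]
    Name    nested_json
    Format  regex
    Regex   ^(?<pre_data>[^{]*)(?<nested_json>\{.*\})(?<post_data>.*)$
```

The named captures (log_group, account, nested_json) become keys on the record, which is what lets later stages in the pipeline use them as labels.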
And then after that, I'm going to clean up the nested JSON, because it includes backslashes and non-JSON-compliant formatting. Here's my nested JSON parser. You see here I'm capturing the nested JSON into a field named nested_json, along with the pre and post data. This is the Lua filter calling the function. It passes in the record; the function looks for the nested_json key, runs through it, and cleans up the JSON formatting on the record. Return code 1 says change the record when it comes back, and code 0 says ignore it. You'll see here the commented-out section that I mentioned for diagnostics; it would be helpful for dumping what's flowing through the pipeline at this point. The next parser decodes the cleaned-up nested JSON using the decode-field capability that's built into Fluent Bit. I'm passing it a field called nested_json, telling it it's in JSON format, and it returns key-value pairs. Next, I'm using the modify filter to drop the option and tag keys, and then I get to the output section. One output is sending the raw log to Loki. I'm sending static labels, as set here, and these are my dynamic labels, which I derived using the regexes from above. The other output is using the Prometheus metrics plugin, setting the job for my CloudWatch logs and pushing to my Pushgateway each time I encounter a matching record. These are my static labels, and again, these are my dynamic, log-derived labels. I'm going to quickly show you what it looks like in Grafana. This is an ad hoc query looking at the logs; there's only one log being pushed in at the moment. This is a dashboard I put together real quick, showing the rate of change and the actual counter. The counter is always increasing. The rate of change will increase and decrease as the quantity shows up over a period of time; a higher quantity can indicate a possible issue in your network. And then these are the logs. And one last thing I wanted to point out is what it looks like in Pushgateway.
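Put together, those two output sections might look something like this. Again, a hypothetical reconstruction: the loki output shown is the standard plugin, while prometheus_metrics stands in for my custom Go plugin, and its parameter names are illustrative:

```ini
# Ship the raw log line to Loki with static plus log-derived labels.
[OUTPUT]
    Name        loki
    Match       cwlogs.*
    Host        loki.example.com
    Port        443
    Tls         on
    Labels      source=cloudwatch, env=prod     # static labels
    Label_Keys  $log_group, $account            # log-derived labels

# Count matching records with the custom Prometheus metrics plugin.
[OUTPUT]
    Name                 prometheus_metrics
    Match                cwlogs.*
    Id                   cwlogs_counter
    Job                  cwlogs
    Url                  http://pushgateway:9091
    Metric_Type          counter
    Metric_Name          cloudwatch_log_records_total
    Metric_Const_Labels  env=prod                # static labels
    Metric_Var_Labels    log_group,account       # extracted upstream
```

Both outputs match the same tag, so every record is simultaneously shipped to Loki as a log line and counted as a metric, with the same extracted fields serving as labels in both places.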
So I'm doing CloudWatch logs. I have one log record at this time, and you'll see the counter and you'll see what it's indexed by. Down here is also a syslog job that's running, and it has all the different label combinations from the syslogs. These keep updating, and this is what Prometheus comes and scrapes. As far as lessons learned, I would say use a separate instance for each endpoint; they're cheap. Also, if you need a new feature added, there's plenty of SDKs and examples of how to do that. Tagging and tag manipulation is one of the keys to agility. Thank you for joining my talk. I hope you enjoyed it.