Hi there. Thank you for taking the time to attend this presentation. My name is Prana of Marla, and today I'm going to be talking to you about how you can get more out of Lua in Fluent Bit. Specifically, this is what I'm going to cover today. After briefly introducing myself, I'm going to talk about the motivation behind this talk, go over the basics of using Lua in Fluent Bit, walk through some example programs, provide some general tips, and finally wrap up with a list of helpful resources as well as my contact information.

To start, a few words about myself. I am a senior software engineer at Fidelity Investments, where I build solutions that allow other teams at Fidelity to monitor their applications running in the cloud. Over the course of my work, I've gained experience with a variety of open source tools, including Kubernetes, Helm, Fluentd and Fluent Bit. Most importantly, the question on everyone's minds: no, sadly, those are not my cats.

Next up, let's talk about the motivation behind this presentation. As you all know, Fluent Bit has many filters that allow you to extend its functionality. Of those filters, the most powerful one is the Lua filter. The Lua filter lets you call a custom script, written in a programming language called Lua, on each matching record. This means that unlike the other filters, which all have very specific predefined functionality, the Lua filter's functionality is essentially limitless. Since it just calls whatever program you write, in theory, you're constrained only by your imagination. In practice though, in my experience, this incredibly powerful feature is criminally underutilized. If people use it at all, they tend to limit themselves to making minor tweaks to the existing sample programs provided in the Fluent Bit documentation. To get an idea of why this is, let's look at the most recent TIOBE Index and RedMonk Rankings. They have two things in common.
One, they are both measures of programming language popularity, and two, Lua doesn't appear in the top 10 or even the top 20 of either of them. In fact, if we want to find Lua in the TIOBE Index, we need to go down the list all the way to number 39, below Ada, Lisp and COBOL. And I believe this is the crux of the issue. The average user is unfamiliar with the Lua programming language and, because of that, finds it too intimidating to stray outside the guardrails of the sample scripts. Now, over the course of my work with Lua, I had to do just that, and I ended up spending considerable time and effort trawling through manuals and just going through a lot of trial and error. So my hope is that by sharing what I've learned, I can spare you that effort and make your Lua journey smoother.

Let's start by going over the basics. The Lua filter in Fluent Bit takes two main parameters: the name of the file containing your Lua program, as well as the name of the specific function that you want Fluent Bit to call on each record matched by the filter. For those of you who don't know what a record is, every input log is represented in Fluent Bit as a structured collection of key-value pairs, which we call a record. The Lua function takes three arguments, which are automatically supplied by Fluent Bit every time it calls the function on a matching record. The first argument is the Fluent Bit tag associated with the record. The second is the Fluent Bit timestamp associated with the record, formatted as an epoch timestamp with nanosecond resolution. And the third and final argument is the record itself, formatted as a Lua table. For those of you who don't know what a Lua table is, it's an associative array, which is essentially also a structured collection of key-value pairs. So for example, let's say we have the following input log, represented in Fluent Bit as the following record, associated with the following record timestamp and the following tag.
If this record matches our Lua filter, then Fluent Bit automatically calls the specified Lua function on that record and supplies the associated tag, the equivalent epoch timestamp and the equivalent Lua table as arguments. This function also has a specific format when it comes to the return values. In particular, it always returns the following three values. Starting from the right, we have a Lua table representing a record, an epoch timestamp, and finally a return code. This return code is an integer which tells Fluent Bit what to do with the record that triggered this function call, and which effectively determines whether or not the other two return values play any role at all. Specifically, if the return code is minus one, Fluent Bit drops the entire record. If it's zero, Fluent Bit does nothing and leaves the record as it was. If it's one, Fluent Bit replaces both the record timestamp and the record itself with the second and third return values from the function. And finally, if it's two, Fluent Bit replaces only the record itself with the third return value from the function.

So now that we're all clear on the basics, let's move on to some actual programs. Let's start with one of the canonical examples provided to us in Fluent Bit's documentation: replacing the record timestamp. For example, let's say we have the following record with the following record timestamp. As you can see on the right, the record also contains another timestamp within its my time field. And we would like that timestamp to replace the record timestamp on the left. We can accomplish this fairly easily with a Lua program. First, we get the value stored in the my time field and save it to a variable, which I'll call new timestamp. Next, since we want Fluent Bit to replace the record timestamp with new timestamp, we return one as our return code. We return new timestamp as our replacement timestamp.
And finally, even though a return code of one tells Fluent Bit to replace both the timestamp and the record, since in this case we don't actually want to change the record, we just return the same original record as our replacement record.

All right, so now this function does what we want, but let's take a closer look at it and see if we can improve it. First of all, I don't know about you, but I can never remember which return code means what. And I'm definitely not going to remember six months from now, which means that maintaining and debugging this code will be that much harder. So my first tip is to replace this hard-coded integer with a well-named variable. In this case, I'm going with replace timestamp and record. And this applies to all programming languages, really. Your life will become a lot easier once you replace all the mysterious magic numbers in your code with well-named variables.

Now, looking at this code, you might notice that since this variable is defined outside this function, it's a global variable. But what you might not know is that, in fact, new timestamp is also a global variable. In other words, even though I've defined it inside this function, its scope is not limited to this function. In fact, in Lua, unlike in many other languages, all variables are global by default. And this brings us to my next tip. Wherever possible, use local variables instead of global variables. This is because global variables can make your code harder to reason about and more error-prone. And in Lua, it's slower to access a global variable than it is to access a local one. So how do we do this? In Lua, the way you declare a local variable is by adding the word local in front of it, which takes care of new timestamp. As for the variable above it, when you want to reference an existing global variable inside your function, a common pattern in Lua is to declare a local variable with the same name so that it shadows the global variable.
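Putting these two tips together, the timestamp-replacement function might look like the minimal sketch below. The function name replace_timestamp and the snake_case spellings of the variable and field names (my_time, new_timestamp, REPLACE_TIMESTAMP_AND_RECORD) are assumptions made for illustration, since only their spoken forms appear in the talk.

```lua
-- Return code 1 tells Fluent Bit to replace both the record timestamp
-- and the record. This is a global, because it is declared without "local".
REPLACE_TIMESTAMP_AND_RECORD = 1

-- Hypothetical function name; it would be referenced from the Lua filter config.
function replace_timestamp(tag, timestamp, record)
    -- Shadow the global with a local variable of the same name.
    local REPLACE_TIMESTAMP_AND_RECORD = REPLACE_TIMESTAMP_AND_RECORD

    -- "my_time" is an assumed spelling of the field that holds the new timestamp.
    local new_timestamp = record["my_time"]

    -- The record itself is returned unchanged.
    return REPLACE_TIMESTAMP_AND_RECORD, new_timestamp, record
end
```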
So now, when we reference replace timestamp and record inside the function, we're actually referencing the local variable. And this is good because not only is it faster to access, but even if another function accidentally changes the value of the global variable, we now have a copy of its original value preserved in the local variable.

Now, let's consider a more advanced example. Let's say you want to generate a new timestamp every time Fluent Bit processes a log and then inject that timestamp into the record as a new field. One reason you might want to do this is if your Fluent Bit collector is just one part of a distributed log pipeline, in which case debugging any delays in that pipeline can be tricky when the only information contained within your log is when it was first created. However, if your log contains another timestamp, letting you know when it was subsequently processed by Fluent Bit, then that can help you narrow down where the delay is coming from.

At first glance, this seems fairly simple to accomplish with Lua. For every record, we create a new field, let's call it collector timestamp, and we set it equal to the current time. So the question then becomes, how do we get the current time in Lua? Well, there are two functions that can help us here: os.time and os.date. os.time produces an epoch timestamp, whereas os.date produces a human-readable timestamp. Unfortunately, as you can see, they both share the same problem. The timestamp has only second resolution, and that is definitely not precise enough for our needs. Ideally, what we really want is a nanosecond-resolution timestamp, but Lua is not currently capable of producing that. So where do we get it from? Well, the solution I eventually came up with was to force Fluent Bit to generate it for us. To understand how this works, let's take a step back and see where Fluent Bit gets the record timestamp from. It obtains it in two different ways, depending on the structure of the incoming log.
The first method is when you have a structured log with a well-defined time field like this. When Fluent Bit parses this log, it produces the following record with the following record timestamp. Note that the record timestamp is the same as the log timestamp. In other words, for structured logs, Fluent Bit extracts the preexisting log timestamp from the log and uses that as the record timestamp, which here represents the time when the log was created, before it reached Fluent Bit.

The second method is when you have an unstructured log without a well-defined time field like this. When Fluent Bit processes this log, it produces the following record with the following record timestamp. Note that this record timestamp doesn't match anything in the log. In other words, for unstructured logs, since Fluent Bit cannot find any preexisting log timestamp to extract, it generates its own record timestamp instead, which here represents the time when the log was processed by Fluent Bit.

So, for unstructured logs, Fluent Bit already does essentially what we want. The problem arises when we have structured logs, since there the record timestamp represents the time when the log was created, not when it was processed. So what do we do? Well, let's take another look at our structured log setup. Notice that at ingestion time, we configure Fluent Bit to parse the log with an appropriate parser, in this case a JSON parser, since it's a JSON log. So all we need to do is get rid of that parser. And now, even though we know it's a structured log, Fluent Bit thinks it's an unstructured log, which means that it's going to generate a new record timestamp. So at this point, we've ensured that Fluent Bit is generating a new timestamp for all our records and is setting that new timestamp as the record timestamp. And if you recall, when Fluent Bit calls our Lua function on each record, it automatically passes that record timestamp in as the timestamp argument.
This means that if we return to our initial Lua function, all we have to do is set collector timestamp to the timestamp argument. We then return the appropriate return code to let Fluent Bit know that we made a change to the record, and return the modified record as our third return value. In this case, the second return value doesn't matter, so we can just stick with the default.

Now, let's see this in action. We have the following record, representing our unparsed structured log, with the following record timestamp generated by Fluent Bit. Fluent Bit then calls our Lua function on this record and passes in the tag, the record timestamp and the record itself. And finally, our Lua function returns a modified record with the new collector timestamp field containing the same value as the record timestamp. So we're done, right? Well, not quite. If you look closely, although the nanosecond-resolution record timestamp has nine decimal places, the collector timestamp only has six. And unfortunately, this represents a fundamental limitation. When that floating-point record timestamp is converted from C to Lua and then back to C, there is some unavoidable loss of precision, which means that the original nine decimal places of the record timestamp are not preserved in the collector timestamp. So now we have a microsecond-resolution collector timestamp, which is better than second resolution, but it's still not quite what we want.

So what do we do now? Well, in response to this issue, the folks at Fluent Bit came out with a new parameter for the Lua filter called time_as_table. When this setting is enabled, Fluent Bit no longer passes the record timestamp to the Lua function as a floating-point number. Instead, it passes it as a Lua table with two keys: sec, containing the integer part, in other words everything to the left of the decimal point, and nsec, containing the fractional part, in other words everything to the right of the decimal point.
Although this takes a little more work for us to parse and manipulate, it allows us to preserve the original nanosecond resolution of the record timestamp. Let's return to our Lua function and see how it works. First, we extract the integer part and the fractional part from the timestamp argument. Using the string.format function, we then put the two pieces together to reassemble the record timestamp as a string. And finally, we inject that string into the record as collector timestamp. If we return to our example, we can see that our modified function now returns a collector timestamp containing all nine digits of the record timestamp.

This looks good, but there is a subtle bug in this code. To demonstrate, let's try this again with a different record timestamp. Just like before, Fluent Bit calls our Lua function, passes in the record timestamp as a table, and the Lua function then injects it into the record as collector timestamp. However, notice that this collector timestamp does not match the record timestamp. Specifically, it is missing the leading zero in the fractional portion. This is because up above, when the timestamp is passed in as a Lua table, the fractional portion is passed in as an integer, which means that any leading zeros get dropped. So let's go back to our function and fix this bug. Instead of including the fractional part in the timestamp string right after we extract it, let's first modify the fractional part such that if it contains fewer than nine digits, we pad it with leading zeros. We then include that padded fractional part in the timestamp string. And if we go back to our example, we can see that the collector timestamp now includes the leading zero and is once again a perfect match for the record timestamp.

So at this point, we have successfully injected the record timestamp into the record as a field named collector timestamp. However, collector timestamp is still formatted as an epoch timestamp.
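The padded version of the function described above might look roughly like the following sketch. It assumes time_as_table is enabled on the Lua filter, so that the timestamp argument is a table with sec and nsec keys. The function name add_collector_timestamp, the constant name REPLACE_RECORD and the snake_case field name collector_timestamp are hypothetical, and using the %09d format specifier is just one way to do the zero padding.

```lua
-- Return code 2 tells Fluent Bit to replace only the record
-- and to keep the existing record timestamp.
local REPLACE_RECORD = 2

-- Hypothetical function name; assumes time_as_table is enabled.
function add_collector_timestamp(tag, timestamp, record)
    local seconds = timestamp["sec"]       -- integer part of the timestamp
    local nanoseconds = timestamp["nsec"]  -- fractional part, as an integer

    -- %09d pads the fractional part with leading zeros up to nine digits,
    -- so an nsec value such as 12345678 is written as 012345678.
    record["collector_timestamp"] = string.format("%d.%09d", seconds, nanoseconds)

    -- With return code 2 the second return value is ignored,
    -- so the original timestamp is simply passed back.
    return REPLACE_RECORD, timestamp, record
end
```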
Especially if you want to use it for debugging, it would be better if the collector timestamp were formatted as a more intuitive, human-readable timestamp. So let's return to our function. Instead of including the integer part in the timestamp string right after we extract it, let's first use the os.date function to convert the integer part from an epoch format to a human-readable format. We then include that human-readable integer part in the timestamp string. And now if we go back to our example, we can see that the collector timestamp has been converted from an epoch timestamp to a human-readable timestamp.

Let's zoom in and take a closer look. The collector timestamp says September 5th, 1 p.m. But note that there is no indication as to what time zone it's set in. In fact, this turns out to be the local time zone, because that's what the os.date function defaults to. Now in general, I strongly recommend that whenever you're dealing with timestamps, you never set them in the local time zone. It makes interacting with systems located in other time zones very confusing. And it can even cause confusion within your own system if your local time zone has a concept of daylight saving time. Instead, I strongly recommend keeping all your timestamps in UTC, which is essentially a neutral time standard that all time zones are based on. So returning to our Lua function, we can force os.date to give us a timestamp in UTC instead of in the local time zone by adding an exclamation mark in front of the format string argument. In addition, the convention is to add the letter Z at the end of the timestamp to let people know that it's in UTC. Returning to our collector timestamp, in my case, my local time zone is four hours behind UTC, which means that instead of the local time of September 5th, 1 p.m., our corrected Lua function will now return the UTC time of September 5th, 5 p.m.
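As a sketch of where the function ends up after these changes, the version below converts the integer part with os.date in UTC and appends the padded fractional part and a trailing Z. As before, it assumes time_as_table is enabled, and the names add_collector_timestamp, REPLACE_RECORD and collector_timestamp are hypothetical. The exact format string is not spelled out in the talk, so the ISO 8601 style layout used here is an assumption.

```lua
-- Return code 2 tells Fluent Bit to replace only the record.
local REPLACE_RECORD = 2

-- Hypothetical function name; assumes time_as_table is enabled.
function add_collector_timestamp(tag, timestamp, record)
    -- The leading "!" makes os.date format the time in UTC
    -- instead of the local time zone.
    local date_part = os.date("!%Y-%m-%dT%H:%M:%S", timestamp["sec"])

    -- Pad the fractional part with leading zeros up to nine digits.
    local fraction = string.format("%09d", timestamp["nsec"])

    -- The trailing "Z" marks the timestamp as UTC.
    record["collector_timestamp"] = date_part .. "." .. fraction .. "Z"

    return REPLACE_RECORD, timestamp, record
end
```

Because the time zone is fixed by the format string rather than by the host, the same input produces the same output string on every machine in the pipeline.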
So at this point, our log finally contains a human-readable collector timestamp that clearly tells us when the log was processed by Fluent Bit. However, we still have two issues left. One, our record timestamp still measures the time when the log was processed by Fluent Bit. But now that we've successfully saved that information in the collector timestamp, we would prefer it if the record timestamp went back to its default behavior of matching the log timestamp, thus letting us know when the log was first created. And number two, our structured log is still being treated as an unstructured log. We can solve both these issues by parsing our structured log. As you recall, we purposely chose not to parse it at ingestion time. So what we want now is a way to retroactively parse it. And we can do just that with the help of a Fluent Bit parser filter. Once again, we configure it to use an appropriate parser, and finally, we get a nice, clean-looking record with a record timestamp that now matches the log timestamp. And with that, we are finally done with the world's longest example.

So now let's go over some tips. We've already mentioned the following tips. Use variables instead of magic numbers. Use local variables instead of global ones. And for your timestamps, use UTC instead of your local time zone. Let's go over some more, starting with truthiness, in other words, whether or not a particular value evaluates as true when used in a conditional statement. Let's consider the following Lua values: the Boolean value false; nil, which is a reserved word in Lua used to indicate the absence of a meaningful value; zero; an empty string; and an empty table. If you're used to other languages like Python, you might expect all these values to evaluate as false. But in fact, in Lua, only the first two values, that is false and nil, evaluate as false. Everything else evaluates as true.
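These truthiness rules can be checked with a short sketch like the one below. The helper names is_truthy and is_non_empty_table are hypothetical and exist only for illustration. Testing next(value) against nil is a common Lua idiom for detecting an empty table.

```lua
-- Returns true if the value counts as true in a conditional statement.
local function is_truthy(value)
    if value then
        return true
    else
        return false
    end
end

-- Returns true only for a table that contains at least one key.
-- next(t) returns nil when the table t has no entries.
local function is_non_empty_table(value)
    return type(value) == "table" and next(value) ~= nil
end

print(is_truthy(false), is_truthy(nil))        -- only these two are false
print(is_truthy(0), is_truthy(""), is_truthy({}))
print(is_non_empty_table({}), is_non_empty_table({ key = "value" }))
```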
So for example, if you're checking whether a table is valid before you make use of it, make sure you're clear on whether or not you want your program to consider an empty table valid, since that will affect what conditional check you choose.

Next, let's talk about the contents of the main Lua function. As you recall from the beginning of this talk, the Lua filter takes two arguments: the main Lua function that gets called by the filter, and the Lua file where that function is located. Till now, we've dealt with relatively simple programs where the Lua file contains only the main function and that main function contains all the Lua code. However, as your programs get larger and more complex, this might not always be the most efficient approach. For example, let's say you have the following main Lua function containing code that needs to be executed repeatedly, that is, for every record, as well as code that only needs to be executed once. Because the Lua filter calls this main function every time it matches a record, if the filter matches three records, then both these pieces of code will be executed three times, even though the first piece of code only needed to be executed once. However, if you move the first piece of code out of the main function, then although they will still both be executed for the very first matching record, for every subsequent matching record, only the repeated code within the main function will be executed. Thus, I recommend restricting the contents of your main Lua function to only the code that needs to be executed for every record.

On a similar note, let's talk about the contents of the Lua file. Till now, we've dealt with relatively simple programs where we want to execute the same Lua code on all the matching records. However, what if we want to execute different Lua code on different subsets of the records? Initially, this might still seem fairly straightforward.
We create three different Lua filters with three different tags and three different Lua functions. But now the question arises, what about the Lua file? Do we store the code for all three Lua filters in one big file, or does each filter get its own separate file? Let's try both these approaches and see what happens.

First, let's specify the same Lua file in all three filters. Next, let's take a look at the contents of that shared Lua file. It has the main function for the first group of records, as well as the one-time code for the first group of records. Similarly, it also has the main function and one-time code for the second group of records and the third group of records. Now, we already know that the main function will be executed every time the corresponding filter matches a record. So let's focus on the very first time that a filter matches a record, since we know that's the only time that the one-time code will also be executed. For example, the first time that the group one filter matches a record, both the one-time code for group one and the main function for group one get executed, which is what we want. In addition, the one-time code for group two and group three also gets executed, which is not what we want. Similarly, the first time that the group two filter matches a record, not only does the group two code get executed, but also the one-time code for the other two groups, and so on for the group three code. So, as we can see, when all the Lua filters share the same Lua file, we end up needlessly executing unrelated one-time code meant for other groups of records.

Now, let's return to our Lua filters and this time specify a different Lua file for each filter. Looking at the contents of those files, we can see that now each group's code is located in its own separate Lua file. Now, let's run through our example again.
The first time that the group one filter matches a record, the only code that gets executed is the one-time code for group one, as well as the main function for group one. No other code gets executed, which is what we want. Similarly, the first time that the group two filter matches a record, only the group two code gets executed, and the first time that the group three filter matches a record, only the group three code gets executed. So, as we can see, this time, only the code that we wanted to be executed got executed. Thus, when you have multiple Lua filters operating on different subsets of records, I recommend creating a separate Lua file for each filter, such that each file only contains the code relevant to that particular filter.

Moving on, let's talk about some helpful resources that you can use to build upon this talk and to dive deeper into the areas that you're interested in. From the Fluent Bit side, you have the Lua filter documentation and you have some sample Lua programs. From the Lua side, you have the official Lua manual, but personally, I recommend starting with the Programming in Lua book, written by one of the creators of Lua, as that provides a much friendlier introduction to Lua than the manual does. One thing to keep in mind when consulting Lua documentation is that although Lua itself is now on version 5.4, Fluent Bit's distribution of Lua is limited to version 5.1. That's because Fluent Bit doesn't embed native Lua. Instead, it embeds LuaJIT, which is essentially a fork of Lua that is frozen at version 5.1 with a few newer features backported. So before you incorporate features introduced in newer versions of Lua into your code, make sure that they're supported by Fluent Bit.

Finally, if you'd like to reach out with any questions or comments, here is my LinkedIn profile, my GitHub page, and my email address. And with that, we are done. Thanks again for giving me your time today.
I hope that you found this talk helpful. And I hope that going forward, it makes the prospect of using Lua to build whatever additional functionality you need into Fluent Bit less daunting. Thank you.