Hello, everyone. Welcome to FluentCon. I'm Matt Lehmann, an engineer at GreyNoise.

Hi, I'm Guillermo Mangervar, and I'm an engineer at GreyNoise as well.

Today we're here to talk to you about noise-canceling headphones for Fluent Bit, powered by Lua. I'll give a quick outline of what we're going to cover, just to set expectations. First, Guillermo is going to talk about what GreyNoise is. Then I'll jump into how GreyNoise uses Fluent Bit internally for our framework. Guillermo will talk a little bit about how to write a hello-world Lua filter, and then I'll jump into the GreyNoise Lua filter and a quick demo. At the end we'll have a Q&A session. So Guillermo is going to tell you a little bit about GreyNoise now.

Great. So what is GreyNoise? GreyNoise tells you what not to worry about. The best way to think about it is to picture any publicly facing server you have with SSH open. You jump onto the terminal, open the log file, and see all these random IPs probing your server. Then you ask: are they attacking me? Are they a threat to me? Are they hacking me? Many things happen on the internet; it's a super noisy place. You have opportunistic internet-scanning devices and common business services, Googlebot for example, that aren't really targeting you. So GreyNoise focuses on telling you what not to worry about. It helps you focus on what matters most by filtering out the noise. That's what GreyNoise does: it lets you look at your data and your logs, move the noise away, and focus on the threats that actually do affect you.

Great. So I'm going to tell you a little bit about how we use Fluent Bit at GreyNoise. Fluent Bit plays a pretty big role for us here.
We have about a thousand sensors, and that number is growing every day. For us, sensors are largely a honeypot framework, and we have a global footprint with it. We're in just about every country you could imagine, across every provider you could imagine. We take a hardware-agnostic approach: a sensor might be a VPS at a cloud provider, a dedicated piece of hardware in a data center, or even something as small as a Raspberry Pi. On all of those different pieces of hardware, in all of those different places in the world, we're running a collection of open source and closed source software, and we run various profiles of those sensors. The one thing they all have in common, open and closed source alike, is that they all output to log files. So we needed to figure out how to combine all of these various weird log files into a common schema that has meaning to us and that we can turn into a useful data set. That meant working out how, on each one of these sensors, to filter, transform, and validate the logs coming out of that software against the common schema, and then finally route the events out to our data lake. There we can do various types of processing, data science, and so on, and get the events into a format where they ultimately land in a data store that we can put an API in front of to provide value to our users. And so Guillermo is going to tell you a little bit about Fluent Bit filters.

Great. In order to really grasp the idea of a Fluent Bit filter, we need to understand the pipeline that Fluent Bit makes available to us. In that pipeline, we have an input, parser, filter, buffer, and then routing.
In that pipeline, the filter plays the role of transformation: you can transform an event that comes in, or you can drop it. So it lets you make intelligent decisions based on some criteria that the event meets. Now, while Fluent Bit lets you do basic filtering, its power really comes from combining it with the LuaJIT that it makes available to you. That's where we can write more advanced logic to decide whether an event meets certain criteria, and then maybe drop it, or maybe transform it and pass it down to the buffer and eventually the routing mechanism. With that in mind, we're now going to write a very simple hello-world Lua filter. But before we start writing code, we need to configure Fluent Bit to use our Lua filter. Remember the pipeline we mentioned. For this example, we're going to look at three steps: the input, the filter, and the output. For the input, we're going to use dummy, a really handy mock input that Fluent Bit provides. As you can tell in line four, you give it a message to generate, and you can also tell it how many times per second to emit that message. For our example, it's important to catch that the message is key-value: first value, hello; second value, world. What we're really interested in is the filter stanza. Here the name is the type of filter that we want Fluent Bit to use, which is Lua. The match is the tag that we assigned in line three. Line nine tells Fluent Bit where to find the script, and line 10 tells it which function to call within the script, handler. And then the output could be sent anywhere, but for our example, we're just going to use a log file. Cool. So now that we have Fluent Bit configured, it's ready to use.
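The slide with the configuration is not part of the transcript, so the fragment below is only a sketch of what the three stanzas just described might look like in Fluent Bit's classic configuration format. The tag hello.world, the script file name hello_world.lua, and the key names first_value and second_value are assumptions made for illustration. stdout is used for the output here so the result shows up on the console; the destination used in the talk may differ. The stanzas are written without blank lines between them so that the line numbers mentioned above (tag on line 3, message on line 4, script on line 9, call on line 10) happen to line up.

```
[INPUT]
    Name   dummy
    Tag    hello.world
    Dummy  {"first_value": "hello", "second_value": "world"}
    Rate   1
[FILTER]
    Name    lua
    Match   hello.world
    script  hello_world.lua
    call    handler
[OUTPUT]
    Name   stdout
    Match  *
```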
This is how simple a Lua filter looks. If you've written any AWS Lambda functions, this will look very familiar, because there's a predefined set of inputs that you take in and a predefined set of outputs that's expected from you. The tag is the tag associated with the record. The timestamp is the time the event arrived in the Fluent Bit framework. And then there's the record itself. Now, if you remember the sample from the dummy input, we got first value and second value. So we're going to transform this event through our hello-world Lua filter. As you can tell here, the new record's full value will be the first value, plus a space, plus the second value, which should give us hello world. And then we return it. Now, there's something to say about the return as well. It's key to know the caveat about the code that we return, because this is how our Lua filter communicates with the greater Fluent Bit framework. If you return negative one, your Lua filter is telling Fluent Bit to drop that record. If you return zero, you're telling it that while you might have peeked at the record, you didn't modify it. And if you return one, it means you transformed the event; something actually changed, as it did here. So the cool thing is that there's a very simple yet elegant way for your Lua filter to communicate with the greater Fluent Bit framework. Now, if you write a Lua filter, you're usually going to deploy it to your production servers, so we have tests for the Fluent Bit plugins that we've written. We also want to show you how to write a very simple test using the busted Lua testing framework. This is BDD, so if you've ever written BDD tests, this will look very familiar.
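The hello-world handler and its return codes, as described earlier, might look roughly like the sketch below. The slide's code is not in the transcript, so the key names first_value, second_value, and full_value are assumptions. The early return with code 0 for a record that lacks those keys is an addition for illustration, not something the talk shows.

```lua
-- Hypothetical hello-world filter. Fluent Bit calls this function with the
-- record's tag, its timestamp, and the record itself as a Lua table.
function handler(tag, timestamp, record)
    local first = record["first_value"]
    local second = record["second_value"]

    -- Nothing to combine: return 0 to say the record was not modified.
    if first == nil or second == nil then
        return 0, timestamp, record
    end

    -- Join the two values with a space, e.g. "hello" and "world".
    record["full_value"] = first .. " " .. second

    -- 1 tells Fluent Bit the record was modified; -1 would drop it.
    return 1, timestamp, record
end

-- Calling the handler by hand, the way a test would.
local code, ts, rec = handler("hello.world", 0, {
    first_value = "hello",
    second_value = "world",
})
print(code, rec["full_value"])
```

Because the handler is an ordinary Lua function, it can be called directly from a test file without running Fluent Bit, which is what makes the testing approach discussed next practical.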
If you've written Ginkgo in Go, it's super familiar; it looks more or less the same, with some nuances. Here we do the same thing Fluent Bit would do: in line three, we load our code. We set the input parameters in lines 8 to 12. Then we just exercise our handler in line 14. Notice that of the three values returned, for the sake of this example, we only care about the one we inspect, the record, which is the V in this context. And then we just assert that full value equals hello world. And that's it. You can see at the bottom how simple it is to run busted. All of this means that you can configure Fluent Bit with your configuration management, you can write your Lua script, and you can now have tests, so you can wrap this into a really nice CI workflow that not only ships your code but also gives you some confidence that it will work. So busted is definitely a really cool framework that complements the Lua filtering setup.

Cool. I forgot to mention earlier that we're already leveraging the busted framework at GreyNoise for a lot of our ecosystem. We've got pretty rigid tests for the various open source projects and closed source software that we're building on top of. We've built a robust test framework on busted, and it works really well for us. The next thing I'm going to jump into is the GreyNoise filter. Once we started gaining experience with Lua and learning how to leverage Fluent Bit in our ecosystem, we realized we could write a Fluent Bit plugin in Lua that would call the GreyNoise API itself. So we started walking down that path to see how we could do something like that and what it would look like.
The reason we wanted to do this is so that we could remove or deprioritize noise from your logs. In this specific plugin, we break noise down into three categories: noise, RIOT, and bogon. Noise is someone we've passively observed scanning or crawling the internet. In this case, we're going to be looking at an SSH log on a Linux server. What happens a lot of the time is that somebody is using an SSH brute-forcer, scanning the entire internet with a collection of usernames and passwords. It isn't a targeted attack, but it generates a lot of noise that might look like an attacker when in reality they're just blasting the entire internet with this brute-force. RIOT is the second category, which stands for rule it out. Those are predefined address spaces, like Googlebot or Bingbot, where Google or Bing themselves are saying, hey, these /24s or these subnets are dedicated to our bots or our crawlers. GreyNoise internally has a framework that goes out, collects all of those sorts of resources, and builds them into what we call our RIOT data set. The third category is bogon, which is private IP space. That might be 192.168, or multicast IP space, or any IP space that isn't public IPv4 space according to the various RFCs. The second part is that we want to route these events based on tags injected by the filter itself. With the categories we get back, we've extracted an IP address from a log, and now we want to decide how to route that specific event or log line based on the tags we've injected via the filter. And the last piece is where the best place is to use this sort of filter.
For us, we're using it on a single server, just on an SSH log, the auth log on a Linux server, because it's simple enough to implement and put into a small ecosystem to test. But it really applies to any sort of public-facing asset where you allow global inbound traffic. It might be an API, a VPN, or a firewall itself, where you have a lot of noise and you want to filter that down, determine which of these things are targeted, and work out how to remove or deprioritize the noise. It's important to point out that deprioritization doesn't necessarily mean dropping. It's largely up to the user to decide, for example, to send these log events to cheaper storage, or put them in S3 and keep them out of a more expensive data store. For this GreyNoise filter config, just like Guillermo mentioned earlier, one of the first things we needed to do was build the config. We started with the input section, using the tail input plugin and the SSH log that we pointed out here. This is just an example showing one line of the log. Here you can see somebody blasting a login for a user wordpress with some fake password from this specific IP address, and probably doing it across a bunch of different usernames. We were able to take a common regex that you can find on the internet to parse that message, whether it's syslog, the auth log, or any other specific format. So we put that into the config and were able to chunk the line up. The part that we really care about is this last part in red, where it starts with invalid user, because that's the part that contains the IPv4 address. So we extract that out, and we call it message.
It's important to pay attention to that label, because it's going to come up here in a minute. So with a quick prayer to the demo gods, we're going to transition over to a quick demo. I'll kick it off and then talk in a little more depth about the code base. We'll go through the exact folder layout, because it's pretty small, and I'm going to go file by file. But the first thing I'm going to do is kick off the demo, which I have make targets for. I'm just going to kick off the make run rewrite target, because it does take a few minutes to get through, and we'll let that run in the background while we talk about some of the files. That's going to build from the Dockerfile and start running the Fluent Bit configuration based on this rewrite.conf. What we did is build a Dockerfile that contains all of the various Lua resources, whether that's busted, Lua requests, or the caching framework itself, so that we could build in isolation. That made it really easy to extend various libraries and continue to grow the Lua script that we'll show here in a minute. We threw together a quick Makefile, and you can see the run rewrite target that we just ran for this test. You can see we're just calling into Docker here, and we're targeting the rewrite config file. Looks like I had a small error there, but we'll chat about that in a second. If we go into the rewrite config file, we can walk through it just like Guillermo did earlier. Up here at the top, we have a service block that designates the port that the stats server is going to listen on, and we'll talk about gathering stats in a little bit. It also points to the parsers file, where that regex I mentioned earlier is located. The next chunk, again, is the input section, where we're using the tail input configuration.
We're looking at that auth log, which is an SSH log file containing 10,000 log lines. There are a few other attributes available for config that we won't go into, but you can look at the details on the Fluent Bit documentation page. Next is our script. This is the important piece. This is us jumping into Lua at this point, with the script pointed at a GreyNoise Lua file and specifically calling into the GN filter function. So we'll jump into the GreyNoise Lua file and look for GN filter. Here's our function right here. This file has a bunch of other helper functions. We won't go into the details there; we'll just stick to this GN filter function. Again, this is just the handler. We're not using tag within this handler, but we're leveraging timestamp and record. The first thing we do is extract the IP out of that message field. We're using IP field here, which we actually set via an environment variable, so that we can say, okay, let's just look at that red portion, if you remember from earlier. We try to extract the IPv4 address out of there, which we do with a simple Lua match. Next, we take that IP address and say, maybe this isn't the first time we've come through this section of code, so let's first check the cache and see if we've seen that IPv4 address. We look in an LRU cache, which holds, I think, the last 10,000 records and also has a time-based roll-off. If it's in the cache, great, we go ahead and return. There's no need to call out to the API, and there's no need to even check whether it's a valid IP, because we have an exact record in there and we only put valid records in the cache.
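The first two steps just described, pulling an IPv4 address out of the message with a Lua match and then checking a cache, can be sketched as follows. This is not the code from the demo repository. The function names extract_ip, cache_get, and cache_put are hypothetical, the sample log message is made up, and a plain Lua table stands in for the LRU cache, so there is no size limit and no time-based expiry here.

```lua
-- Pull the first dotted-quad out of a log message, or return nil if there
-- is none. This only finds something shaped like an IPv4 address; checking
-- that it is a valid public address is a separate step.
function extract_ip(message)
    if type(message) ~= "string" then
        return nil
    end
    return message:match("(%d+%.%d+%.%d+%.%d+)")
end

-- Plain-table stand-in for the LRU cache mentioned in the talk.
local cache = {}

-- Return the stored result for an IP, or nil on a cache miss.
function cache_get(ip)
    return cache[ip]
end

-- Store a lookup result so later records with the same IP skip the API call.
function cache_put(ip, result)
    cache[ip] = result
end

-- First sight of an address is a miss; after storing a result it is a hit.
local ip = extract_ip("Invalid user wordpress from 203.0.113.7 port 4242")
print(ip, cache_get(ip) ~= nil)
cache_put(ip, { noise = true, riot = false })
print(ip, cache_get(ip) ~= nil)
```

A real LRU cache would also evict the least recently used entry once it reaches its size limit, which keeps memory bounded on a long-running sensor.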
If it's not in the cache, the next thing we do is run a check IP routine that determines whether it's a valid public IPv4 address. If not, it's either an invalid address or a bogon address, meaning it falls under one of the RFCs that cover private IP space, or multicast, or one of the other ranges that we wouldn't normally see on the internet. If it is a valid IPv4 address, then we say, okay, let's go ahead and reach out to the GreyNoise Community API. You can see the function that we're calling here. Real quick, I'm going to hop over to the GreyNoise Community API documentation just to walk through that. You can see here that we have a response body that contains noise and riot booleans, and those are the only fields that we really care about in this case. If you're interested, you can hop on over to developer.greynoise.io and check out the free Community API and some of the other information you can get out of it. But for this purpose, we're just going to talk about those two fields. Jumping back here, we can see that we get those two booleans out, GN Noise and GN Riot, and then we just return the record. Up to that point, we only return if the address is invalid. The final step is to set that result in the cache so that it's available the next time we come around. It looks like our test finished, but before we jump into that, I wanted to show how we leveraged busted tests with this. Guillermo talked about this a little bit earlier. This is just me running that check IP handler with a valid IPv4 address and making sure that it doesn't come back invalid, then checking it with a bogon address and making sure that it comes back bogon, and the same thing with link-local and a few others like multicast addresses. This is just a nice example of leveraging busted to get some consistency in our Lua code.
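A check IP routine of the kind described above, together with the sort of checks the tests make, might look like the sketch below. The function name check_ip and its return values "public", "bogon", and "invalid" are hypothetical. The ranges listed are a partial selection of well-known reserved IPv4 blocks, not necessarily the set the real filter uses. Plain assert calls are used so the block runs under a bare Lua interpreter; in a busted spec the same checks would sit inside describe and it blocks.

```lua
-- Classify a string as a public IPv4 address, a bogon, or invalid.
function check_ip(ip)
    if type(ip) ~= "string" then
        return "invalid"
    end
    local a, b, c, d = ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
    if not a then
        return "invalid"
    end
    a, b, c, d = tonumber(a), tonumber(b), tonumber(c), tonumber(d)
    if a > 255 or b > 255 or c > 255 or d > 255 then
        return "invalid"
    end

    -- Partial list of reserved ranges (assumed for this sketch).
    if a == 0 or a == 10 or a == 127 then return "bogon" end       -- 0/8, 10/8, loopback
    if a == 100 and b >= 64 and b <= 127 then return "bogon" end   -- 100.64.0.0/10
    if a == 169 and b == 254 then return "bogon" end               -- link-local
    if a == 172 and b >= 16 and b <= 31 then return "bogon" end    -- 172.16.0.0/12
    if a == 192 and b == 168 then return "bogon" end               -- 192.168.0.0/16
    if a >= 224 then return "bogon" end                            -- multicast and reserved

    return "public"
end

-- The same kinds of checks the talk describes: a valid address is not
-- invalid, and private, link-local, and multicast addresses are bogons.
assert(check_ip("8.8.8.8") ~= "invalid")
assert(check_ip("192.168.1.10") == "bogon")
assert(check_ip("169.254.0.1") == "bogon")
assert(check_ip("224.0.0.1") == "bogon")
print("check_ip assertions passed")
```

Keeping the classification in a pure function with no Fluent Bit or network dependency is what makes it easy to cover with unit tests.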
As the code base grows a little bit, it's good to have test coverage that isolates things, so that you know when your issues are in Lua versus in Fluent Bit itself. Finally, if we hop back over to our test, it's finished now. I can jump over and run the stats script, which curls that port 2020 endpoint; it's just a little bash script that generates some metrics from it. I'm going to run that just to show that it's live, and then I'm going to jump out of this and go back to our slide deck to talk about the results. With this run, we can see we had 10,000 records; 7,704 of those were noise and 6% were bogon or invalid, which works out to about 77% noise. So the traffic hitting this specific SSH auth log was largely noise. The run took about 134 seconds, so we were processing about 75 records a second on average, which, you know, isn't winning any races. This is obviously just a prototype, really meant to prove out all the different moving pieces and how to go about writing a plugin to do IP enrichment or some of the other things we're doing here. And then Guillermo is going to talk to you a little bit about where we're taking things next.

Yeah, great. As you can see, we were extremely excited to find that our thesis, that putting GreyNoise together with Fluent Bit made sense, held up: 77% noise in a real SSH log validates our hypothesis. So what we're looking at now is how to reduce the per-IP API requests, and how we can do offline noise caching. We went back and started looking at some of the good Fluent Bit filters that are out there, and the MaxMind GeoIP filter has a really nice model that we're tailoring our production GreyNoise filter towards.
The other thing is that, because we all use Fluent Bit in production and we trust it to be reliable at scale and performant, we're also looking at how to make the plugin that Matt just demoed into a native plugin. That way you can easily use it if you're interested in what GreyNoise is doing and also need to reduce the noise in production. We want to make sure it's something the community can use that is performant and scalable. So we're really, really excited about this part, and for people to see how cool GreyNoise is. And, okay, there you go, perfect. One last thing: the repository that Matt mentioned is already public. Matt mentioned the Dockerfile; we have that image on Docker Hub, so you're free to use it. It not only brings all the dependencies he mentioned, like the caching, but also brings busted already. So if you just want to play around and use it as a way to get acquainted with busted and testing, that's a good image to use. As for the GreyNoise API, as Matt mentioned, the API we used for this demo is the free Community API, which you only have to authenticate with to play with it. You can just use curl if you want. More importantly, if you want to see more of the other options the GreyNoise API offers, that's a great resource for you to learn more about it. We also found this blog post that was super helpful, and when Matt and I were doing the slides, we felt we needed to give a shout-out to how valuable that resource was. So that's definitely one blog post for you to go and check out.

Awesome.
I've got some links here at the end: our GreyNoise website; our GitHub repo, which has more in it than just this, since we've got a bunch of integrations and other tools that we publish, with hopefully more to come to engage the community; and our GreyNoise Twitter, if anybody wants to reach out or follow us. And then finally, last but not least, our community Slack. You can sign up for GreyNoise Slack access there; Guillermo and I are both on it. A quick shout-out to our community manager, Supriya: she can help you out on there too, get you engaged, and maybe help answer some questions, even if they're about this Fluent Bit plugin. Anyway, I appreciate everyone's time, especially given that this may have been a little bit clunky; Guillermo and I are very far apart today. It was awesome that we were able to synchronize and get through this. We'll be available in the Q&A section.