So I think everyone who'd like to listen to this talk is here, so we'll start. My name is Yuri, I work for Lazada, and I've been building the log delivery pipeline for Lazada for the last year, and that's where I got a lot of experience with rsyslog. I have no direct relation to the rsyslog project, but from working with it I hit a lot of issues and contributed some parts to rsyslog, so I'd like to share my experience, how to use it properly, and maybe a few ideas around that. So let's start.

This is what people usually think rsyslog is: some kind of black magic which can transfer your syslog, written to /dev/log or to UDP port 514, to some log files or to some remote host. Well, there's a bit more. What does rsyslog say it actually is? It has a lot of inputs, it has a lot of outputs, and it can do some changes in the middle. What's the actual magic behind that? We have inputs, we have queues, we have parsers, we can do some transformations, and we can put it all to outputs. And there are rulesets. As you might remember, there were no rulesets before, but that changed somewhere around 2010, so now you can have multiple rulesets, not just one.

Why is that good? Because you can do things like this. In the default case, when you have just one ruleset, one pipeline, one stream, you have to check every message: what is this message about? Is it from this program, or from this input, or from that file? And only then can you make a decision. When you have multiple rulesets, you can bind an input straight to a ruleset, so you don't need to check. It saves you some time, and it saves you some code, because you don't need to write that complex logic. It's really great.

When you look at this configuration, you might ask: is this rsyslog at all? Because I know rsyslog as those ugly strings in rsyslog.conf. What's this about, how can I use this modern syntax? Well, it's actually been here for a while, as I said, since around 2010. rsyslog has three configuration formats for rsyslog.conf. You can still use the old sysklogd format, where you have a pattern and, when it matches, the message is sent to a file. But that's really only for simple cases, because when you need complex logic, you can't use it anymore. For that, the ugly legacy syntax was introduced, which is fortunately now obsolete, so you don't need to write all those ugly strings anymore. Please don't use it, it will break your life. Use RainerScript, which is now called the advanced syntax. All the samples I'll show later are in RainerScript.

So let's look at another thing: queues. Queues are the most confusing part of rsyslog configuration, because you need to work out: I have this many inputs, every input produces some number of messages, so how long does my queue need to be to survive if I have latency or delays on my output channels? It's really hard, and I won't describe how to do it, because there's documentation on the site. I'll only talk about the things that aren't clear from the documentation. So, on the queue side, you can attach a queue in two places, which is always confusing. You can have a ruleset queue, which is where a message lands when you get it from an input, and you can have an action queue, which is where a message waits inside a ruleset before going to some output.
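To make that concrete, here's a minimal sketch of both attachment points, with an input bound straight to its own ruleset; the port, file path, and queue sizes are invented for illustration:

    module(load="imtcp")
    # bind this input straight to its own ruleset -- no per-message checks needed
    input(type="imtcp" port="10514" ruleset="tcp_logs")

    ruleset(name="tcp_logs"
            queue.type="FixedArray"      # ruleset queue: messages land here as they
            queue.size="100000") {       # arrive from the input
        action(type="omfile" file="/var/log/tcp.log"
               queue.type="LinkedList"   # action queue: messages wait here before
               queue.size="10000")       # this particular output
    }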
Both of them are usable. They behave a bit differently, but they are usable, let's say. And every queue operates in one of four modes. The first one is actually no queue at all: a queue in direct mode, where a message that appears on an input goes straight to the output or to the ruleset processor, so no actual queuing happens. But this mode has one big difference: it works synchronously. So when a message goes through some action and that action changes the message, direct mode is the only way that later actions will see the modified message. In all other modes, a copy of the message is made, so you can change it, but the rest of the outputs will not see that change. If you need to modify a message, you should use direct queue mode.

As for the rest, there are in-memory and on-disk queues. I never used a pure on-disk queue, because it's usually a good idea to use an in-memory queue, it's much faster. If your message volume is really big and you need persistence on top of that, then it's better to use the combined mode, disk-assisted.

If you care about performance, here's a set of ideas for tuning your ruleset queue or action queue. First, if you realize that your message processing is not as fast as you expect, always try increasing the number of worker threads on the ruleset queue first, because that parallelizes ruleset processing: it effectively instantiates the ruleset multiple times and feeds the instances with messages. But you should understand that then you cannot predict the output order of messages; that's the drawback. If your speed is still not enough, you can try increasing the number of action queue workers. But action processing is mostly about string building, that templating thing: we get a message, do some changes, produce a string, and put it to the output. So the benefit there is not that big. If you have heavy string processing, yes, you'll get some profit, but usually just two workers is enough; only in really rare cases do you need more. For something like writing to a file, it's usually not a good idea to increase the worker count at all, because you need to lock the file: when multiple threads try to write to the same file, you have to do something about that, and the locking overhead will eat all your profit. So maybe check instead whether you can tune that output with the omfile settings. For single-action rulesets it's more complicated, and it's better to read the documentation at this link, because there are many more details; you can find it later in the slides.

Now let's go to the modern world. Up to here I was talking about rsyslog internals; now let's see how we can use it in the real, modern world. This is the ELK stack people usually run in production right now, and here I've summarized which steps are done while processing messages. So let's have a look at how we can replace it with rsyslog. Meet your ELK stack: you can see that, yes, we can still read messages from a file or the network using the stock inputs, and we can still parse them using these modules. So it looks like we can do everything here; let's check how exactly. The really unique feature of rsyslog is the mmnormalize module, which uses liblognorm, which is built on a parse tree. It's not grok, it's not regex, and that's why it's amazingly fast.
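Wired together it looks roughly like this; a minimal sketch, not our production config, and the rulebase path and rule text are invented for illustration:

    module(load="mmnormalize")

    # /etc/rsyslog.d/api.rb (hypothetical), a liblognorm rulebase with two rules:
    #   rule=v0:login from %user:word% at %src-ip:ipv4%
    #   rule=v1:login from %user:word% at %src-ip:ipv4% pid %pid:number%

    ruleset(name="api_parse") {
        # match the raw message against the rulebase;
        # parsed fields become message variables under $!
        action(type="mmnormalize" rulebase="/etc/rsyslog.d/api.rb")
    }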
This was actually one of the selling points of rsyslog when we were choosing what to use in our log delivery pipeline, because we need to parse our log messages. We defined a logging standard with a strict format, so we need to be able to parse messages fast, because we log a lot. And I can say now that this is a really great thing: if you need to parse a lot of messages, consider using mmnormalize in rsyslog.

Let me show how the rules work, using the rulebase sketched above. Everything after the first colon is usually just the message: we write the message as it appears and replace a few placeholders. For example, %user:word% means we expect a field called user, where a word is a space-separated run of characters, and %src-ip:ipv4% is expected to be an IPv4 address. You can see there are two formats that match, but the second one is a bit different: we added a PID, which is a number. What happens when a message gets into mmnormalize with these rules? It checks them from first to second, in the order the rules appear, which means you can migrate your log format easily: you don't need to write a complex configuration, you just add another rule which will match. And you can even tell which rule exactly was matched by that first part after the equals sign, the v0 and v1 — you get it in a special field. So you can check: in this case I got a message in this format, in that case I got one in that format. If you're collecting them into Kibana, for example, you can see how many messages came in the old format and how many in the new one, so you can understand how many APIs have migrated to the new format.

And there are two ways to send the result onward as JSON. The first one is by directly modifying the JSON object, relying on rsyslog itself: we do some magic with fields, and after that we just send the full JSON object. Unfortunately, do we have a laser pointer? No? Well, okay, then I need to explain it. Oh, I think I can use some magic by Apple. Yeah, I can. It's not open source, but still good. Okay, so you can see this part: rsyslog has variables, and this is a predefined variable that holds the whole JSON tree, keeping all the fields of the message. When you parse a message, every field from it is saved here. What we're doing below that: the set on the log time means we change the format of the field to RFC 3339. And we can unset fields with the unset statement: I'm renaming a field here, which is not really necessary, and unsetting a field there. Then I just throw the whole JSON we saved into Elasticsearch. And there is another way to do the same: producing a template which keeps all the fields we need, so we prepare the JSON ourselves. In the previous case we operated on variables; in this case we operate on a template, we just list all the fields we want to produce, and then use it to send the data to Elasticsearch.
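Roughly, the two approaches look like this; a hedged sketch with invented field names and server address, where %$!all-json% is the property holding the whole parsed JSON tree:

    module(load="omelasticsearch")

    # way 1: massage the message variables, then ship the whole JSON tree
    template(name="all_json" type="string" string="%$!all-json%")

    ruleset(name="to_es") {
        set $!user_name = $!user;     # rename a parsed field (hypothetical names)
        unset $!user;
        action(type="omelasticsearch" server="es.example.com"
               template="all_json")
    }

    # way 2: build the JSON explicitly, listing only the fields you need
    template(name="picked_fields" type="list") {
        constant(value="{\"@timestamp\":\"")
        property(name="timereported" dateFormat="rfc3339")
        constant(value="\",\"host\":\"")
        property(name="hostname")
        constant(value="\",\"user\":\"")
        property(name="$!user" format="json")
        constant(value="\"}")
    }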
Okay, another topic is reliable delivery. You all know there's a joke about UDP, but you may not get it. And actually there's the same about TCP, just a bit different: you can lose messages whether they're delivered over UDP or TCP; the difference is that with UDP you'll lose them much more frequently. So there's another thing: RELP, which is a protocol like syslog over TCP, but with confirmation from the other side. It's understandably great, because now delivery will always happen. But there are drawbacks, as you can imagine: you need to receive confirmation from the other side, which reduces your bandwidth. As a bonus you get easily configurable TLS, but another drawback is that it's single-threaded. It looks great when you're reading the documentation, but when you start using it for high-volume, high-bandwidth logging, you realize you're stuck on a single CPU core and cannot get more. That's why we decided it's not actually that bad to lose a message once in a while, and switched to plain TCP. But RELP is still great: if you have, say, financial data in your logs, losing even one message may not be okay, so there it's better to use RELP. And if you need to configure TLS, again, it's much easier to do with RELP than using all the internal machinery of rsyslog.

So now we have reliable delivery; how do we check that it actually works well? As you see here: metrics. rsyslog has a built-in statistics module which acts as an input, which means you can do anything with it that you can do with any other input: you can transform the messages, use different templates, or build your own internal counters, so feel free. Here is an example of logging the statistics in JSON format: by default that input produces plain-text messages, but you can ask the module in its configuration to produce JSON. We're using JSON, and I won't explain every counter here, you can read the documentation. And if that's not enough for you, you can create dynamic statistics. For example, here you see a counter, a bunch of counters I'd say, which are increased on every new message, per hostname: when I receive a message, it checks the hostname and increases the counter of messages for exactly that hostname. There's an example of the output when you have a few lab machines, and how it looks in the statistics output. We found it great; we did the same for APIs, so we have statistics on how many messages each API produces, even when we have a bunch of APIs running on the same host.

And there are lookup tables, which are really great. If there are any postmasters here, or anyone who understands how firewalls work, then you understand how lookup tables work. The good point is that you can reload them: you don't need to restart the whole rsyslog, you don't need to stop it and start it, you can just reload a single lookup table when you need to change some processing. Here is the configuration: you can see there's a JSON file which actually holds the table, and reload-on-HUP means the table will be reloaded when rsyslog gets a HUP signal. Then you can do something with it. You can even do an indirect call by combining strings, and it will act like "go to business unit A", "business unit B", or, for an unknown value, "business unit unknown". So it's a kind of switch construct. And you can even reload the table from RainerScript, which is not an obvious feature to use.
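Put together it looks roughly like this; a sketch, with the table name, file path, and business-unit values invented:

    # /etc/rsyslog.d/bu.json (hypothetical), the table itself:
    #   { "version": 1, "nomatch": "unknown", "type": "string",
    #     "table": [ { "index": "host-a", "value": "a" },
    #                { "index": "host-b", "value": "b" } ] }

    lookup_table(name="bu_table" file="/etc/rsyslog.d/bu.json"
                 reloadOnHUP="on")    # re-read the file on SIGHUP, no restart

    ruleset(name="route_by_bu") {
        set $.bu = lookup("bu_table", $hostname);
        # indirect call built by string concatenation -- acts like a switch,
        # jumping to ruleset bu_a, bu_b, or bu_unknown
        call_indirect "bu_" & $.bu;
        # and the reload from RainerScript itself, if I remember the function right:
        # reload_lookup_table("bu_table")
    }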
That got me another idea. I thought: if we can do things by receiving a message, maybe we can do even more. And actually we can. If you use HAProxy, for example, you know you can configure it through its Unix socket interface; there's a special interface for that. You can do something similar with rsyslog. Here you can see that we create an input socket at this path, and when we get a message on it, we act on it: if the message is "reload BU", the table gets reloaded, and if the message is "run CMD", we pass it to the omprog action, which starts a program and feeds it the message on stdin. And that looks great, so we can use it in ways that were never really intended. Once I used it to bring a network interface up: we had a server with semi-broken firmware on a new interface, so I used rsyslog to bring the interface back up whenever it went down. And it worked. It's not the usual way, but you can do that.

So, actually, I've run out of ideas, so you can ask me some questions. And I'm a bit ahead of time, actually.

Yeah, we are not using Logstash now. Our pipeline looks like... well, actually we are not using Kibana either; we're using Graylog, but I'd like to throw it away. From my perspective, right now the ideal pipeline sounds like: we receive a lot of messages with rsyslog, deliver them with rsyslog to some relay hosts, then deliver them to parser hosts, which again run rsyslog and feed the messages into the Elasticsearch cluster directly. You can do the same things with rsyslog: there are modules to append GeoIP data, for example. So at first sight, rsyslog can cover at least 80% of what you need from Logstash. The only real problem is clusterization, because with Logstash you can just say "I have a cluster of Logstash, please use it", while with rsyslog you still need to think yourself about high availability and how to do load balancing. It's possible, but yeah. Still, I see no point in using Logstash for that; usually you can do the same with rsyslog now. I guess there are still cases where Logstash is better, but for us rsyslog is just enough. And we do the actual parsing right on the server where our API runs, and it takes just a few CPU percent, because it's C, it runs pretty fast. You don't need to fire up Java and take care of its memory, so it's amazing. And modern versions of rsyslog are even better, because they ran a Coverity scan and fixed a lot of bugs thanks to that; I'd say it's pretty good now. I cannot compare rsyslog and syslog-ng, because I have no experience with syslog-ng, but I'd say if I dug into it to the same depth I did with rsyslog, I'd find just as much.

Okay, any other questions? We can be a bit relaxed with time, because there's no other session until after lunch. So I have a question, but I'll leave it to the audience first. Any other questions from the audience? Yeah. So the question is whether it's possible to create a loop in rsyslog. Well, you can configure it that way, I'd say. And that's a good point, by the way, because we once had a problem when our pipeline was full of messages: it got stuck on the receiving side, and that means your API cannot write anymore. We are using a Unix socket for that. And everyone hits the same problem; every user of any log delivery pipeline hits it eventually, because when your pipeline is full, when every queue is full, you're stuck. We decided that in that case we can drop messages, both at the API level and in rsyslog itself: any new message coming into rsyslog is simply dropped if the queue is full. You can configure it that way as well. We did it on both ends to protect ourselves.
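On the rsyslog side, that drop-instead-of-block behavior comes down to the enqueue timeout; a minimal sketch, assuming that queue.timeoutEnqueue set to zero discards new messages immediately when the queue is full (names and sizes invented):

    ruleset(name="ingest"
            queue.type="FixedArray"
            queue.size="100000"
            # wait 0 ms on a full queue: new messages are discarded
            # immediately instead of blocking the sender
            queue.timeoutEnqueue="0") {
        action(type="omfwd" target="relay.example.com" protocol="tcp")
    }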
That's rational behavior: at some point you just can't fix the problem, you've only got so much you can do. One piece of housekeeping before I ask my question: we are doing, as we have done every year, a large group photo of everyone at the conference. We strongly encourage you to come join us. That will be in the exhibition area at 12:15.