So the next talk is "Scaling your logging infrastructure using syslog-ng" by Peter Czanik.

Hi, I'm Peter Czanik from Hungary, community manager at Balabit, the upstream developer of syslog-ng. I do packaging, support and advocacy for syslog-ng.

So first of all, what is logging? It's the recording of events on a computer. Just think about an SSH login message, the kind you can see in /var/log/messages. And what is syslog-ng? It's an enhanced logging daemon with a strong focus on high-performance central log collection.

You might ask, why central logging? First of all, it's ease of use: there is one place to check for log messages instead of many. Then there's availability: even if the sender machine is down, you can check its log messages and figure out why it is down. And there's security: logs are available unmodified even if the sender machine is compromised.

There are four main roles of syslog-ng. First of all, it's a collector of messages. It can also process them and filter them, and at the end either store them locally or forward them somewhere.

The first role is log collection. Actually, it's data collection, as you can collect system messages, application logs, or any kind of text messages, and this can provide quite useful contextual data for either side. syslog-ng is a multi-platform application and can collect messages from a wide variety of platform-specific log sources, like /dev/log on older Linux machines, the journal, Sun STREAMS, and so on. As a central log collector, it speaks all of the different syslog protocols, the legacy one and the new one, over UDP, TCP and encrypted connections. It can also collect logs from applications in many ways: through files, sockets, pipes, and even from application output.

The next, and in my view the most important, role of syslog-ng is message processing.
It can normalize, classify, and structure log messages with built-in parsers: for CSV or JSON messages, for any kind of message with PatternDB, or with the key-value parser, which is mostly used for firewall messages, and so on. It can also rewrite log messages. You don't have to think about falsifying messages here; for example, for compliance reasons you might need to anonymize log messages. It can also reformat messages using templates, for example if the destination needs a specific format, like an ISO date or JSON. And last but not least, data enrichment is getting more and more important in syslog-ng. You can use GeoIP to add geographical location to your log messages, and syslog-ng can also add fields to your log messages based on the message content.

The next role is data filtering. It has two main uses. The first is discarding surplus log messages; for example, you don't want to store all of your debug-level messages. The other one is message routing, that is, making sure that only login events go to your SIEM system. There are many possibilities for doing this. Filters can be based on message content and parameters, using comparisons, wildcards, regular expressions, and so on. And the best of all is that any of these can be combined using Boolean operators, so really complex filters can be built inside syslog-ng.

Finally, we need to store log messages somewhere. Traditionally, it's done in flat files on the local file system under the /var/log directory, or by sending them to a central syslog server using one of the syslog protocols. Recently, many big data destinations were added to syslog-ng to enable scaling of the logging architecture. So you can now store messages in Hadoop, in NoSQL databases like MongoDB or Elasticsearch, or push your messages to messaging systems like Kafka as well.

Log messages on an average Linux system usually come in a date, hostname, and text format.
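As a rough illustration of that layout, here is a minimal Python sketch that splits one traditional /var/log/messages line into its date, hostname, program, PID and text parts. The regular expression and the field names are assumptions made for this example; they are not how syslog-ng's own parsers or macros are defined.

```python
import re

# Illustrative pattern for a traditional /var/log/messages line:
# "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <text>"
# (the "[<pid>]" part is optional, as not every program logs its PID)
LEGACY_LINE = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<text>.*)$"
)


def split_legacy_line(line):
    """Return the date, host, program, pid and text parts, or None if the line does not match."""
    match = LEGACY_LINE.match(line)
    return match.groupdict() if match else None


sample = "Mar 10 12:34:56 web01 sshd[1234]: Accepted password for root from 192.168.1.10 port 42424 ssh2"
print(split_legacy_line(sample))
```

Note that the `text` part is still a free-form English sentence; the sketch only separates the header fields from it.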
Here you can see another SSH login message, and you can see that the text part is practically an English sentence with some variable parts in it. It's quite easy for a human to read, but it's very difficult to process if you want to create reports from it, for example.

There is a solution for this problem, called structured logging. In this case, events are represented as name-value pairs instead of free-form text messages. Coming back to my favorite SSH example, you can see that it can be described using name-value pairs: application sshd, user root, source IP, and so on. The good news is that syslog-ng has had name-value pairs inside right from the beginning of the project. Date, facility, priority, program name, and so on were all represented as name-value pairs from the start, so that messages could be reformatted and filtered. And there is a growing number of parsers inside syslog-ng which can turn unstructured data, and some kinds of structured data like CSV or JSON, into name-value pairs, which can later be used for filtering as well.

I already mentioned the different big data destinations as one way of scaling your logging architecture. Another one is changing how you build your logging architecture. Traditionally, there was a single central server, all of the clients were sending messages to it, and all of the processing was done on this single server. If you have a larger network, or multiple departments to collect from, you can introduce an intermediate machine, the relay. You don't send all of your messages directly to the central server; instead, you send them to the relay, process them there, and send the results to the central server. This way you can distribute some of the processing to the relays instead of doing everything on the central server or servers.
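The SSH example and the relay layout fit together: a relay can extract the name-value pairs from the login sentence and forward a structured record, so the central server only has to store it. The following Python sketch illustrates that division of work under stated assumptions: the regular expression, the field names (`app`, `user`, `source_ip`, `port`, `method`) and the `relay` and `central_store` functions are hypothetical. In syslog-ng itself, this extraction would be configured with a parser such as PatternDB rather than written as code.

```python
import json
import re

# Hypothetical pattern for one kind of SSH login sentence; a real setup
# would need a pattern for every message variant it cares about.
SSH_LOGIN = re.compile(
    r"^Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<source_ip>\S+) port (?P<port>\d+)"
)


def relay(program, text):
    """Runs on the relay: turn free-form text into name-value pairs, then serialize them."""
    pairs = {"app": program, "message": text}
    match = SSH_LOGIN.match(text) if program == "sshd" else None
    if match:
        pairs.update(match.groupdict())
    # JSON keeps the name-value structure intact on its way to the central server.
    return json.dumps(pairs, sort_keys=True)


def central_store(serialized, store):
    """Runs on the central server: no parsing needed, just decode and store."""
    store.append(json.loads(serialized))


store = []
central_store(relay("sshd", "Accepted password for root from 192.168.1.10 port 42424 ssh2"), store)
print(store[0]["user"], store[0]["source_ip"])  # root 192.168.1.10
```

Messages that do not match the pattern still pass through with their original text, so nothing is lost when a relay does not recognize a message.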
Another way of scaling your logging architecture is making sure that you only send the right logs to the right places: you don't send everything everywhere, only the small fraction of messages that is really necessary. Log routing is based on filtering, and if you combine it with message parsing, which I explained previously, you can even create a poor man's SIEM system, like sending an email on root logins. This way you can optimize your SIEM or log analyzer tools, which are usually licensed based on message volume, as only the messages that are really necessary are forwarded to them, for example, authentication messages to your SIEM system. You can also do throttling, evening out the peaks in message rate.

A few words about our latest release. What's new? We added disk-based buffering. Correlation is made easier. You can write your own parsers in Rust. Elasticsearch 2 is supported, and, I forgot to write it here, Elasticsearch 5 is also supported. And there were many performance improvements as well.

As a summary, I would like to say what benefits syslog-ng has in large environments. First of all, high-performance and reliable log collection. It can simplify your logging architecture by using a single application both for syslog and for application data. Data is easier to use, as it can be parsed by syslog-ng and presented in a ready-to-use form. And thanks to log routing, it can lower the load on destinations.

If you would like to know more about syslog-ng, our central source of information is syslog-ng.org. Our source code is available on GitHub, where we also have an issue tracking system. And if you have questions on how to solve a problem, we have a quite good mailing list, and also IRC and a Gitter channel for real-time or near-real-time questions.

Do you have any questions? Thanks a lot.
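The log routing described in the talk (filters on message fields, combined with Boolean operators, each feeding a destination) together with throttling that evens out peaks can be sketched in Python. This is only an illustration under assumptions, not syslog-ng's filter language: the message fields, the filter functions, the destination names (`siem`, `alert_mail`, `archive`) and the per-tick rates are all hypothetical.

```python
from collections import deque


def is_login(msg):
    return msg.get("app") == "sshd" and msg.get("text", "").startswith("Accepted")


def is_root_login(msg):
    # Filters compose with ordinary Boolean operators.
    return is_login(msg) and msg.get("user") == "root"


def is_not_debug(msg):
    return msg.get("severity") != "debug"


class ThrottledDestination:
    """Queues messages and delivers at most `rate` of them per tick, evening out peaks."""

    def __init__(self, rate):
        self.rate = rate
        self.pending = deque()
        self.delivered = []

    def send(self, msg):
        self.pending.append(msg)

    def tick(self):
        for _ in range(min(self.rate, len(self.pending))):
            self.delivered.append(self.pending.popleft())


# Each route pairs a filter with a destination; a message may match several routes.
destinations = {
    "siem": ThrottledDestination(rate=1),
    "alert_mail": ThrottledDestination(rate=10),
    "archive": ThrottledDestination(rate=100),
}
routes = [
    (is_login, "siem"),            # only login events go to the SIEM
    (is_root_login, "alert_mail"), # "poor man's SIEM": alert on root logins
    (is_not_debug, "archive"),     # discard surplus debug-level messages
]


def route(msg):
    for matches, name in routes:
        if matches(msg):
            destinations[name].send(msg)


messages = [
    {"app": "sshd", "text": "Accepted password for root", "user": "root", "severity": "info"},
    {"app": "sshd", "text": "Accepted publickey for alice", "user": "alice", "severity": "info"},
    {"app": "kernel", "text": "eth0 link up", "severity": "debug"},
]
for m in messages:
    route(m)
destinations["siem"].tick()
print(len(destinations["siem"].delivered), len(destinations["siem"].pending))  # 1 1
```

In this sketch, throttling queues messages instead of dropping them, so a burst is delivered over several ticks rather than lost.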
Anyway, if you have any questions, raise your hand. I will run over, and then run back to the speaker. Okay, just coming.

Hi, thanks for the talk. You said you have disk-based storage now, starting from 3.8. Is that right? So it means that you don't lose messages when you use relays, even if the network is down? What is the storage for?

Yes. With disk-based buffering, even if the network is down, messages are collected on disk and forwarded once the connection is back.