Hello, welcome to my talk "Log Everything with Logstash and Elasticsearch". To begin with, just raise your hands: who uses logging in their applications? Yes, that's great. Who uses a central log server? That's okay, I hope there will be some more after this talk.

A little bit about me: you can follow me on Twitter, you can get the slides on GitHub (I'll post them afterwards), and of course you can visit my blog. I'm a software developer at Blue Yonder; we are a sponsor of this event. Blue Yonder is the leading software-as-a-service provider for predictive analytics in the European market. We have our headquarters in Karlsruhe, offices in Hamburg and London, and about 120 employees. We use the full Python stack for development: Flask for the web front-end, SQLAlchemy for database access, and the pandas/NumPy/scikit-learn stack for our machine learning tools. Most of our core algorithms are written in C++ and executed on a custom parallel execution engine. And of course, we are hiring.

So, log everything. When your application grows beyond one machine, you need a central place to log, monitor, and analyze what's going on. Logstash and Elasticsearch store your logs in a structured way, and Kibana is a great web front-end to search and aggregate your logs.

Just a little disclaimer: I'll talk a lot about Logstash, but I think the same applies to Graylog. Graylog is also a great tool to collect your logs, and I think they have similar strengths and some differences.

So what do you need if you want to have central logging for your applications? Your log producers: that can be your front-end, which might even be a JavaScript single-page application that uses a custom API to ship the logs to the back-end. It might be an API or back-end service, an authentication service, or even a database system or the operating system.

Then you have to transport your logs to a central station. I think everybody knows syslog. I'll talk a little bit about GELF.
That's the Graylog Extended Log Format. But you can also ship your logs via Redis queues or via RabbitMQ. You could even write to log files and parse them back with regular expressions, but I think you get more benefit if you log your messages in a structured way.

Then you have to route and filter your logs. You can do this with Logstash or with the Graylog2 server. And of course you need some storage where you can keep your logs. I think Elasticsearch is one of the great open-source tools here: it not only allows you to search your logs, but to do all kinds of analysis based on them.

To do that analysis you need a front-end to access your logs. I'll talk a little bit about Kibana; it's a JavaScript-only framework. The Graylog2 server comes with a bundled web interface, but you could also use the plain elasticsearch-head, which is a JavaScript application, or even use Python with the pyes library to build custom queries and reports against your log data.

So what I'm going to talk about today is this logging chain: you transport your logs with GELF to a Logstash server, the Logstash server pipes them into an Elasticsearch search engine, and you access the logs with the Kibana web framework. It's the pattern: transport, route, store, and analyze. If you need to grow further, you can scale each part of the system: you can add more nodes to Elasticsearch, you can use multiple Logstash instances, or you can even add a message broker in front of your Logstash, so that the broker collects the logs and then ships them to Logstash to handle the load better.

What is GELF? GELF is the Graylog Extended Log Format. It's basically JSON over UDP.
That means it's non-blocking, but it avoids some shortcomings that you have with plain syslog, which is also text over UDP. It's not limited to one kilobyte: I know syslog-ng can handle more, but plain syslog can only handle one kilobyte. Often one kilobyte is not enough, especially when you do application monitoring, because of backtraces and because you simply have more data. GELF also adds structure to your logs, so you have key-value relations in JSON. It has compression built in, and the possibility of chunking: one log message can be chunked into, I think, about 128 messages. Also, syslog by default has no support for additional fields and metadata; in GELF you can add arbitrary fields and arbitrary metadata to your log messages. So I think GELF is a great choice for logging from applications. And of course there is the graypy Python handler, and clients for all kinds of languages too.

One thing you have to consider when you want to log with GELF: because it's sent over UDP, it's not reliable. If your network is flaky or the server is under high load, messages can get lost. So if you really want to be sure that your log messages arrive at the server, you have to consider different transports, like the Redis or RabbitMQ options I mentioned earlier.

So what does a log message in GELF look like? You have a mandatory version field, you have the host field saying where the log message comes from, you have a short message, you have a timestamp, you have the log level, and then you can have an arbitrary number of custom fields, like the facility or some request ID; I'll talk about this later.

How do you use it with Python? It's pretty straightforward. It works with the standard Python logging: you just add a GELF handler with host and port, and you can log as normal. The handler will push the messages to your GELF-aware service, in our case to Logstash.

So what is Logstash? Logstash is a tool for receiving, processing, and outputting logs.
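Before moving on to Logstash: the GELF fields listed above can be sketched in plain Python. This is a minimal sketch using only the standard library, not the graypy handler itself; field names follow the GELF 1.1 payload layout, and the UDP chunking that real clients add is omitted.

```python
import json
import time
import zlib


def make_gelf_message(host, short_message, level=6, **extra_fields):
    """Build a GELF 1.1 payload: mandatory fields plus arbitrary
    custom fields, which must carry an underscore prefix."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity: 6 = informational
    }
    for key, value in extra_fields.items():
        message["_" + key] = value  # custom fields get a "_" prefix
    return message


def encode_gelf(message):
    """GELF messages are typically zlib-compressed JSON over UDP."""
    return zlib.compress(json.dumps(message).encode("utf-8"))


msg = make_gelf_message("web-1", "user logged in", request_id="abc123")
payload = encode_gelf(msg)  # ready to send via a UDP socket to Logstash
```

In practice you would not build this by hand: a graypy handler attached to the standard `logging` module produces these payloads for you.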
It's written in JRuby and runs in the Java virtual machine. It's based on the pipes-and-filters pattern: you have incoming pipes, you transform the messages, you filter the messages, you may even add fields or delete fields, and at the end of the pipe you output them, in our case to Elasticsearch. Jordan Sissel, the creator of Logstash, is now employed by Elasticsearch, and the Kibana web analysis toolkit is also under the roof of the Elasticsearch company.

So how do you run Logstash? You just download it, unpack it, and write some simple configuration. As I said earlier, you have to define inputs, filters, and outputs; the filters are optional. Here I just drop all messages with log level debug. For our system we define a GELF input, but Logstash also provides input types like syslog or Redis and others, as I said earlier. The output goes to Elasticsearch, pretty straightforward. You can also output to a file, but of course you only get the full benefit if you put your structured logs into Elasticsearch.

Okay, what's Kibana? Kibana is a single-page JavaScript application. There is nothing to install: you just unzip it into your nginx or Apache root folder. It's a tool to search and analyze time-based data in Elasticsearch. It has a rich set of visualizations and provides access to the full, powerful search syntax of Elasticsearch, and you can create and share dashboards for yourself or within the company. A big advantage of using Kibana is that it's possible for non-programmers or less technical people to query and analyze logs, and a really important point: you don't need access to your servers to analyze your logs. But you have to consider that Kibana has no authentication built in. It talks directly to an Elasticsearch service, and whoever can read from Elasticsearch can also write to Elasticsearch.
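Coming back to the Logstash configuration described a moment ago: a minimal sketch might look like this. This assumes the classic Logstash config syntax; the port and host values are placeholders, and the debug-level check assumes the GELF input maps the level into a field of that name.

```
input {
  gelf {
    port => 12201          # default GELF UDP port
  }
}

filter {
  if [level] == "DEBUG" {
    drop { }               # discard debug messages
  }
}

output {
  elasticsearch {
    host => "localhost"
  }
}
```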
So if you need extra security, you have to put a proxy in between and do some authentication there.

The next slides show some possibilities for visualizing search queries from Elasticsearch with Kibana. The bettermap panel uses geographic coordinates to create clusters on a map. You can zoom in; you can do this based on country codes in your log messages, and if you want to drill in, you can click on the clusters and get a closer view.

You can build panels with histograms. Histograms display time charts; they show counts, mean, minimum, maximum, and the total of numeric fields. You can build sparklines. Sparklines are a great tool to get an overview of what's going on in your system. They are based on tiny time charts: you don't get the exact numbers, but by looking at a sparkline you can assess really quickly what's going on and whether there is something wrong with your system.

Kibana also provides visualizations for facet calculations from Elasticsearch. Facet calculation means that, based on a set of filters, you can see how one term is distributed. Here you can see, I think, logs from a web server and what kinds of files it delivered: mostly HTML, some PHP, and some images. So it's also a nice way to get a quick overview of your system.

After talking a little bit about the technology, I'd like to present some logging patterns that are useful when you want to add structured logging to your application. They are all based on adding context to your log messages. The easiest way to add context to a log message is to use the extra field of your log call. It takes a dict where you can add arbitrary key-value pairs, and the graypy GELF handler just pushes them on to Logstash. A little more advanced is using a filter. With a filter you can add context to all of your log messages afterwards.
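Both techniques can be sketched with the standard logging module alone. This is a minimal sketch: the list-collecting handler stands in for a GELF handler, just to keep the example self-contained, and the filter attaches a username the way described next.

```python
import logging

captured = []


class ListHandler(logging.Handler):
    """Toy handler that collects records so we can inspect the context."""
    def emit(self, record):
        captured.append(record)


class UserFilter(logging.Filter):
    """Attach the logged-in username to every record passing through."""
    def __init__(self, username):
        super().__init__()
        self.username = username

    def filter(self, record):
        record.username = self.username
        return True  # never drop the record, only enrich it


logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())
logger.addFilter(UserFilter("alice"))

# Per-call context via the `extra` dict: its key/value pairs become
# attributes on the LogRecord, which a GELF handler would then ship
# on to Logstash as custom fields.
logger.info("order placed", extra={"order_id": 42})
```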
So if we have a web application with a user logged in, we can add a filter that attaches the logged-in username to all the following log messages.

The request ID lets you tie all log messages from one request together. If you generate a request ID at the beginning of a web request and add it as context to all the following log messages, it's easy to identify messages from the same request, and it makes debugging much easier. How does this work? You get a request in your application, you set the request ID, and all the log messages have the request ID applied. How could you implement this? Here is an example for Flask. Flask provides a before_request handler that is always called when a new request starts. We generate a UUID there and add a filter to the logging, so that every log message has this request ID applied.

A correlation ID lets you correlate log messages from different applications and systems. If you have a front-end server and some back-end API servers, you want to correlate your log messages across all these servers. So what do you do? At the beginning of the request, the front-end server generates a correlation ID, and when you make requests to the back-end servers, you add the correlation ID to all of them. The back-end servers just read the X-Correlation-ID header field and add this correlation ID to their log messages. Same ID everywhere: all the log messages carry the same correlation ID, and you can follow a web request across different applications. The implementation for Flask is pretty straightforward: you read the header field if it's set, and again you add a filter.

I started the talk with the claim "log everything". That's not always true: if you have really big systems, maybe you don't want to log every debug message. And there's a really cool handler for this.
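The request-ID and correlation-ID patterns above can be sketched together with the standard library only. This is a sketch under assumptions: the Flask wiring is reduced to comments (in a real app the filter would be installed in a before_request handler), and `incoming_headers` is a plain dict standing in for the request headers.

```python
import logging
import uuid

captured = []


class ListHandler(logging.Handler):
    """Toy handler collecting records so the added IDs can be inspected."""
    def emit(self, record):
        captured.append(record)


class RequestContextFilter(logging.Filter):
    """Stamp every record with a per-request ID and a correlation ID."""
    def __init__(self, correlation_id=None):
        super().__init__()
        # Fresh ID for this request; reuse the caller's correlation ID
        # (as sent in an X-Correlation-ID header) or start a new one.
        self.request_id = uuid.uuid4().hex
        self.correlation_id = correlation_id or uuid.uuid4().hex

    def filter(self, record):
        record.request_id = self.request_id
        record.correlation_id = self.correlation_id
        return True


# In Flask both steps would live in a before_request handler; here the
# incoming headers of a back-end request are faked with a plain dict.
incoming_headers = {"X-Correlation-ID": "frontend-7f3a"}

logger = logging.getLogger("backend")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())
logger.addFilter(RequestContextFilter(incoming_headers.get("X-Correlation-ID")))

logger.info("calling billing service")
```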
It's not yet available in the standard Python logging, but it's available in Logbook from Armin Ronacher. What is the fingers-crossed handler? The handler wraps another handler and buffers all the log messages until an action level is triggered. That means you can buffer all the debug messages, and if an error message comes afterwards, it outputs all the buffered debug messages; if there is no error message, they are dropped. So in the error case you have all your debug messages, and if everything works okay, they are just thrown away. The implementation is pretty clear, I think.

I really like Logbook. I think it's a worthy alternative to the standard logging, but you always have to weigh the benefits of using an extra library against the benefits of using what is in the Python standard library.

So, that's my talk. I'm finished with one minute left. Thank you very much.
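As an aside to the fingers-crossed pattern above: a similar buffer-until-triggered effect can be approximated with the standard library's `logging.handlers.MemoryHandler`, which flushes its buffer to a target handler as soon as a record reaches `flushLevel`. This is only an approximation of Logbook's handler: MemoryHandler also flushes when the buffer fills up, and `flushOnClose=False` (Python 3.6+) is needed to make it discard an untriggered buffer.

```python
import logging
import logging.handlers

records = []


class ListHandler(logging.Handler):
    """Toy target handler that collects flushed messages in a list."""
    def emit(self, record):
        records.append(record.getMessage())


target = ListHandler()
buffered = logging.handlers.MemoryHandler(
    capacity=1000,                # flush anyway once 1000 records pile up
    flushLevel=logging.ERROR,     # ...or as soon as an ERROR arrives
    target=target,
    flushOnClose=False,           # discard the buffer if never triggered
)

logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(buffered)

logger.debug("step 1")            # buffered, not yet visible
logger.debug("step 2")            # buffered
logger.error("boom")              # triggers the flush: all three arrive
```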