Today we're going to talk about open source log analytics, or crunching logs with ELK. For those that don't know, ELK is Elasticsearch, Logstash and Kibana, and we'll go into what each of those are as we go through the talk. I'm the support engineer for Australia and New Zealand. I've been using ELK for around about two years. Previously I was a Linux engineer handling the ELK stack and general Linux infrastructure for an email service provider in Australia.

So, Elasticsearch — you may not have heard of the company. We do have some local customers like Xero and Seek, and if you've used GitHub, the code search and the repo search are all powered by Elasticsearch. The company was formed in 2012. We're big in the EU and US, and we're still making our move into the Australian market.

So for this talk, what is a log? Really, it's just a timestamp with a string. And when you think about it in those basic terms, you've got server logs, you've got Twitter streams, you've even got metrics. We're increasingly told to log everything, which is all good and well, but formatting really sucks. You get format frustration dealing with different timestamps, dealing with different logs, dealing with things that don't have timestamps at all. It's really very, very annoying.

So why would we ever want to collect and centralize the logs? Because giving uneducated people access to your systems can be dangerous. Because you don't want to have 100 different shell scripts to deal with 100 different log formats. Because you want to be able to do things like watch an email hit your gateway server, move through, and then watch the user actually pick it up. Because you don't want to be your boss's charting library, your boss's grep. Because really you want to democratize your data, democratize your logs. And apparently because Docker and Mesos logging really sucks, so this is a really, really good way to deal with it. And how do we do that? Well, with ELK, obviously.

So, a quick rundown of Elasticsearch. It scales to hundreds of nodes and terabytes of data, and it's pretty simple. Zero config means the defaults we ship are sane up to a reasonable size, maybe about 10 nodes or so; after that you want to start looking at changing things.

A few basic terms. An index is essentially a database for Elasticsearch. It's broken into shards, which are spread across the different nodes in your cluster, and you can replicate indexes as well, which provides scalability and some redundancy. When you've got an Elasticsearch cluster, you have a single master at any point in time. It handles your cluster state — nodes joining and leaving, index creation, that sort of thing.

When you do want to create a cluster, you can have multicast or unicast discovery, and you do need to configure this. By default it's multicast, so if we spun up a bunch of machines in this room, they'd all form a single cluster. The best thing to do in production, or even just playing around, is to set your cluster name so that you create your own specific cluster. If you want to use unicast, you just list the IPs. And you definitely want to keep your master-eligible node count odd — it helps avoid split brain.
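For reference, the discovery settings just mentioned live in elasticsearch.yml. A minimal sketch, assuming three hypothetical host names and the 1.x-era zen discovery settings this talk covers:

    # elasticsearch.yml -- minimal sketch, host names are hypothetical
    cluster.name: my-logging-cluster                # don't accidentally join strangers' clusters
    discovery.zen.ping.multicast.enabled: false     # switch from multicast to unicast discovery
    discovery.zen.ping.unicast.hosts: ["es01.example.com", "es02.example.com", "es03.example.com"]
    discovery.zen.minimum_master_nodes: 2           # quorum of 3 master-eligible nodes, guards against split brain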
Now, this is an interesting question: how big does a node need to be? The answer is pretty much always "it depends", and it depends on all of this stuff here. The best way to actually figure out how big your node needs to be for your dataset is just to get a node — it can be a VM or a physical machine, something indicative of what you're looking at using — create a single index with a single shard and no replicas, and just throw data at it. Once it falls over, you've got a bit of a limit there, and you know what you're going to work with. And because this is Java based, you want to keep your heap under 32 gigs — 31 is good — because once you hit 32 gigs your Java object pointers are no longer compressed and you start losing efficiency.

There's a pretty big ecosystem around Elasticsearch. There's a whole bunch of plugins for languages and for monitoring. There's one for attachments as well, so you can index attachments on emails, that sort of thing. There's a whole bunch of clients — I just realized the slide is wrong for .NET; we actually have a client called NEST for .NET. We've got Hadoop integration. And there's also the ELK stack itself, which you can kind of think of as part of the ecosystem.

Getting started is super quick. If you want to do something in development, if you want to have a play around, you can do this on your laptop — it's really that simple. And we can check if it's alive: if you get a 200 response, Elasticsearch is ready to receive and index data and to search over that data. And it's near real time.

Some cool tools. Elasticsearch is full of APIs, and everything is returned as JSON, but raw JSON is kind of sucky to read. You can put ?pretty on the end of a request and it'll add line feeds and format things, which makes it a bit easier. This is an example: we're asking for the cluster state, and we're getting the cluster state back. If you look at the top here, you've got the master node, and that's an ID that Elasticsearch generates. You can tell the bottom one is not very easy to read; with ?pretty, it's super easy to read. Then the _cat API has a whole bunch of endpoints. It's really handy for doing quick CLI checks, or if you want to build things like Nagios plugins around it. Very, very handy. Around monitoring, there are some community plugins, and we also offer the Marvel plugin, which is free for dev use and also comes included with a support contract.

When you want to scale out from a single Elasticsearch node, it's pretty simple — you just follow these steps. The cool thing about it is that once a node joins a cluster, the shards you've got will automatically rebalance across the new nodes. You can query any node while things are being rebalanced as well, and as long as you've got replicas set, you can survive the loss of a node. The bottom tip: all writes in Elasticsearch are done sequentially, even an update to a document, so if you use the noop I/O scheduler you'll get better performance, especially with SSDs.
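As a quick sketch of those checks from the command line — assuming a local node on the default port of 9200:

    # is it alive? a 200 response means it's ready to index and search
    curl -i 'localhost:9200/'
    # cluster state, formatted for humans with ?pretty
    curl 'localhost:9200/_cluster/state?pretty'
    # _cat endpoints for quick CLI checks (?v adds column headers)
    curl 'localhost:9200/_cat/health?v'
    curl 'localhost:9200/_cat/nodes?v'
    # handy for watching shards rebalance as a new node joins
    curl 'localhost:9200/_cat/shards?v'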
So, a super quick summary of Logstash. You can kind of think of it like a Swiss Army knife or a multi-tool for — well, primarily logs, but you can pass any sort of data into it. This is the general architecture: you have a whole bunch of different inputs, you pass events through filters, and then you spit them back out. Here's a couple of inputs — monitoring ones, ones for different data stores. You can pull stuff out of Redis, you can pull things out of an email account, for example, or you can listen on a TCP socket.

Then there are the different filters, and this is where you manipulate your data and really add value to it. Some interesting ones: there's geoip, where you take an IP and, using a database built into Logstash, get a latitude and longitude out of it that you can use later on. The date filter is really cool because you can standardize on a date format, or change it if you really want. And if anyone here is familiar with Logstash, there's grok; if you're not familiar with Logstash, you will get to know grok pretty well. It's very powerful pattern matching. Then once you've filtered, you spit it back out. Again we've got monitoring outputs, we can send it out over a different protocol, and we can send it to Elasticsearch. There are about 150 inputs, filters and outputs built into Logstash, there's a contrib package from the community with another 100 or so, and it's very simple to build your own and include those as well.

Again, it's really quick to get up and running. You do need a config, and you do need Java 7 or Java 8, same as Elasticsearch. That first config is kind of boring, so we'll have a look at this one. Here we're parsing: we're telling Logstash to listen on standard in, and that whatever gets sent in consists of these three parts of a message. And when you pass in that string, you can see it's actually broken the pieces out. It's really simple, but it gives you an idea.

This is a Postfix example with a bit more of a complex grok filter. We're matching, we're also standardizing the timestamp with a date filter, and then again we're spitting it back out. And this is what you get — this is where you can start to see you're getting more value out of your logs. You're getting things like your program and your PID, and you're getting the original message at the top as well.

This is CLF — combined log format, Apache or Nginx, as most people will probably know it. This is really, really cool because you can see here we've got the GET verb broken out, we've got the response broken out, we've got the bytes broken out. And once you get it to this point and put it into Elasticsearch, you can start saying: give me all the non-200 responses, or a date range, or give me all the 200 responses and tell me how many bytes they used. And then you can graph all of this. Rather than having to do a grep and then count and add all those up, you can actually do it graphically. One thing we haven't done here is break out the client IP and geolocate it. For the agent down the bottom, we actually have a useragent pattern that will break that all down, so you can do further queries — between this browser and that browser, that sort of thing. And then to get it to Elasticsearch, it's just as simple as putting the elasticsearch output in there. You can see the match on message against COMBINEDAPACHELOG — that's a pre-built pattern, and those patterns are actually included in Logstash. There's a whole bunch of them, and again you can build your own and include those.
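Pulling those pieces together, a minimal sketch of a config along those lines — the log path and Elasticsearch host here are hypothetical, and option names shift a little between Logstash versions (this follows the 1.x-era syntax):

    input {
      file { path => "/var/log/apache2/access.log" }               # hypothetical path
    }
    filter {
      grok  { match => [ "message", "%{COMBINEDAPACHELOG}" ] }     # bundled pattern
      date  { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] } # standardize the event time
      geoip { source => "clientip" }                               # lat/long from the built-in database
      useragent { source => "agent" }                              # break the agent string into fields
    }
    output {
      elasticsearch { host => "localhost" }                        # later versions call this "hosts"
    }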
So you want to deploy. This is essentially what a general pipeline will look like. Your shipper is going to be something like syslog or a Logstash instance — it can be anything that spits out logs over TCP or UDP, for example. And that's fine, it'll work, there's nothing wrong with it. But if you want to do downstream maintenance — if you want to change your Logstash config, if you want to upgrade Elasticsearch, that sort of thing — you'll want to chuck in a broker. The most common brokers we see are things like Redis, Kafka, RabbitMQ or ZeroMQ. And then you scale that out, and you scale that out, and you scale that out. You can get to thousands of shippers — we've got customers with tens of thousands. We have Elasticsearch clusters with hundreds of nodes, with petabyte sorts of counts and trillions of documents, a document being essentially a line or an event.

We haven't touched on Kibana. The current general availability version is Kibana 3. It's just JavaScript and CSS, so you can put it under whatever web server you want; there are some sample configs on GitHub under the Kibana repo that you can use for Apache and Nginx. Version 4 is on the way. It's a complete rewrite, currently in beta, and if you're using Kibana 3 and you haven't tried it yet, do check it out — it's really, really cool. Kibana 4 will only work with Elasticsearch 1.4, though, so be aware of that.

So we've taken all of our Apache logs and our Nginx logs and whatever else you want, and this is how you can portray it graphically. This is an Apache log. At the top here we have separate queries broken out into the different file types, and this looks like a 24-hour time range, so you can see the usage cycle and tell where your peak hours are. You've got a breakdown of your requests as well, and down the bottom there's a table listing all the events that relate to this time range. You can update the range dynamically and it'll update all the graphs dynamically as well. Similar sort of thing here — a couple of different graphs, with a terms panel over on the side. This is just a dark theme where the previous one was a light theme. This one's pretty cool because it shows the Twitter input for Logstash: they've pulled in tweets and — it's kind of a bad picture — broken them down into languages, so you can see the different languages in the line graph over there, and they've geolocated them onto the map as well. Down the bottom is a little trend bar. In Kibana terms, this whole thing is a dashboard, and the graphs, the world map and the trends are panels. You basically build up your panels, and there's a whole range of different panels you can use.

Some useful helpers. We've got Curator, which manages your indexes in Elasticsearch. It's just a Python app, and it's really handy: you can use relative timeframes and relative sizes, so delete indexes older than X, or keep only enough indexes that I'm using one terabyte, or 500 gig, or whatever you want. There are Puppet and Chef modules. The logstash-forwarder is written in Go, so you can compile it for wherever you want, and it's low overhead — if you don't want to install Java everywhere, or you've got a small system without many resources, it's really good. The Grok Debugger is a community app that's really cool for pattern matching: once you start playing around with grok, you can just plug a log line into it and you'll get a pattern out of it, and you can play around and build your own patterns. And all of our code is up on GitHub, for everything.
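As an illustration of the Curator side — a sketch only, since the flag layout varies between Curator versions (this follows the 2.x/3.x-era CLI; check curator --help on your version):

    # delete daily logstash-* indexes older than 90 days
    curator --host localhost delete indices --prefix logstash- \
        --older-than 90 --time-unit days --timestring '%Y.%m.%d'

There's also a disk-space based mode for the "keep me under a terabyte" case.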
All of our docs are up on our site. We have a really strong community: we've got Google Groups for Elasticsearch and Logstash, and we've got a whole bunch of IRC channels as well. I hang out there a fair bit, and because I'm currently the only one in this time zone, you'll see me talking a lot, and I'm happy to answer questions. We're always hiring as well, so if you're interested, come and have a chat to me — I'm happy to answer technical questions, and I've got a bunch of stickers and whatnot as well. Our community manager, Leslie Hawthorn, is also giving a talk tomorrow — she's up the back there — so please do come and see us and ask questions. More than happy to have a chat to anyone. Thank you.

Okay, we've got enough time for some questions. There's a nominal five-minute break between this and the next talk, so make a run for it. This one over there on the right.

Does your system handle large documents at all? Is it possible to store binary blobs — for example, crash dump files associated with log entries — or is that just not appropriate?

Yeah, you can. It depends how big you're talking, though. But yeah, it depends.

This one down the front. We've got about 300 gigs of logs a day. To process something like that — I guess we wouldn't want to process all of them, but some of them — with a pipeline of eight-core blades with 24 gigs of RAM, sort of thing, how many gigs of logs could we process per second?

We have customers doing tens of thousands of events a second, so it's not a problem. Elasticsearch is built to scale horizontally, so you only really want to go to a certain height vertically for node size — you just keep throwing more nodes at it. Same with Logstash: it's just a single process, but if you use a broker like Redis or Kafka, you can have multiple Logstash instances reading off it and pulling the logs in. You'd have to test that, but I'm happy to chat to you about it.

Have you dealt with issues of data corruption between your nodes, and identifying and correcting that? Does it automatically fix itself when it detects it?

That's a bit of a complicated question. There's been a whole bunch of changes in later Elasticsearch releases to add better checks, something around indexes and shards. I'm not sure of the details — maybe we should have a chat afterwards — but there has been work done on that.

Historically, the biggest problem I've found using Elasticsearch is that usually I'll be asked to put it into place for someone, and they either don't want to spend the time tuning it, or refuse to appropriately size it up, or don't want a second node — all these kinds of unusual edge cases that seem like something that would be fixed if it was self-tuning, the way some of the higher-end SQL databases are. Even as a paid option through the support channel, is there anything like that, even a community script or something, that tunes Elasticsearch's configs for where it is, based on limits?

No, because it depends on your documents, on what you're doing, on the queries you're doing, on the nodes you're on, on the Elasticsearch version you're on.
So the best option there would be to hit up the community — post your config, let us know your details and your problems, and there will be someone who can help, maybe, probably me.

The usual answer for us was more nodes, or a bigger node.

There is a point where that becomes pretty much the only answer. It was built to scale, and that's kind of what you're going to do.

How do you get data out of it? Say you only want to hold data for 12 months, or you have certain streams you want to hold for three months and certain streams for six?

Curator can manage that. If you use a prefix — a name on your index — you can say keep these ones for three months and delete them afterwards. You can export them; there's a snapshot and restore API for backups, so you can save them off to S3, for example. You can actually pull them out with Logstash and put them somewhere else. There's a whole bunch of options.

Okay, thank you.

Thanks.