Okay, I guess we'll go ahead and get started. My name is Brian Altenhofel, and I like lots of data. How many of you like data? Like seeing lots of pretty graphs? Do you like seeing pretty pictures in your presentations? Because I don't really have any. Up there are a few ways to contact me if you ever need to: I'm VeggieMeat on IRC, and I'm in just about every Drupal channel. I connect by proxy, so I'll always get the message. You can also connect with me on Twitter, or send me an email, whatever.

A little bit about me: I have two Drupal businesses. One does development, which is VM(doh), and the other does hosting, which is also where I started doing a whole lot of centralized logging. I've worked with Drupal since 2008, and like I said, I like lots of data. I also like automation; it's very rare to find me without a task running in Jenkins.

So, who knows what this is? Yeah, that's just a typical nginx or Apache access log. Have you ever had an issue on your site and needed to find out what's going on? So you hit the log files. Typically you're going to grep it, maybe send it through awk or a little bit of sed, just to pull out the particular pieces of information that you need. If you don't have a lot of traffic, you might just run tail on it, but if you've got three million hits a day, that's not going to work.

One way to go through this log, like I said, would be with awk. In this case, we're trying to find out how many times content was accessed between 3:00 a.m. and 3:59 a.m. server time. Can you remember how to do that every time you need that information? If you wanted to have some fun with it, you could maybe do some Perl, or throw in some other regular expressions. I've had one-liners (or maybe they should have been multi-liners) that were aliased but would take up the entire screen, and I'm sure all of you have been there before too.
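A minimal sketch of the kind of one-liner being described, assuming the default nginx/Apache "combined" log format, where the timestamp field looks like [20/May/2013:03:15:42 -0700]:

```sh
# Split each line on ":" so that $2 becomes the hour from the
# [day/month/year:HH:MM:SS] timestamp, then count every request logged
# between 03:00:00 and 03:59:59 server time.
awk -F: '$2 == "03" { count++ } END { print count }' /var/log/nginx/access.log
```

To narrow it to content pages you would pipe through a grep on the path first, which is exactly the kind of incantation that is hard to remember on demand.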
That kind of brings us to the problems with conventional logging. If you've got one machine, great, you can go through it that way. But what happens when you get, say, 40 web heads, or multiple database servers, or multiple file servers, and you're needing to find the source of a problem? Maybe it doesn't occur on every single one of those servers; maybe something is slightly different because your configuration management didn't work right. Conventionally, you would need to go to every single one of those machines and access the logs, and that doesn't work with 40 or 400 servers. And a lot of your technical support people won't be able to go through those logs and find out if there's a problem. Say you had an email issue: a customer calls and says, "I'm not getting any emails." Ideally, customer support would be able to go look at the mail log and see if it's giving any errors. With conventional logging, they can't do that.

You have to have a sysadmin or an ops guy or some other technical person basically be a keyboard for that support person, and because of that it ends up taking a long time to fix problems. We all know that in customer support, the customer wants their problem fixed yesterday.

The solution to this is centralized logging. With centralized logging you ship all your logs to a central place. In this case, what I'm going to talk about is shipping them from your servers through Logstash (which is the cute log guy in the middle) and having them indexed by Elasticsearch. What that lets you do is go back weeks from now and search through your logs using standard Apache Lucene queries. You can limit searches to certain time periods, and you can find trends. It makes life a lot easier, especially if you've got a website that's generating tens or hundreds of thousands of log messages a second.

So what is Logstash? Logstash is a Java application that receives messages, but it can also parse them and then send them wherever you want. If you want to send them to Elasticsearch, to PagerDuty, or to StatsD and Graphite, you can do all of that. It's a lot like using Unix pipes, or using tee, except you get a whole lot more parameters and it's a lot easier to configure. And if there isn't a plugin available to ship somewhere, one should actually be pretty easy to write.

Logstash has three different types of plugins: inputs, filters, and outputs. I've got a few of them listed up there. The input I'm going to talk about here is lumberjack. The filters are great for manipulating the log data however you need it: you can separate out different parts of the log message into fields, so that you can search by, say, the HTTP status code, or by how long a slow query from MySQL took. grok is a great filter: it basically lets you put together different regular expressions, pass them around as variables, and reference them that way. anonymize is a great one if, say, your IT people need all the data but customer support needs certain data stripped out; you can also do that kind of thing with mutate. As for outputs, right now it goes a lot of places. PagerDuty is one that I use all the time, and Elasticsearch, and I think you can even do IRC if you need to. It's a lot of fun, and it's really easy to configure.

This is just a simple configuration that handles nginx and Apache access logs. What it's doing is receiving from lumberjack, and you'll notice (I'm not totally sure if you can read it up there) that there's a type key on every one of them.
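A minimal sketch of the kind of configuration being described, in the Logstash 1.1-era syntax current at the time of this talk; the port, certificate paths, and type name are assumptions, not the presenter's actual setup:

```
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/logstash/lumberjack.crt"
    ssl_key => "/etc/logstash/lumberjack.key"
    type => "nginx-access"
  }
}

filter {
  # only events carrying this type are parsed by this grok pattern,
  # which splits a combined access log line into fields like "response"
  grok {
    type => "nginx-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
}

output {
  # likewise, only events of this type are sent to this output
  elasticsearch {
    type => "nginx-access"
    host => "127.0.0.1"
  }
}
```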
What happens is that Logstash references that type to say, "this filter applies to these inputs, and this output applies to everything that has this type attached to it." That way you can route things wherever you want, as needed.

Lumberjack, which I'll get to in a minute, has a really cool feature where you can arbitrarily add a field from the shipper side, so that it shows up in Elasticsearch. That way you can search by, say, your client name, or a particular server name (which would actually already be there). And if you had something else you needed to attach, like maybe a cluster name, you could have that exposed as a field too. In this case I'm also manipulating it, turning that field into a tag, and then dropping the event into Elasticsearch.

Elasticsearch is a document-oriented search engine built on Apache Lucene. One nice thing about it is that it's schema-free, so you don't have to go through and define what all of your fields are going to be; it takes them from the JSON input and figures it out by itself. It's also very easy to scale, especially in a multicast environment. If your network can multicast, all a node needs is the cluster name, and it will automatically find the other nodes for that cluster and rebalance itself as needed. Most cloud providers don't do multicast; I know that if you're doing this in the cloud, Rackspace actually does, if you use their cloud networks, their private networks.

Kibana is actually pretty simple to explain: Kibana is a really awesome front-end for Elasticsearch, is what it amounts to. The current version, Kibana 2, is in Ruby, and Kibana 3 is HTML and JavaScript based. Really, with Elasticsearch you could build your own front-end in Drupal if you wanted to, since Elasticsearch has a REST API for that. But why bother, when you can get Kibana to make your logs look like this? You've got an easy chart there, you can sort by the fields over on the left, you can search them and graph them, and you've got the messages there below. Kibana 3 is kind of like this: a lot prettier, with an easier-to-use interface for making custom dashboards, and you'll see a few screenshots from it later.

Finally, what is lumberjack? Well, you can use Logstash itself on the servers you're shipping logs from, to ship the logs to another Logstash server, but it usually takes up a hundred megs or more of memory just sitting there running. If you're in a cloud environment and you're spinning up little 512-megabyte web servers just to scale out, you don't really have the overhead available for that kind of application. I've seen rsyslog grow to close to a hundred megs before, too. Lumberjack is a very, very efficient shipper that just sits there and waits for log messages to send. It sends them over SSL, so if you need to ship them over an unprotected, non-private network, you can do that. And it uses very little memory; if you look at the footprint, that's maybe 4K-ish. They are pretty lightweight.

Lumberjack is also really fast. This is a little spreadsheet that Jordan Sissel put up on Twitter one day, comparing the publish rates and consume rates of different shippers through Logstash, and if you look at it, lumberjack was capable of about 25,000 events per second in what he was showing there.
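To make the field-to-tag manipulation mentioned above concrete, here is a sketch in the same 1.1-era syntax; the "client" field name is a hypothetical stand-in for whatever field the shipper adds:

```
filter {
  mutate {
    type => "nginx-access"
    # promote the shipper-supplied "client" field to a tag...
    add_tag => ["%{client}"]
    # ...and then drop the original field before indexing
    remove => ["client"]
  }
}
```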
So why not use Drupal's database log, or the statistics module that's in core? It's really slow once you get a larger site that's getting a lot of traffic. That's more work you're having Drupal do, and more times you're having to hit the database; and if you're hitting the database, you're probably hitting disk, and it can get really, really slow. Say you set up something with Rules that loops over a set of line items on a Drupal Commerce order, and you're trying to do something with all those line items, but you messed up and had something that threw a little error. That can get carried away, and suddenly you've got 20, 30, 50, or 100 pages worth of errors in the Drupal database log. That probably really slowed down the site for the people who were on it at the time.

So it's typically a lot better to have Drupal either log directly to a file or to syslog. But if you can't get direct file access like that, and you happen to be able to get remote database access, Logstash actually has an input for Drupal's database log. All you have to do is give Logstash the credentials and the connection information for your database, and it will pull out the watchdog entries and ship them for you.
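A sketch of that input; the site label, connection URI, and polling interval shown here are placeholders, and the exact option names may vary slightly between Logstash versions of this era:

```
input {
  drupal_dblog {
    # one entry per site: a label and a MySQL connection URI
    databases => { "mysite" => "mysql://drupal:secret@dbhost/drupaldb" }
    # minutes between polls of the watchdog table
    interval => 10
    type => "drupal-dblog"
  }
}
```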
There are some advantages to shipping Drupal's logs elsewhere. It's a lot easier to search through them; like I said before, you can use standard Apache Lucene search queries. You can easily graph things, whether you're looking for 404 errors, failed login attempts, or just how many times a page was hit successfully. You don't have your database exploding when you have that little issue inside a loop. And you get happier users on your high-traffic sites. Also, instead of having to use the statistics module to see your traffic, maybe you want a long-term graph: you could ship your statistics to StatsD and Graphite, where you're able to get a chart like this.

So how does this help me as a Drupal site administrator, or really in general? Scenario number one: as a website administrator, I need to know if there's a trend of increasing errors. If you were shipping to, say, StatsD and Graphite and had that long-term graph, you could easily reference it and see that. But maybe you're not; maybe you're just indexing all your log entries, or you want to see whether a particular error is occurring, like a certain message that's coming from a certain part of your application. Well, you can easily put in a search query and generate graphs like this. These are actually two charts of the same data. The one on the left is a pie chart that's just showing your ratios, basically, of 200s to your 404s and 301s and whatever; the ones on the right show how that happened over time.

Another one: MySQL slow query logs suck. Have you ever tried to grep through one? You don't know if an entry is going to be one line, five lines, six lines, ten lines; you don't know how long it's going to be. Plus, they have a really weird timestamp, so you kind of have to shift gears to think about what exactly that timestamp is. Well, with Logstash you're able to take that and manipulate the log messages. You can compress all of those multiple lines down into one continuous message. You can translate the timestamp into a standard one that you use on everything, so that all of your indexed messages have the same timestamp format. And you can split out fields; here I'm showing the number of rows scanned and the number of rows returned. If I sat down some more with the filters and worked at it a little longer, I could split this out into even more fields, just by splitting up that log message. But you would agree that this is a lot easier than trying to look through MySQL's normal slow query log.

Another one might be Twitter trends. Maybe you just want to see what somebody's saying about your brand, or you want to search through DrupalCon tweets, or maybe you've got a whole bunch of interests that you always search for on Twitter and you want to see how they're talked about over time. That's actually pretty easy too. This is tracking the #drupal and #drupalcon hashtags, and it's actually a screen grab from last night. It's got a few messages up there, and it breaks out the nickname and any links that appear into their own fields that can be sorted on.

Maybe you've got things like unauthorized access attempts. You always get people, maybe scripts or maybe bots, that try to log into your site or create users. Maybe you want to see if there are certain times of day, certain IP ranges, or other characteristics that are making an extraordinary number of attempts. That's also easy to do. What this one has is Drupal actually logging to syslog, and then all the syslog messages are being shipped over to Logstash. So we're searching on the syslog type, for the Drupal program, and then basically searching for "access denied". You could also easily graph this with a click; I actually have the graph section collapsed there, but it would show that over time, and if you wanted to drill down into different parts of that message, you could.

Another fun one is IRC. Like it says up there, there is a ton of documentation in IRC that never makes it onto drupal.org. Somebody has a question that might seem off the wall, so it gets answered, but nobody goes and makes a documentation page. You can actually have Logstash sit in IRC channels, log them, ship them to Elasticsearch, and index them so that you can go in and search them. In this case, this was over a 15-minute period last night, just searching for "views".
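A sketch of that kind of listener; the server, channel, and nick here are assumptions:

```
input {
  irc {
    host => "irc.freenode.net"
    channels => ["#drupal"]
    nick => "logstash-listener"
    type => "irc"
  }
}
```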
And hey, in that search I get a hit where Druplicon responded to somebody about something with Views Bulk Operations. If you did that over the long term, you could go back and search for something you're having an issue with, or maybe chart interest in a certain problem.

And then another one would be smarter notifications. I'm a big fan of PagerDuty, even though I get a lot of messages from it sometimes. You can have Logstash go through and say, "hey, does this message have this certain stuff in it?" Maybe you've got a certain error message from your site that needs immediate attention, but it's not doing anything that would trigger something like Nagios or Zenoss to actually send out an alert. So you can have Logstash ship it to PagerDuty, and that way whoever is on duty at the time gets a text message or an email alert. In this case, I've got one where I went through and tagged it with a type of "ouch", and it's going to send me a message that says "super bad event" for this or that. This comes in handy for, like I said, those little errors that don't necessarily trigger my Zenoss monitoring system but do need immediate attention. In this configuration, at the bottom you also see one for StatsD: it's shipping the count for a particular response code from nginx to StatsD, so that you get one of those charts that I showed earlier.
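A sketch of that pair of outputs in the same 1.1-era syntax; the service key and host are assumptions, and the "response" field is the status code split out by the grok filter shown earlier:

```
output {
  # page whoever is on call, but only for events given the "ouch" type upstream
  pagerduty {
    type => "ouch"
    service_key => "YOUR-PAGERDUTY-SERVICE-KEY"
    description => "Super bad event: %{@message}"
  }
  # bump a counter per HTTP response code for charting in Graphite
  statsd {
    type => "nginx-access"
    host => "statsd.example.com"
    increment => ["nginx.response.%{response}"]
  }
}
```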
More resources can be found at logstash.net and on the Elasticsearch site. Lumberjack can be downloaded off GitHub, and session feedback can be left on the DrupalCon site. So, any questions?

Question: Do you have any recommendations for storing all these logs? Once you're storing and indexing them, it's probably going to take quite a bit of disk space. Is there anything that allows you to maybe put these things on S3 or something like that to help with the storage, if you don't have a lot of storage on the servers you're using? Or maybe I'm wrong and it doesn't amount to a ton of storage.

Okay, yes, it does amount to a ton of storage. I've got one site that gets about three and a half million hits a month, and it generates probably 12 gigs every day or two, so I do run into that problem. One nice thing about the Elasticsearch data is that you can pretty much just back it up as-is and bring it into another Elasticsearch instance. So what I end up doing is I run a script that takes every index that's more than seven days old, archives it, and then, yes, ships it off to S3. And it's easy to just bring those archives back and pull them up in another Elasticsearch instance if you need to actually go back and look at them.

Question: Are your slides going to be online? Yes, I'm actually going to upload them to the session node on the DrupalCon site; I just haven't had a good Wi-Fi connection this morning. Thank you.

Question: I'm curious whether there are any performance considerations to take into account with 40 or 60 servers like this. I'm not quite sure what the threshold is on the Elasticsearch servers. I know that I can run them on cloud servers; typically for me, about 17 or 18 web nodes ship their logs to one Elasticsearch node on a 2-gig server. That's about what I've been doing: every 17 or 18 nodes, it's up another two gigs, and that usually works out pretty well.

Question: With lumberjack and the shipping of the logs, is it the case that nginx or your web server writes to disk, lumberjack reads those files and ships them off, and then logrotate runs? Or are you able to capture them and ship them off without ever hitting disk? Well, there are two ways to go about that. The standard way is to have lumberjack just watch the logs: it reads them off the disk and ships them out, which lets you get really aggressive with logrotate, because you don't care whether the files stick around or not. That gets you to kind of a second way, which is that, if you wanted to, you could mount your logs in memory, if you can manage it well, or if you know that you're going to rotate out before it gets full; that can speed things up pretty well. But with lumberjack, typically you just have it read from the files. It will also take standard input, so you can just pipe something to it.

Question: We're going to have different site owners who need to see logs or information just about their own sites, and of course they're going to want a web GUI that's not on a private network. Are these tools able to do that, or are there others you use to let the public see their logs? Kibana 2 has a branch that has authorization in it, but it's not recommended for production use. So what I would do in that case is take Kibana 3, which can run on nginx or Apache or whatever your web server is, and let the server handle the authentication there, or put it behind SSL, whatever you're needing. I haven't seen whether they're putting authentication into Kibana 3 yet or not. Earlier in the talk, when I mentioned that I was shipping from lumberjack and had a field that lumberjack added, that's actually something I've been playing with from the Kibana 2 branch that has authentication, which just authenticates off the tags.

Question: I don't know if you've worked with other tools, but is there anything in particular about this tool set that you've found has advantages over Splunk or other products in this space, other than being open source? I haven't really worked with anything else. I hang out in an IRC channel on Freenode, and I kind of got started on this stack by asking, "Hey, what's the standard right now?", and this is what they pointed me to. I did look into Splunk early on, and I found that in my case, where right now I'm generating a total of 20 to 25 gigs a day, it was just a lot more cost-effective to roll my own, basically.
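A rough sketch of the seven-day archive-and-ship rotation described in the Q&A above; the data path, bucket name, use of s3cmd, and the close-before-copy step are all assumptions rather than the presenter's actual script:

```sh
#!/bin/bash
# Archive date-stamped Logstash indexes (logstash-YYYY.MM.DD) older than
# seven days to S3, then delete them locally to reclaim disk.
cutoff=$(date -d '7 days ago' +%Y.%m.%d)
datadir=/var/lib/elasticsearch/logstash-cluster/nodes/0/indices

for idx in "$datadir"/logstash-*; do
  name=$(basename "$idx")            # e.g. logstash-2013.05.21
  stamp=${name#logstash-}
  if [[ "$stamp" < "$cutoff" ]]; then
    curl -s -XPOST "http://localhost:9200/$name/_close"  # stop writes first
    tar czf "/tmp/$name.tar.gz" -C "$datadir" "$name"
    s3cmd put "/tmp/$name.tar.gz" "s3://example-log-archive/$name.tar.gz"
    curl -s -XDELETE "http://localhost:9200/$name"       # free local disk
    rm -f "/tmp/$name.tar.gz"
  fi
done
```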