Hey, we're live, so welcome, all you YouTube viewers and anybody who's following along with us live on IRC. I'm Brian Davis from the Wikimedia Foundation, and I'm going to give a little tech talk today on how we use Kibana 4 at the Foundation to view log data that we collect from various applications.

A little over a year ago I gave a talk on the ELK stack that we use for debug log aggregation and visualization at the Wikimedia Foundation. ELK is the shorthand term that Elasticsearch BV coined to describe the combination of three of their open source projects: Elasticsearch, Logstash, and Kibana. On the slide here we've got just a little bit about each of those. Elasticsearch is a full-text search engine built on top of Lucene; from a programmer's point of view it makes using Lucene a lot nicer than using Lucene directly. Logstash is a pipeline processing system that basically takes data in one side, lets you run it through several kinds of filters to change it, and then puts it out the other side into some sort of storage or output system. And Kibana is a browser-based analytics and search dashboard built to work on top of Elasticsearch.
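The "data in one side, filters in the middle, storage out the other side" shape described for Logstash can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Logstash's configuration language or API: the filter functions `add_type` and `lowercase_level`, the field names, and the in-memory list standing in for Elasticsearch are all hypothetical.

```python
import json
from typing import Callable, Iterable, List

Event = dict


def add_type(event: Event) -> Event:
    """Hypothetical filter: give every event a 'type' field if it lacks one."""
    event.setdefault("type", "unknown")
    return event


def lowercase_level(event: Event) -> Event:
    """Hypothetical filter: normalize 'level' so events from different apps agree."""
    if "level" in event:
        event["level"] = str(event["level"]).lower()
    return event


def run_pipeline(lines: Iterable[str],
                 filters: List[Callable[[Event], Event]]) -> List[Event]:
    """Input -> filters -> output, the same overall shape as a Logstash pipeline."""
    stored = []  # stands in for the Elasticsearch output
    for line in lines:
        event = json.loads(line)   # input stage: parse one raw log line
        for apply_filter in filters:
            event = apply_filter(event)  # filter stage: transform the event
        stored.append(event)       # output stage: hand the event to storage
    return stored


# Made-up sample input, one JSON document per line.
raw_lines = [
    '{"message": "hello", "level": "ERROR"}',
    '{"message": "bye", "type": "syslog"}',
]
events = run_pipeline(raw_lines, [add_type, lowercase_level])
for event in events:
    print(event)
```

The point of the sketch is only the ordering of the stages: normalization happens in the middle, so that whatever reads the stored events sees consistent fields regardless of which application produced them.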
So, in our usage of the ELK stack, and the typical usage of the ELK stack, data is ingested by Logstash, processed, and dumped out into Elasticsearch, and then Kibana gives a nice web interface to view that data.

Since my talk last year there have been a few changes. We have quite a few more applications sending their logs into the system, and we've made some changes in how we process things in the middle, in Logstash, to get a little better normalization between log events that come from separate applications. But the most end-user-visible change has been that we upgraded from Kibana version 3 to Kibana version 4.

We've got two main ELK stack deployments here at the Wikimedia Foundation. The first is hosted in our Wikimedia Labs project as part of our beta cluster testing environment, and the front end for this is found at logstash-beta.wmflabs.org. The second deployment is in our production network, and access to the production Kibana front end requires authentication with a username and password and proper group membership in our backing authentication system. We have to do this because there's some potentially sensitive data in here, such as IP addresses collected from our users, so we can't just let everybody look at the production logs. But the beta log system is open to anyone, so you can go and follow along or play with things there.

I'm just going to do a basic tour of the Kibana 4 user interface, and I've got some screenshots that I took of our beta cluster instance that I'm going to walk through. When you first visit logstash-beta you will see a brief loading screen and then this default dashboard that we've created. This dashboard shows some information about all of the log events that have been recorded into the system, and at the top it's also got a handy list of links to other commonly used dashboards. So just here on the left-hand side, below the logo,
that says "default": that's the dashboard name. If you click into one of those other dashboards, like Apache or Hadoop, it would say that name there instead of "default" to help you tell what you're looking at.

Up in the upper right-hand corner there's a little display of the time range that the dashboard is currently showing, and if you click on that you can change the range. That click will open up this panel of preset time ranges that you can quickly switch between. We actually only keep 32 days of log data in our ELK clusters, so some of these built-in presets, like the 60-day, 90-day, etc., are not really useful for our logs. They'll work; you'll just only see things up to our current retention period. There are some little lozenges, I guess, on the left-hand side here, besides Quick, for Relative and Absolute, where you can put in times in a different way. The relative range is like last five minutes, last hour; there's a box that you can type into there. The absolute one gives you two calendar pickers, for a start time and an end time. When you're done, after you've changed your time or decided that you don't want to see this anymore, you can click that little up arrow in the gray bar at the bottom of the panel to close it.

Back at the primary dashboard interface, there's a big text box here that allows you to enter search terms. These searches are done using Elasticsearch's query string query language. You can look up the full details of that online; when the slides are posted there'll be a link there too, I think, in the slide. Here are a few quick tips that are useful. Say you want to search for a phrase, something that is a couple of words.
They're separated by whitespace, so you can surround your phrase in double quotes to get it to hold together and find all the parts contiguous when it's searching for things. The other potentially surprising thing here is that the search terms you enter are ORed together by default. So by default, if you type foo, space, bar, you'll get anything that mentions foo and anything that mentions bar, not just things that mention both. You can use the keyword AND, and it needs to be in all capital letters like it's shown on the slide there, to do a search for foo AND bar that would find any log events that contain both of those words rather than either of those words.

Just to the right of the text input box there's a little magnifying glass icon. This does double duty as both the search and the refresh button. It took me a while to realize this, actually: the old Kibana 3 had a dedicated button for refreshing, for catching up to the time window, but in Kibana 4 you just hit the search icon again. Most of our dashboards don't automatically refresh, so after you look at them for five or ten minutes things will have changed on the back end, and you might want to hit the magnifying glass to see how that's going.

The next icon we have is a little document with a plus sign in it, and this gives you a brand new blank dashboard to start working from if you want to build a new dashboard. Most of the time, if I'm making a new dashboard, I actually start from one of the existing ones.
One that's similar to the dashboard I want to make, instead of starting from a blank slate, mostly just because staring at a blank slate can be a little intimidating.

The next icon is the little floppy disk, and this will open the save dialog. If you're trying to make a new dashboard by editing an existing one, you need to make sure you change the dashboard name, or when you save you'll overwrite the one you started from, which is probably not what you meant to do. On the back end, on the storage side, dashboards aren't versioned, so if you save over an existing dashboard the old version is lost. This can take a little bit of getting used to for those of us who spend a lot of time on the wikis, where it's always easy to reset things to a known good revision. So when it pops up the dialog that says "Are you sure you want to save over something?", think for a second before you do.

The next icon, going across from left to right, is the file folder icon, and this is where you go to load a saved dashboard. It'll open up a panel that lets you search through the list of saved dashboards and pick one to replace your current view. Pretty simple stuff.

The next one is the share icon. Its icon is a weird little arrow pointing out of a box; I guess that's what the kids these days think means "share". This will let you get a shareable URL for the current view of a dashboard. So if you've added things in the search bar, or maybe clicked some filters or changed the time window, some things that we'll look at in a little more depth a little bit later, whatever you have showing on your screen, you can hit this share icon and get a link so that you can get other people looking at the same dashboard you have, without having to save it permanently as a named thing. If we click on it, then the panel opens up. One of the things you notice at first, if you look at
that share link line, is that it's got a really long pile of a whole bunch of stuff. Actually, the whole dashboard, with all the search terms and everything, is encoded in that URL. So typically these are way too large for copying and pasting into IRC and other places like that where you want a short link. But we can get a short URL here by going down and clicking those two arrows that point together. That's the "generate a short link" icon, and when we click it the share link will change to something much smaller that says "goto/" and has some sort of MD5-looking hash. Now you've got a nice link that's easy to share on IRC. That's a handy tip.

Back up on the dashboard screen we've got a circled plus sign icon, and this opens a panel to find a visualization (visualizations are what Kibana calls the different widgets that are shown on the dashboard) and add it to your current dashboard. This is a little bit different from the way things worked in Kibana 3, where you defined new widgets from the dashboard itself. In Kibana 4 the visualizations have to be made already, using the Visualize section of the navigation, and then you plop them into your dashboard. Once you get used to it, though, this is actually a little nicer than the Kibana 3 system. With the Kibana 4 system, a single visualization that's made and stored is actually shared across all the dashboards it's used in, so you can go back to the visualization itself and tweak some settings on it, make it show more rows or fewer rows or change the colors or whatever, and then that will actually update across all of the dashboards. That's kind of nice.

To the far right we end up with the last little navigation icon here, which is the gear, and that opens the options dialog.
In the version of Kibana that we're using, the only option available is to change the dashboard's color scheme from light to dark. So you probably won't need that too often, but that's where you find it.

A lot of the dashboards that we have created have a histogram visualization near the top. This is a line or bar graph that shows how many log events are seen in each time bucket across the period that's currently being viewed. One of the nice user interface features here, which not everybody catches on to right off the bat, is that you can click and drag on this to zoom into a time window. So instead of squinting your eyes and saying, "Oh, I think that interesting peak starts about 17:55," you can just click your mouse and drag out a highlight box around it, and when you let go it will change the time window and zoom you into that section of the graph.

In addition to the dashboards, there's a Discover section up in the upper navigation. Previously we were on the Dashboard tab in the upper middle, and now we're on the Discover tab in the upper left. Discover is kind of just a generic search screen. It always shows this histogram at the top and then a table of result output below it, and you can then play with the query bar using the same query language that the dashboard screen does. This is useful for saving something that they call a search: when you save a Discover screen it's called a search, and this can easily become the basis of a new dashboard.

This one I had a hard time figuring out how to do slides for, so I'm going to try the infamous live demo. We'll see what happens here.
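The histogram described above, counting log events per time bucket and zooming by selecting a window, can be sketched in plain Python. This is an illustration of the idea only, not how Kibana or Elasticsearch compute it; the function names `bucket_counts` and `zoom` and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, start, width):
    """Count events per fixed-width time bucket, keyed by each bucket's start time."""
    counts = Counter()
    for ts in timestamps:
        index = (ts - start) // width          # timedelta // timedelta gives an int
        counts[start + index * width] += 1
    return dict(sorted(counts.items()))


def zoom(timestamps, lo, hi):
    """Keep only events inside the selected window, like a click-and-drag selection."""
    return [ts for ts in timestamps if lo <= ts < hi]


# Made-up event times, as second offsets from an arbitrary start.
start = datetime(2016, 1, 1, 17, 50)
event_times = [start + timedelta(seconds=s) for s in (5, 20, 70, 130, 135, 140)]

histogram = bucket_counts(event_times, start, timedelta(minutes=1))
for bucket_start, count in histogram.items():
    print(bucket_start.time(), "#" * count)

# Zoom into the third minute, where the spike is.
selected = zoom(event_times,
                start + timedelta(minutes=2),
                start + timedelta(minutes=3))
print(len(selected), "events in the selected window")
```

Zooming is just re-running the same bucketing over a narrower time range, usually with a smaller bucket width, which is why the graph shows finer resolution after a drag.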
I'm going to change the window that I'm pointing to. So now here's the Discover screen live, and this is looking at our beta cluster Logstash version. You can see the timeline here, and with the click-and-drag to highlight we can zoom in on what was happening a couple of minutes ago. Hey, there we go: we see more detailed resolution about what events were going on.

We can go down here to change the table that's showing things. By default it'll show you the time and _source, which is just a blob of the full result of any log event. We can make that a little prettier by using the navigation on the left-hand side to say that what we'd really like to see besides the time is the type field and the message, which is kind of the most basic information. Then we can go through here and we can see, okay, there's a whole bunch of stuff about syslog events. Hmm, I don't think I'm really interested in looking at syslog events right now. So I can drill into a particular log event and see all the various fields that we've got parsed out in it. One of the fields that I can find down here is the type, and I can then choose to filter so that I only show things of this type, or so that I filter out this type. So let's filter out the syslog type and see what happens to our graph. We've got a lot fewer events showing up now, and we've got mediawiki as the primary type. So that's kind of how we can drill down into some things.
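The two filter actions in the demo, "only show this type" and "filter out this type", amount to include and exclude conditions on a field. A minimal sketch of those semantics in Python follows; `apply_filters` is a hypothetical helper and the sample events are made up, so this shows the logic rather than anything Kibana actually runs.

```python
def apply_filters(events, include=None, exclude=None):
    """Keep events matching every include condition and no exclude condition."""
    include = include or {}
    exclude = exclude or {}
    kept = []
    for event in events:
        # Drop the event if any required field has a different value.
        if any(event.get(field) != value for field, value in include.items()):
            continue
        # Drop the event if any excluded field value is present.
        if any(event.get(field) == value for field, value in exclude.items()):
            continue
        kept.append(event)
    return kept


# Made-up sample events with only the fields used in the demo.
sample_events = [
    {"type": "syslog", "message": "cron job started"},
    {"type": "mediawiki", "message": "slow query"},
    {"type": "mediawiki", "message": "cache miss"},
    {"type": "other", "message": "something else"},
]

without_syslog = apply_filters(sample_events, exclude={"type": "syslog"})
only_mediawiki = apply_filters(sample_events, include={"type": "mediawiki"})
print(len(without_syslog), len(only_mediawiki))
```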
Oh yeah, so the filters actually show up up here as little lozenges: red for filtering things out, and, if we filter one in, like if we decide that we want to only see mediawiki events, green for include filters. And then you can remove filters. If we're filtering for only mediawiki, then... oh, I'm only looking at things from mediawiki anyway. So those are some of the fun things that you can do there. It depends on what you're going after how much you're going to want to tweak the table layout. Besides being able to see all the columns on an individual event here, you can also, if you need it for something, get to the raw JSON backing data that's stored in Elasticsearch for each event.

That didn't go too horribly for a live demo. Let's go back to the slide deck.

Yeah? Yeah, so you can totally do that. Let's go back and show that. Let's get back to a kind of clean slate here. This is part of the query language: if you know the name of a field, like type, you can put it in with a colon and then after it put what you want the value to be. So you put type:mediawiki in there and hit the search button and, boom, there we go. Now we're filtered down to just mediawiki.
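For readers who want to try the same searches outside the Kibana search box, the strings typed there are Elasticsearch query string queries, which can be wrapped in a `query_string` query in a search request body. The sketch below only builds and prints such bodies; it does not contact a server. The helper name `query_string_body` is made up, and how Kibana itself wraps the text is not covered in the talk, so treat this as an assumption-laden illustration.

```python
import json


def query_string_body(query, default_operator="OR"):
    """Build a search request body around an Elasticsearch query_string query."""
    return {
        "query": {
            "query_string": {
                "query": query,
                # OR is the default, which is why bare terms match either word.
                "default_operator": default_operator,
            }
        }
    }


# The search-box examples from the talk so far.
examples = [
    "foo bar",         # either word (terms are ORed by default)
    "foo AND bar",     # both words; AND must be upper case
    '"foo bar"',       # the exact phrase, held together by double quotes
    "type:mediawiki",  # a specific field and value
]

for text in examples:
    print(json.dumps(query_string_body(text)))
```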
So this is the same: on the back end this does the same thing as if we had drilled into a log event and clicked the little plus magnifying glass next to an event that had type mediawiki. I'm trying to remember; I'm almost sure that there's a way, but I don't remember how, to put one in that actually shows up as a filter itself. But what you can do, if you get some filter up there (we'll filter by this channel, message cache), is go in, and one of the icons when we hover over a particular filter is an edit thing, and we can actually change it there. This is getting pretty esoteric, into how Elasticsearch queries work, but you could actually change this one from pinning us to things that say message cache to things that say... I'm not going to guess another channel. Let's see if I guess auth manager, if anything happens. Oh, here we get to find the cool thing that happens when you make your filters too narrow: it comes and says, like, "I can't really find anything; maybe you should figure out how to refine it."

Yeah, so you can either do it with the clicky things or, if you're doing it in the query language itself, you can do AND NOT something. Let me find something that we can NOT out. Okay, AND NOT channel... that's a cache. So yeah, NOT something. I think you can also do pluses and minuses in the query language.
Let's see if this comes back with about the same results. Yeah, so you can do a plus type:mediawiki and minus channel message cache sort of thing. And if you browse around and look at the dashboards you'll find some, I think especially the exception log dashboard, a couple of those will have those negative filters that get rid of noisy messages that we don't typically care about looking at when we're drilling down into stuff. Hopefully that answered Andrew's question.

All right, let's see, where were we back in slide deck land? I think you guys are almost done with me here. I'm going to change slides. There was our live demo slide. What do we have next? Oh, okay. So this is kind of a cool thing that's new since we talked about this whole logging stack last year. We have this HTTP request header called the X-Wikimedia-Debug header that you can send in with an inbound request to the production MediaWiki servers, and it lets you do a few things that are on the slide here.

You can tell the request to be directed to a specific back-end server. There's just a small list of them, I think there's like four or six right now, but you can pick which of those servers your request goes to. That can be very interesting and useful for a live debugging session where you've got a non-publicly-released patch or some extra logging lines put on one of our particular testing servers. When this header is present, our front-end Varnish caches will also never cache the output and never give you a cached response.
So you always get to talk to the back-end server, even if you're browsing as an anonymous user, which again is useful for live debugging.

And then you can enable some feature flags. You can enable verbose logging, which will log, to both our ELK stack and our flat files on fluorine, all of the log messages, the full debug-level, firehose level of log messages that happened during the request that carried the header. You can also ask for profiling data to be made and then put up in a dashboard in a different tool that we have for looking at profile traces. And the third kind of cool feature flag that you can turn on is that you can tell the wiki to act as though it's in read-only mode, to simulate a locked database, which can be useful for tracking down certain kinds of bugs. There's a page on Wikitech about this where you can go read the completely gory details.

So how do you use it? You use it as an HTTP header, which you can do with curl: you can say curl -H, which sends a header, X-Wikimedia-Debug, backend equals one of the available pooled back-end servers. I think the behavior is that if you write something in there that's not on the whitelist, you get pinned to one particular server as the default. And then those feature flags go in as semicolon-separated things, so there's log and trace and read-only, and again, that Wikitech page will give you the full syntax details of those.

But we've also made browser extensions for both Firefox and Chrome that make using this easy for just a normal person in their web browser.
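A rough sketch of assembling such a header in Python follows, based only on what is said in the talk: a `backend=` part followed by semicolon-separated feature flags. The exact flag spellings, the spacing after the semicolons, the back-end host name, and the target URL used here are all assumptions or placeholders; the Wikitech page is the authority on the real syntax. The code builds the request object but deliberately never sends it.

```python
import urllib.request


def debug_header_value(backend, flags=()):
    """Join 'backend=<host>' and any feature flags with semicolons."""
    parts = ["backend=" + backend]
    parts.extend(flags)
    return "; ".join(parts)


def debug_headers(backend, flags=()):
    """Return a headers dict carrying the X-Wikimedia-Debug header."""
    return {"X-Wikimedia-Debug": debug_header_value(backend, flags)}


# Placeholder host and URL: substitute a real pooled debug back end and wiki URL.
headers = debug_headers("debug-backend.example", ["log", "readonly"])
print(headers)

# Constructing a Request does not perform any network I/O; passing it to
# urllib.request.urlopen() would be the step that actually sends it.
request = urllib.request.Request("https://wiki.example/wiki/Main_Page", headers=headers)
```

Keeping the header construction in one small function makes it easy to adjust the flag names once they have been checked against the documentation.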
So you install these extensions from your respective browser's extension repository, and then you get these handy little dialogs that let you turn the extension on and off, make it active or inactive, and then control what header gets sent through. One of the really neat things for both of these, related to this Kibana talk, is that if you enable the verbose logging (actually, even if you don't enable the verbose logging, just when you have it on, but the verbose logging gives you more interesting things) both of them provide you with a way to drill into an x-debug dashboard that picks up the request ID that your browser got back from the server in its response, and view just the log messages that came in from your request. This can be super handy when you're trying to track down some weird bug in production that you can reproduce on the testing servers: you can use this to capture a full trace and then get a nice dashboard to search around in.

Right, quick question from IRC again: have you established a way to correlate code releases or other events using the annotation marks in the Kibana UI?
You know, in Kibana 3 we had a way to do it. We have a log event channel that records all the things that scap, which is our deploy tool, does, and in Kibana 3 you were able to overlay that on the histograms and show nice vertical lines for when scap was changing files on the cluster. In Kibana 4, or at least the particular version of Kibana 4 that we have, that ability went away; they don't have a way to combine multiple data sources on the histogram like that. And I haven't looked yet. As is the way with open source software, things move pretty quickly. We've been on Kibana 4 for, I don't know, three or four months now, but Kibana 5 is out now and I haven't gone to look and see if it brings back a nice way to make annotations visible or not. That would be a nice feature to get back. It's one of the things that I miss from what we had in Kibana 3.

There you go. I rambled enough to get Joel to say that was good, I think.

We are at the credits slide. I speed-read through it in 35 minutes. All right, so, yeah: Elasticsearch, Kibana, Logstash, and all their logos are registered trademarks in various countries; I just used them here for the purposes of identification.

Awesome, I picked the wrong one. So much screen share fail. We'll find it. That one says "Desktop 2". How about that one? That's probably the slide I wanted people to be seeing. Thanks, Rachel.

Yeah, so CC BY-SA 4.0 on the slides, and we'll get the slides and the video up on Commons as soon as we can.

Have we got any more questions from IRC? Take that as a no, I guess. The YouTube delay is always hard to deal with when you get to this part, though.

Yeah, I think we don't have any further questions. So thank you so much, Brian. That was excellent. I'll see you all later.