Thanks, everybody. They said to make your intro fun, right? So I thought I'd do that. As many of you know, Django came from a journalism background, and we've had some talks today that describe Django in literary terms. So I'm going to take it back to the journalism roots and do a little bit of the who, what, where, when, and how of app metrics, and I'm going to scramble the order a little. First, who: me. I'm the founder of Revolution Systems, and I've been working with Django for a really long time. For this talk, the most important part of my open source work is a project called Django App Metrics that makes some of this a little easier; we'll get to that toward the end. But most importantly, we're going to talk about the why. When I first proposed this talk, I asked for the shorter time slot, because I couldn't fill 45 minutes with how to do metrics; it would just get really boring and repetitive. But when I actually sat down to write it, the landscape had moved and things had gotten easier, so I can't even fill 25 minutes with the how. The code part has become really, really easy. The hard part is: what should I be tracking, and why should I be tracking it? So let's cover some of those things. I come from Kansas; Django comes from Lawrence, Kansas. My state has some trouble believing facts, but I am driven by facts, and I suspect many of you are too. If you can show me some numbers, and I believe you collected them reasonably accurately, I'm much more likely to take action than if you tell me, you know what, I feel like that button should be a little bigger, or I feel like we should make this feature more prominent. So metrics give you ammunition with your coworkers and your bosses, both for the things you want to get done and for the things you don't want to have to do.
Obviously we collect a lot of metrics and draw pretty graphs for the things you'd expect ops to care about: what's the load, how much disk space is left, those kinds of things. We need metrics for that. But we can also use them to prioritize development, and to be able to tell customers no. You'll get requests from coworkers or customers saying, it'd be really great if it could do this: I know you handle CSV, but we really want an actual Excel download. And you're not too keen on it. So you go look, and it turns out they're the only customer that uses that feature at all. Maybe we shouldn't prioritize it, because it's just not heavily used. We can also use metrics to make design decisions. All the designers are going to hate me for this slide; I did everything wrong, right? Which is funny, because it's the slide that took me the longest to design. Sorry. I guess that's good news for designers: it's gotten so easy to do it right that you have to work hard to do it wrong. I'm pretty sure Chrome doesn't even let you use Comic Sans anymore. But there are a lot of things in our web apps that we do mostly out of habit. We put the controls at the top and then again at the bottom. Well, if you track where they're actually used, it may turn out that in your app the user always scrolls to the bottom of the page, because that's where the newest data is, so they only ever use the bottom controls. So get rid of the ones at the top and win back some screen real estate; maybe other controls should go up there. And if you track not just "I used feature X" but "I used feature X from this spot on the page," that becomes an important piece of information. We can also use our stats for internal things, internal politics: Who should get more funding? Who should get a new staff member?
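Tracking "feature X from this spot on the page" only pays off if you can slice the events afterwards. Here is a minimal sketch of that slicing, assuming you already record one event per use with a feature name, a page position, and a customer id; the event list, feature names, and customers are hypothetical stand-ins for whatever your metrics store returns.

```python
from collections import Counter

# Hypothetical recorded events: (feature, position on page, customer id).
events = [
    ("export_csv", "bottom", "acme"),
    ("export_csv", "bottom", "globex"),
    ("export_csv", "top", "acme"),
    ("paginate", "bottom", "acme"),
    ("paginate", "bottom", "initech"),
    ("export_excel", "bottom", "acme"),
]


def usage_by_position(events):
    """Count uses of each feature, broken down by where on the page it was used."""
    return Counter((feature, position) for feature, position, _ in events)


def customers_per_feature(events):
    """Number of distinct customers that used each feature at all."""
    seen = {}
    for feature, _, customer in events:
        seen.setdefault(feature, set()).add(customer)
    return {feature: len(customers) for feature, customers in seen.items()}


if __name__ == "__main__":
    for (feature, position), count in sorted(usage_by_position(events).items()):
        print(f"{feature:<14} {position:<7} {count}")
    print(customers_per_feature(events))
```

The first function answers "are the top controls used at all?"; the second answers "is one customer the only one asking for this?"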
Should we even bother having those meetings? We talk a lot in our world about tearing down the walls between Dev and Ops, but we can also use stats to tear down some of the walls between Dev and sales and marketing. If I can give the sales folks numbers they didn't know about, or give marketing some insight into what they're after, now they're my friends. And they'll be a little more understanding when I tell them: this is going to take six weeks to build, are you sure we really want to do this? Or: I can't do that right now. So this can be a way to break down some walls inside your organization. We're going to grab lots of crazy metrics, but remember that correlation is not necessarily causation. If you see a big spike in your graph, that doesn't mean whatever else happened at the same moment caused it. Usually it's just a signal that you need to dig deeper. So what specifically should we be tracking? Most of you can probably come up with... oh, before I cover that, I need to cover costs. When I talk about the cost of metrics, there are a few things to consider. There's the actual storage cost: I'm going to collect all this data, so how much will it cost to store? There's the time it takes to implement. And there's the runtime cost: I don't want to slow my app down considerably by tracking nine million things on every page and every action. So I need to balance those. But keep in mind that metrics are pretty darn cheap. If you set your retention policies right in Graphite or in a hosted service (we'll get into some of that), you can store a metric for about 20 megabytes over a 10-year period. So don't be too afraid of collecting too much data, but remember that it's not absolutely free. So we all know to track the default things, right?
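To put a number on the storage cost: Graphite's Whisper files are preallocated, so the size of one metric depends only on its retention policy. The sketch below does that arithmetic, assuming Whisper's layout of a 16-byte header, 12 bytes of metadata per archive, and 12 bytes per point (a 4-byte timestamp plus an 8-byte double). The retention strings are made-up examples, not recommendations; the exact size for "10 years" depends heavily on the resolution you keep.

```python
# Assumed Whisper layout: 16-byte header, 12 bytes per archive description,
# 12 bytes per data point (4-byte timestamp + 8-byte double).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert '10s', '1m', '10y' to seconds ('m' is minutes, 'y' is 365 days)."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def whisper_size(retentions: str) -> int:
    """Bytes on disk for one metric, given a retention string like '1m:30d,1h:10y'."""
    archives = [part.split(":") for part in retentions.split(",")]
    points = [to_seconds(keep) // to_seconds(step) for step, keep in archives]
    return 16 + 12 * len(archives) + 12 * sum(points)


if __name__ == "__main__":
    for policy in ("1m:30d,1h:10y", "5m:10y", "1m:10y"):
        print(f"{policy:<16} {whisper_size(policy) / 1_000_000:6.1f} MB")
```

Rolling old data up to a coarser resolution is what keeps a decade of history small; keeping one-minute points for ten years costs several times more than the five-minute policy.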
Load, disk space, memory usage, how many WSGI processes we have going, how many worker processes are running. Most everybody knows to track that stuff, and it's a good place to start; it's better than nothing. But there are other things you might not think of that are good to overlay against feature rollouts and bugs, and they can help you solve real problems. Table sizes, for example: not just how much disk space the database uses, but, I did this deploy and all of a sudden this table blew up in size. Does that make sense, or is it a bizarre bug we just haven't detected yet? How many deploys are we doing? How long do they take? When did we have outages? How many times is somebody SSHing into the server? That can be a really good signal that we're not doing enough ops automation, and it's something you can easily script up and track. More important than keeping raw numbers, though, is segmenting some of this data. Take the example at the bottom of this slide, support tickets. We don't just want tickets created and resolved; we want to know whether a ticket was created by somebody internal or external, and from what department. We want a little more dimension on these data points, so we can see whether the support tickets are all coming from customers, or all from that one guy in that one department who's really just kind of a pain. Then there are basic app metrics, the kind of thing marketing, sales, and growth hackers want to see: signups, free versus paid, upgrades, downgrades. We all know that kind of stuff, like how many people like you on Instagram. But there are other things, like: what emails are we sending to customers, by type?
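With a StatsD/Graphite-style setup, one common way to give the support-ticket counter "a little more dimension" is to encode each dimension as a level of the dotted metric name. This is a minimal sketch of that approach; the `metric_name` helper, the `support.tickets.created` prefix, and the department names are hypothetical.

```python
import re
from collections import Counter


def metric_name(base: str, **dimensions: str) -> str:
    """Build a dotted, StatsD/Graphite-style name with one path level per dimension.

    Keys are sorted so the same dimensions always produce the same name, and
    values are lowercased with anything outside [a-z0-9_-] replaced by '_'
    (a stray '.' would otherwise add an extra path level).
    """
    parts = [base]
    for key in sorted(dimensions):
        value = re.sub(r"[^a-z0-9_-]+", "_", str(dimensions[key]).lower())
        parts.extend([key, value])
    return ".".join(parts)


# Hypothetical tickets: (who opened it, which department).
tickets = [("external", "Billing"), ("internal", "Sales Ops"), ("internal", "Sales Ops")]

counts = Counter(
    metric_name("support.tickets.created", source=source, dept=dept)
    for source, dept in tickets
)

if __name__ == "__main__":
    for name, n in counts.most_common():
        print(name, n)
```

In Graphite you can then sum across a dimension with a wildcard, for example `support.tickets.created.dept.*.source.external`. InfluxDB's tags, discussed later in the talk, avoid packing dimensions into the name at all.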
We probably know how many we sent. If you use Mailgun or a similar service, the invoice tells you that you sent 4,812 last year or last week. But of what type? Are those all password resets? We don't know. How many times are people logging in? Does anyone ever log out? That can drive a design decision: we don't spend any time on our logout page because one person uses it every five days. Most people don't actually log out; the cookie just times out. Having that data helps us see how the app is really used and where we should focus our time: on bug fixes, on new features, on performance. A lot of people track how many items are in the queue, but they don't track how long individual types of jobs take in that queue. How long does it take from "I clicked password reset" to "I've told Mailgun to send the email"? If that's a second, cool. If it's 15 minutes, not cool, because as a user I think something's wrong, so now I'm hitting the button five or six times and plugging up your queue. I don't necessarily expect the email to be instantaneous, but I do expect it within a minute or so. What are our cache hit rates? We're using caching, but are we using it effectively? And for API usage: is it all unauthenticated public access, or is it our regular customers? Which endpoints are hit the most? Sometimes you'll find that a feature is used heavily from the API and not at all from your desktop app, or used a lot from desktop and the API but never on mobile, and you can de-emphasize it accordingly in design and in bug fixing. We should also track random internal stuff. When did we have meetings? How many staff members did we have at the time?
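Measuring "click to handed-off-to-Mailgun" per job type, rather than just queue length, mostly comes down to stamping each job when it is queued and comparing when a worker picks it up. Below is a minimal in-process sketch using the standard library `queue` as a stand-in for a real task queue; the job types, payloads, and the one-minute `SLOW_MS` threshold (taken from the talk's "within a minute or so") are illustrative assumptions.

```python
import queue
import time
from collections import defaultdict
from statistics import mean

SLOW_MS = 60_000  # "I expect the email within a minute or so"

jobs = queue.Queue()
wait_ms = defaultdict(list)  # job type -> enqueue-to-start delays in ms


def enqueue(job_type: str, payload: dict) -> None:
    # Stamp the job with wall-clock time when it is queued. time.time() rather
    # than time.monotonic(), because producer and worker are usually separate
    # processes or machines (so their clocks need to be in sync, e.g. via NTP).
    jobs.put({"type": job_type, "payload": payload, "enqueued_at": time.time()})


def work_one() -> dict:
    job = jobs.get()
    wait_ms[job["type"]].append((time.time() - job["enqueued_at"]) * 1000)
    # ... do the real work here, e.g. hand the email to the mail provider ...
    jobs.task_done()
    return job


def report() -> dict:
    """Per job type: how many ran, mean and worst wait, and whether any were slow."""
    return {
        job_type: {
            "count": len(waits),
            "mean_ms": mean(waits),
            "max_ms": max(waits),
            "slow": max(waits) > SLOW_MS,
        }
        for job_type, waits in wait_ms.items()
    }


if __name__ == "__main__":
    enqueue("password_reset", {"user_id": 42})
    enqueue("weekly_digest", {"user_id": 42})
    work_one()
    work_one()
    print(report())
```

In a real deployment you would send each wait as a timing metric named after the job type instead of keeping it in a dictionary, so password resets and weekly digests show up as separate lines on the graph.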
Because we might be able to correlate bug fixes with how many people were working. How many people were on vacation that week? We might see that when one particular staff member leaves, bugs go up or go down; maybe that person, or that job, was more important than we thought. How often are your coworkers working more than 40 hours, and are we generating problems through burnout? We can track these things, but we can't correlate them without some sort of visualization. We can track chat messages and the emoji people use, and we can have some fun with it. I have this coffee example here: did you make it in the office or go to Starbucks, and was it a Grande or a Venti? It's not that I want you to track coffee consumption at that level, but think about metrics in those terms. It's not just how many Celery tasks did we run; it's which ones, and why, and across what spectrum: how many free users used that feature versus paid users? So, the when: you should be doing it now. You should have done it yesterday, and we'll talk about time travel in a bit, but you need to have metrics in place before you really need them. So start collecting early, even if you don't have a good plan to visualize them yet. So how do we go about doing this? There's the easy stuff. Google Analytics: everybody's got that on their site. Then there are services that help with ops. You can use Opbeat, who's one of the sponsors here, or New Relic, or Datadog. Those are pretty easy to set up, and they give you a lot of insight into your app.
I kind of hate some of these services, because before they existed everybody had to come to RevSys for performance help, and now they can see for themselves that it's this one query making the site slow. Then there are services like Librato, Mixpanel, and Keen IO that let you define your own metrics and ways to graph them. It's the kind of thing you'd do if you were hosting your own metrics, without having to do any of the ops work or setup yourself. They have different pricing and somewhat different features, so check them out. But if you really want to get to the top level, you've got to do it yourself. So how do you do it yourself? For visualization, the best option is Grafana. There are other dashboard systems out there, but if you don't already have one you really love, look at Grafana. It's beautiful; you can show it to bosses and clients, and everybody likes looking at it. Nobody's going to say it looks like a Matplotlib chart we threw together. Pretty is important: if people don't like looking at a dashboard, they won't look at it on a regular basis. And besides being beautiful, Grafana is easy to use. It can use Graphite and InfluxDB as data sources, and more importantly it can use several of them at once, so you can shard your metrics as you grow, with some metrics pulled from cluster one and some from cluster two, all on the same dashboard. One of the mistakes I see people make is building one big dashboard with 9,000 things on it. Nobody is going to look at that or find it useful. Create special-purpose dashboards for the things you're looking at: What do we care about in this release? What does sales care about? What does marketing care about?
Once you've got them set up, you'll look at them a lot. Maybe you'll stick them on a big screen in the lobby and have them rotate, and then you'll kind of stop looking at them. But you need to dig in and look for new and interesting facts, so every now and again spend some time overlaying different metrics to see what falls out. I think the main reason people don't do metrics is that Graphite is a pain in the butt to install. Who here has wasted a day trying to get Graphite set up? Okay, so maybe that's why people don't use it: it's hard. There's an easier way, and that's graphite-api. It gets rid of all the hard parts of the setup and gives you just the parts you need to work with Grafana, so check it out. Then InfluxDB is the new kid on the block; it's written in Go and it's the hot new thing. Its main difference is that you store individual metric points and attach arbitrary tags to them. In Graphite you would have something like CPU load for US East, box five, all baked into the metric name; with InfluxDB you just have CPU load, with the availability zone and instance type as tags on it, so you can aggregate and query more easily. So how do you transport these metrics around? You use StatsD. StatsD is a really easy service to run. The biggest mistake people make is running one StatsD server with 20 web servers all talking to it; that actually gets slow and can slow your app down. Run one on every machine where you collect metrics, and use it to ship them on to wherever they end up being stored. Another thing people don't tend to notice: if you're using Logstash, you're already taking logs, doing stuff with them, and shoving them off to a service, and you can use Logstash to generate metrics from those logs based on regular expressions.
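To make "run a StatsD agent on every machine" concrete, here is a tiny sketch of what an app sends to its local agent: plain-text UDP packets of the form `name:value|type`, where `c` is a counter, `g` a gauge, and `ms` a timing. In practice you would use an existing client such as `StatsClient` from the `statsd` package on PyPI; the `TinyStatsd` class below is a hypothetical illustration of the wire format, and the default address `127.0.0.1:8125` assumes the agent listens on StatsD's usual port on the same host.

```python
import socket


class TinyStatsd:
    """Minimal fire-and-forget StatsD client over UDP, for illustration only."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = ""):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name: str, value, kind: str) -> str:
        full = f"{self.prefix}.{name}" if self.prefix else name
        packet = f"{full}:{value}|{kind}"
        try:
            self.sock.sendto(packet.encode("utf-8"), self.addr)
        except OSError:
            pass  # never let metrics break a request; just drop the packet
        return packet  # returned so the wire format is easy to inspect

    def incr(self, name: str, count: int = 1) -> str:
        return self._send(name, count, "c")

    def decr(self, name: str, count: int = 1) -> str:
        return self._send(name, -count, "c")

    def gauge(self, name: str, value: float) -> str:
        # StatsD treats a gauge value with a leading '+' or '-' as a change to
        # the current value, so negative absolute gauges need special handling.
        return self._send(name, value, "g")

    def timing(self, name: str, ms: float) -> str:
        return self._send(name, int(ms), "ms")


if __name__ == "__main__":
    stats = TinyStatsd(prefix="myapp")
    stats.incr("signups")
    stats.incr("signups", 3)
    stats.decr("open_tickets")
    stats.gauge("stress_level", 25)
    stats.timing("sleep", 8 * 60 * 60 * 1000)  # eight hours, in milliseconds
```

Because UDP is connectionless and the agent is local, a slow or missing StatsD server costs the request almost nothing. That is the reason for running one agent per host instead of one central server.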
So you don't have to retool your app to add this stuff; you can generate metrics from your logs. And if you want front-end metrics, all you need is a little API that your JavaScript can post them to, which then shoves them into the rest of your back end. So how do you do time travel? Most of these systems let you push a data point with a specific timestamp saying when it happened. So you can go back and replay all those logs you've been storing in S3 to produce the data you're looking for and show trends over time. So how do we do this with Python? The StatsD API is very simple. You get a StatsD client, and then you have counters that you can increment and decrement, including by more than one; these are just simple examples. You can have gauges: right now my stress level is at 25 out of 100 (I'm just picking arbitrary numbers here). And you can do timing: how long did I sleep last night? Well, about eight hours; you just need to store it in milliseconds. So you can record how long a particular thing took. InfluxDB is very similar, but it takes a specific format: the measurement name, whatever arbitrary tags you want to attach to it, then the time, and then the value of the field. So I'm giving Django a 10 out of 10, and attaching tags saying it was in Austin and that barbecue is the predominant food. Now to the Django bit. Django App Metrics is a little utility that's kind of like Haystack, in that it abstracts away where you're storing things. You can use Django App Metrics to target Mixpanel, Librato, or StatsD, or store the metrics in your database or in Redis. So if you have a personal home page you want to collect some metrics on, and you don't want to set all that stuff up, just store them in the database, where you can do some simple aggregations and rollups, or in Redis. So you can put this in while your app is small and change where you're storing
them over time without having to retool everything. So you can have a simple metric, like "code examples," where that call just increments the code example metric; or we can time things with the context manager; or we can set a gauge. If the NSA knows more about what your users are doing with your application than you do, you're doing something wrong. I think if all you're using is Google Analytics, they probably know more about what's going on in your app than you do. So embrace your inner NSA agent, get in there, and be a little snoopy. Any questions?

Hey, thanks for the nice overview. My question is how you decide where to generate the data: whether you're going to do it directly from your application, or log it and then have something parse it out. How do you make that call, and what goes into the decision?

I think the biggest factor is whether I already have the code written or not. If I've got a big app with a lot going on, I'm probably going to try to collect the metrics from the outside, so I'll try to pull them off the logs. The other reason I might use logs instead of instrumenting the app is performance: if I don't need the answers in real time or near real time, and collecting that metric any other way would slow the app down, I'll push the logs into Elasticsearch and Kibana, or use Logstash to pull the metrics out, and maybe retransmit them into something like Graphite just so I can overlay them with the other bits.

Thanks. Do you have support for the InfluxDB tagging stuff in App Metrics?

I don't yet. I had hoped to have time to add that for today, but I haven't done the InfluxDB backend yet. If anybody wants to help with that, that'd be awesome.

Yeah, I was just wondering if you had any tools or recommendations for the more esoteric metrics you were suggesting, like the cultural ones.
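Pulling metrics off logs and replaying old logs ("time travel") both come down to parsing each line with a regular expression and re-emitting it with the original event time rather than "now." The sketch below does that in plain Python as a stand-in for what Logstash would do. It assumes a hypothetical access-log format ending in status code and seconds, and produces both Graphite's plaintext format (dimensions baked into the dotted path) and InfluxDB's line protocol (dimensions kept as tags). The metric names are made up.

```python
import re
from datetime import datetime

# Hypothetical access-log format: [time] "METHOD path PROTO" status seconds
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r"(?P<status>\d{3}) (?P<seconds>[\d.]+)"
)


def parse(line: str):
    """Return a dict for a matching log line, or None if it doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "epoch": int(when.timestamp()),  # original event time, not "now"
        "path": m["path"],
        "status": m["status"],
        "ms": float(m["seconds"]) * 1000,
    }


def graphite_line(rec: dict) -> str:
    # Graphite plaintext protocol: "<dotted.path> <value> <unix seconds>".
    # Every dimension has to be baked into the path.
    slug = rec["path"].strip("/").replace("/", "_").replace(".", "_") or "root"
    return f"web.{slug}.status_{rec['status']}.response_ms {rec['ms']:.1f} {rec['epoch']}"


def influx_line(rec: dict) -> str:
    # InfluxDB line protocol: "<measurement>,<tags> <fields> <timestamp>".
    # Dimensions stay as tags; the default timestamp precision is nanoseconds.
    path = rec["path"].replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")
    return (
        f"http_request,path={path},status={rec['status']} "
        f"response_ms={rec['ms']:.1f} {rec['epoch'] * 10**9}"
    )


if __name__ == "__main__":
    sample = '10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /password-reset/ HTTP/1.1" 200 0.123'
    rec = parse(sample)
    print(graphite_line(rec))
    print(influx_line(rec))
```

To replay an archive, loop over the old log lines, skip the ones `parse` returns `None` for, and send the results to Carbon's plaintext listener (TCP port 2003 by default) or to InfluxDB's HTTP write endpoint. Because each line carries its own timestamp, the points land where they happened on the graph.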
Yeah. Pull requests, for instance: GitHub does webhooks, so you can listen for those and shove metrics in, and you can backfill pretty easily by hitting the API. For things like "did we have a meeting," maybe you search your calendar for the string "meeting" and pull those in automatically. Or maybe you build a very small web app where you just hit a button to say "I'm in a meeting," or maybe you want to track good meetings versus bad meetings. When I was an IT manager, I was able to show that about 70 percent of the requests eating up most of my time were coming from one department. When that department complained about how long things were taking, I said, well, almost all my requests are coming from you, so how about you give me some of your budget so I can hire more people, and we'll both get things done faster. And he said, absolutely, I didn't realize you were doing all of our work. But if I didn't have numbers like that, it would just be he-said-she-said arguments: no, it's your fault; no, it's your fault; it's Steve's fault.

Hi, I'm from Opbeat, thanks for the mention. I have an almost cultural question: do you have any tips on making your coworkers or fellow developers care more about performance?

I think, like anything, if you measure it, people care about it. If you can show them that their work over the last week slowed everything down by 10 percent, they're going to at least start checking. It may be valid: there was no way to make it any faster, the feature was necessary, and you can't be too hard on them for something taking longer when it does more. But most of the time nobody is noticing, because no one's looking. And if you're not seeing it on a change-by-change basis, you can't
figure out the root cause nearly as easily. If you only look at your performance numbers every six months, or only when the boss starts complaining that the home page is taking too long to load, then it's: what was it? I don't know, go look through the last 8,000 commits and we'll try to figure out where the problems were, or who the problem is. It could be that 90 percent of your developers are pretty performance-savvy and there's the 10 percent that aren't, and they just need some mentoring. But if you're looking at this stuff on a regular basis, somebody gets an email. It's like testing: I have projects where, if test coverage falls below 80 percent, the build doesn't pass CI and can't be deployed. So boom, everybody's just got to keep that up. Sure, sometimes people game the numbers, but maybe you need performance testing like the speaker in the last talk was describing, and have that be something that stops the build.

Thanks. This is kind of a tooling question. In Grafana you can pull data from a lot of different places and put it in a lot of different places, but knowing which data to put together to actually get meaningful insights is kind of a pain. Does Grafana make it easy to take these disparate pieces of data and match them together?

It makes it easier. It's not easy, and that's why I say you should go play a little bit. Nobody has all week to sit around graphing numbers on top of each other to see what sticks, but every now and again try looking at everything and see whether any of it matches up with anything else. Does overworked mean more bugs?
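Once two series are overlaid, a single number can tell you whether "overworked means more bugs" is worth digging into. The sketch below computes the Pearson correlation coefficient for two series aligned on the same time buckets; the weekly overtime and bug counts are invented for illustration. As the talk says earlier, a strong correlation is a signal to dig deeper, not proof of causation. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two series sampled over the same time buckets."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical weekly numbers, one entry per week, in the same week order.
overtime_hours = [2, 5, 1, 8, 12, 3, 0, 9]
bugs_opened = [4, 6, 3, 9, 11, 5, 2, 8]

if __name__ == "__main__":
    r = pearson(overtime_hours, bugs_opened)
    print(f"overtime vs. bugs: r = {r:.2f}")
```

Values near +1 or -1 mean the two lines move together (or opposite); values near 0 mean the overlay is unlikely to show anything interesting.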
Touching on that last question, and on what you said earlier about metrics not being free: are there any good guidelines or hard-and-fast rules about monitoring every little thing in the office, or in your project?

I think we could all agree that if you needed to hire a person to track shoe sizes at your company, that's probably not a metric worth collecting. There are time costs and actual costs, so it's a balancing act. But if you already have a time-tracking system you use for other reasons, pulling data out of it and shoving it into something where you can graph it against other things is probably worth the 30 minutes it takes. If all meetings happen in the conference room and you have an app that tracks when the conference room is in use, it may be worth collecting meetings. It may even be worth collecting employee happiness if you're already measuring it for other reasons: if HR has a system that surveys everybody every three months, it might be worth taking the min, max, and average, shoving them into Grafana, and seeing whether anything shakes out.

Thank you. All right, that's our time. Thank you, Frank.
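The last answer suggests summarizing an existing HR survey and graphing it next to everything else. Here is a minimal sketch of that, assuming one numeric score per person per survey round and a Graphite backend. The `hr.happiness` prefix, the 1 to 5 scores, and the host name are hypothetical. The median is an addition beyond the min, max, and average mentioned in the talk.

```python
import socket
import statistics


def survey_lines(prefix, scores, epoch):
    """Graphite plaintext lines summarizing one survey round (one score per person)."""
    summary = {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }
    return [f"{prefix}.{name} {value:g} {epoch}" for name, value in summary.items()]


def send_to_graphite(lines, host, port=2003):
    """Send lines to Graphite's plaintext listener (2003 is Carbon's usual default)."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    # Hypothetical 1-5 happiness scores from one quarterly survey,
    # stamped with the survey date (2015-10-01 00:00 UTC).
    lines = survey_lines("hr.happiness", [3, 4, 4, 5, 2], 1443657600)
    print("\n".join(lines))
    # send_to_graphite(lines, "graphite.example.com")  # hypothetical host
```

Stamping the points with the survey date rather than the upload time keeps them aligned with the rest of the dashboard when you backfill old survey rounds.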