Thanks for coming to Ruby Monitoring State of the Union. I appreciate you choosing to come here instead of the future of Ruby keynote at RubyConf. Cool story, real quick: at RailsConf this year I gave a talk opposite Yehuda Katz's future of Rails, so I'm kind of making a thing out of this. All right. I'm Joseph Ruscio, you can call me Joe, and that's me on Twitter and GitHub. I'm the co-founder and CTO at Librato; we do a lot of different things in monitoring, hence my love of graphs. You can find me wandering the halls if you want to talk about why you should love graphs, so come say hi during the conference. I want to start with the motivation: what's all this fuss about monitoring? If you've been paying attention over the last two years or so, there's been a significant uptick in the discussion of monitoring and in the mention of new monitoring tools. There's almost what I like to call a renaissance going on in monitoring right now, and it's interesting to look at why, because haven't we been monitoring for decades? As long as there have been computers, there have been people monitoring them. So I think it's really instructive to look at what's happened in the way we build and deliver SaaS over the last, say, ten years. If you looked at a Web 1.0 property in 2002, your seed round, if you were venture funded, was typically about a million and a half dollars, and you needed that because to get going you had to buy a bunch of physical servers, enough capacity for whatever you thought your launch would be. You needed a dedicated ops team to service that, and on top of your Linux servers you'd be building a basically custom software stack in-house. Fast forward about ten years and the seed round now looks more like $20,000. Our infrastructure is usually in the cloud: AWS, Rackspace, wherever.
So it's just OpEx, a very small initial outlay. If you're lucky you have an ops person, but chances are you probably don't; you're doing DevOps, and it's just the people who write the code. And what you've built is basically an aggregation of a lot of off-the-shelf open source software and external services. What that leads to for modern infrastructure is that our infrastructure is far more agile than it used to be. We don't have to wait months for new servers. We still have to plan ahead, but we don't have to do heavy capacity planning and say, okay, this is when we need the step-wise increase. Our infrastructure is now ephemeral, which means not only can we bring it in really quickly, it can also go away really quickly, which is the more interesting part when it comes to monitoring. The end result of this new paradigm is that we have more change, and, I would claim, worse tooling. When I say worse tooling, I mean it relative to custom-built software. The example I always use is Google: I'm sure Google has the most amazing monitoring tools for their web-scale infrastructure that you've ever seen, but they're only applicable inside their shop, so that's no good to those of us building lean services out of whatever open source software we can find. That's the general reason for a shift toward monitoring, but if you go a level deeper and look at the companies that I think are really pushing the edge in this space, your Etsys or GitHubs for instance, and ask what the common thread is, what I kept coming back to was continuous deployment. The shops that are really leading the charge in monitoring all practice continuous deployment.
To do continuous deployment, you really require five things. You require continuous integration: Travis CI, the tests that we all ship. You need one-click deploy, so it's very cheap to actually ship the site. Ideally you're going to have some feature flagging, because when you're shipping the site multiple times a day there's lots of change and risk, so you roll features out slowly. But once the features are out, the very next thing you need is monitoring. If we're pushing new code out all the time, we need monitoring to understand what's going on in our production infrastructure. And alerting as well: if you think of monitoring as looking at a dashboard, alerting is the active counterpart to that passive observation. And because I don't just want to make wild claims that this is the reason, I spent about seven seconds doing some very unscientific research: I looked at Google Trends for the term "cloud monitoring" and then the trend for "continuous deployment," and I think it's at least an interesting correlation, which maybe someday I'll explore further. Okay, so continuous deployment. What does this monitoring get us, specifically? Some examples. Anyone who's ever managed a queuing system in production fears this graph, because this is a stuck queue accumulating the jobs, the work units, it's supposed to work through. What this graph has shown us is a regression: someone pushed something out at about, I think that says one o'clock, that caused the queue to get stuck; it was noticed, and about an hour later a fix was pushed out and we can see the queue started to drain. You can detect outliers. This is a graph of a cluster of frontend instances receiving load-balanced requests, and this is how long it takes each of them to do authentication.
So you can play the game: which one of these is not like the other? We have a bad instance in our frontend that has to be taken care of. So that's detecting regressions. We can also use monitoring to validate hypotheses about the code we shipped. Say we're going to put a performance improvement out: how do we know it actually happened? Here's an example of some work that went in to reduce the latency on a particular type of request, and when it was shipped out, it dropped the average from 100 milliseconds to 50-something milliseconds. We can validate that expected periodic behavior is being caused by what we think is causing it. This is read operations, and there's a batch op every 15 minutes that spikes our reads, so we can see that's what's causing it. We can inspect customer behavior. This is a graph, during the leapocalypse, of malformed requests our API was receiving. We always have some baseline of bad requests coming in, but when the leap second hit, a bunch of our customers' code broke and started sending us bad requests. And, I also think this is important, we can use monitoring to validate that our features are actually being used. This is just a graph of some new feature we shipped. So in terms of validating hypotheses, it's not just performance hypotheses; it's even the hypothesis that a feature is adding business value. When you ship a feature, is it being used? And if it's being used, how does its use correspond to the performance of your site? And the last example, if none of those are interesting to you, is that you can use monitoring to find the chunky bacon in the wild, which is the best kind of bacon. So: we can use monitoring to detect regressions, we can validate hypotheses about our code, and we increase our code's resilience to change. The question I'd ask now is: is this sounding familiar?
And I think it should, because I'm going to make a claim in this talk, and part of my goal is a call to action around how the Ruby community views and goes about instrumentation. I'm going to claim that instrumentation is unit testing for operations, full stop. So if you wouldn't ship code without your tests, you shouldn't ship code without instrumentation. And that's decidedly not the state we as a community are in today. The high-level reasoning is that without instrumentation you literally cannot do continuous deployment. I'm also going to claim that in the future, continuous deployment will be the norm, and exceptions will be rare cases, like shops with regulations from the SEC, for instance; for shops that have the option, continuous deployment will be the norm, and instrumenting code is going to be required to do it safely. If you were at the talk just prior, on service-oriented architectures: SOA is great because it decomposes our code into small testable units, and we practice that at Librato. But you're also now running a distributed system with lots of moving parts, and there's literally no other tractable way: the only way we can reason about the truth of a system made up of lots of little services is to observe it in production. We can come up with a lot of fancy load tests in CI and get pretty close, but the only way we'll know for sure what's going on is to observe it in production. And lastly, I think it's very important because we, the devs, are the domain experts. I think it's a broken pattern to ship code that's passing functional tests over the fence into production, and then someone on the other side goes, oh, the site is slow, well, now I have to start digging around.
When we write the code, we have the mental model of what we think the code should do, and we know where to put the things that can validate that the code is doing in production what we think it's doing. So if you believe that, what is the current state? Say you have this religion now and you want to do this. You start Googling "monitoring for Ruby" and you come up with something like this. Just an explosion, right? And you can go for pages and pages through your Google results, or just search GitHub. I've put up here some solutions that I think are really awesome and some that I think are really not awesome, and that's part of the challenge: even discerning which is which. And you end up in what I'm going to call an anti-pattern of sorts. Typically you want to monitor certain things, and Project X that you found on GitHub does an awesome job of that, so you pull it down, you install its agents, its backend, its little web server. But then you realize, oh, I've got to monitor this other thing; Project X doesn't do that, and it's a monolith and I don't know how to get in there, but this other thing does it. So you go get Project Y, and then Project Z, and before you know it you're running all these little silos and having a lot of difficulty correlating across them. So if you look at the tooling we have today, and these are the good tools: everybody in Ruby knows New Relic. There are monolithic open source projects; if you go onto GitHub and search for Ruby monitoring, I'm not going to name any names, but you'll find projects that use some kind of in-memory database and give you a gem to install that talks to that database, plus a web server that runs alongside. There's statsd-ruby plus StatsD, probably with Graphite on the other side of it, or something that looks like Librato.
You can use statsd-ruby to pull stats out of your Ruby process and push them to this Node.js daemon called StatsD, and then there are hosted services like ours. Generally what you'll see is people picking several of these; most people I know run several. I think to come at this in a more elegant way, we really need to decompose the problem. There's a model starting to emerge, not just from me but from a lot of people thinking about this: we instrument our code and collect measurements in it. At some point those measurements are aggregated, probably both across requests (or jobs, or whatever the unit of work is) and across processes, and potentially even across hosts. Those aggregated metrics then need to be stored somewhere, somewhere we can do analysis, either on the way into storage or by pulling specific pieces out, as well as visualization. One of the Travis CI guys is doing a talk later today, which I think is going to be good, that will delve into each of these blocks. For the purposes of this talk, though, because my goal, given the current state of the union, is for instrumentation to be as common as unit testing, we're going to hone right in on collection. A big problem right now is that even if you subscribe to this model, and you have statsd-ruby doing your collection, StatsD doing your aggregation, Graphite doing your storage, and some dashboard on top of that, the problem with collection is that by choosing statsd-ruby, by saying oh, I'm just dumping my stats to StatsD, I've already made a deployment choice, an operations choice. I've said to the people who are going to run my code: hey, you have to run StatsD. And this is already broken, because there are lots of people who have no interest in doing that.
Maybe they don't like Node, period, or they don't know enough about it. So I think it's important to look at collection and ask: what is the bare minimum we can have in common at the collection point without bringing in any dependencies? What do devs require? I'm going to split this into dev and ops, and even if you do both jobs, you have to wear the hats separately. Devs require a concise set of instrumentation primitives that are powerful but easy to use. Minimal dependencies: any external dependency like StatsD, or any other aggregator, storage backend, or visualization layer you bring in, is going to shrink the sphere of people who can use your instrumentation. So we want almost no dependencies on what happens to the measurements after they're collected. And we obviously have to have minimal performance impact, because we're going to run this in production. Ops require flexibility at all the other layers in the stack: a way to get at the data your metrics collect and put it into whatever aggregation, storage, or visualization tiers they choose. They need simple introspection: a simple way to see what you're collecting and a simple way to pull it out. So our ideal instrumentation library implements these primitives, captures the state somewhere in memory in the process, and doesn't do anything else. It doesn't do trending. It doesn't keep samples over time. It just collects the measurements and puts them in a safe place where something else can come and get them. And I think the model we should look to follow, if you haven't heard of it, is Metrics on the JVM: check out metrics.codahale.com. Coda Hale gave a talk at CodeConf about 18 months ago that I think kicked off a lot of the interest in instrumentation, and the library he has on the JVM puts forward a small set of powerful primitives.
One thing that's cool about following that model is that other people have emulated it; there's Folsom on Erlang, for instance. If you're a polyglot shop, and at Librato we run things on the JVM and things in Ruby, it's really powerful to have the same primitives across different platforms and frameworks. So what are these primitives? The simplest one to think about is: how do we count stuff? How many jobs are in the queue? We need some primitive for that. Assume we have some gem called foo; then when we queue a job, we just increment a key (our metrics are named by keys, human-readable strings), and when we complete a job, we decrement it. What the counter then knows is basically how many jobs are currently in the queue. Okay, how do I track a rate? By that I mean, say, how many requests per second are we handling? I'm not really interested in the total number of requests; I could use a counter, but in this case I'm really more interested in throughput. So there's a primitive we call a meter. Here's an example of a Rack middleware that tracks the rates of errors into our API, successes returned out of the API, and not-founds. So now we can count things and track our throughput. The next question is: how do we track distributions? By distribution, think of the question: how big are these requests? If you have something that's not normally distributed, your average may not actually be that helpful. If your app serves two types of requests, one that's 10 bytes and one that's two megabytes, the average request size is going to be garbage; what you really want to know is the distribution between the 10-byte requests and the two-megabyte requests. So there's a primitive called a histogram, and we just have some code that adds a value to the histogram.
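To make the counter primitive concrete, here's a minimal sketch. The gem foo in the talk is hypothetical, so this is a toy implementation with illustrative names, not any real library's API:

```ruby
# Toy implementation of the counter primitive described above.
class Counter
  def initialize
    @mutex = Mutex.new  # many threads may enqueue/complete jobs at once
    @value = 0
  end

  def increment(delta = 1)
    @mutex.synchronize { @value += delta }
  end

  def decrement(delta = 1)
    increment(-delta)
  end

  def count
    @mutex.synchronize { @value }
  end
end

jobs_in_queue = Counter.new
jobs_in_queue.increment  # a job was queued
jobs_in_queue.increment  # another job was queued
jobs_in_queue.decrement  # a job completed
puts jobs_in_queue.count # the queue currently holds one job
```

The key point is how little surface area a dev has to learn: name a key, increment, decrement, done.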
And lastly, you might be interested in how long something takes: how long are these requests taking? For that you need a timer. If you've ever rolled your own instrumentation, and some of you probably have, you've written something like this: it calls gettimeofday, does some work, calls gettimeofday again, and takes the delta. So as a developer, in this putative world where developers instrument their code like unit tests, that's all you need to know: those ten lines of code. If you can handle that conceptually and then apply your domain knowledge, you can instrument your code and ship it with instrumentation other people can use, assuming we have our gem foo. On the flip side, with our ops hat on: how do we get this data? I would claim aggregation is where it gets interesting. Something like StatsD, where you fire UDP packets out of your process on every request, adds no latency, and you've got a daemon that aggregates those and can then do cross-request, cross-process aggregation. It's very easy to get started with, but I'm going to claim that at scale it's better to do cross-request aggregation in process, and use something like StatsD or a service like Librato for cross-process and cross-host aggregation. The reason is that per-process and per-host aggregation is almost always, unless you're talking Google scale, a very tractable problem, whereas if you're serving millions or billions of requests, doing cross-request aggregation in a central daemon is not; Google for "StatsD scaling" and you'll see. So we need an in-memory store, which I'll call the registry. We just need somewhere that, when one of those primitive operations is executed, will go to a known place in memory and, in a thread-safe manner, capture the additional state it needs.
And I'm also going to say we need double buffering, so that when we want to come and get the data, we don't block anybody who wants to write to it. And it should provide for simple iteration. If I want to look at the data, or query it periodically to push it somewhere that can do aggregation or trending, I can write a separate thread that just does this: it gets the registry, walks through every metric in it, and based on the type of the metric takes the appropriate action. You could very easily build reporters as separate gems; they're very simple to build, and you could push this data anywhere. If you're a Splunk shop, you can push to logs. In dev mode, you could push to your console. If you're using JRuby, you could write something very simple to push this to JMX for JConsole. StatsD, anything: if you can write an iteration loop and you know how to push to the thing you're talking to, you can whip together a gem for this. So going back: okay, that's how you walk the registry, but I've really glossed over what these different types are actually represented as underneath. How can we interpret this data once it's reported? Up until now the abstraction could seem very, very simple, but it's important to get the implementation right, and so there's going to be a small amount of math, because streaming data is really hard. If you ever took intro to stats in college or high school, or even just Googled for it, generally what you'll see is discussions of stationary analyses: if I have a sample population of 200 or 1,000 samples, how do I reason about it, how do I check distributions? In something long-running like a web process or a worker, I'm going to have millions upon millions of samples that just keep coming.
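A sketch of the registry-plus-reporter idea described above. All class and method names here are invented for illustration, and instead of true double buffering this toy just copies under a lock; a real implementation would swap buffers so readers never block writers:

```ruby
# Toy registry: a thread-safe, named, in-memory store of metrics.
class Registry
  def initialize
    @metrics = {}
    @mutex = Mutex.new
  end

  def counter(name)
    @mutex.synchronize { @metrics[name] ||= { type: :counter, value: 0 } }
  end

  def increment(name, delta = 1)
    counter(name)
    @mutex.synchronize { @metrics[name][:value] += delta }
  end

  # Reporters just iterate: yield name, type, and value, and the
  # reporter decides where to push (console, logs, StatsD, JMX, ...).
  def each_metric
    snapshot = @mutex.synchronize { @metrics.dup }
    snapshot.each { |name, m| yield name, m[:type], m[:value] }
  end
end

registry = Registry.new
registry.increment('jobs.queued')
registry.increment('jobs.queued')

# A trivial "console reporter": a separate gem could do exactly this
# in a background thread on a fixed interval.
registry.each_metric do |name, type, value|
  puts "#{name} (#{type}) = #{value}"
end
```

Because the iteration contract is so small, a Splunk reporter, a Graphite reporter, or a StatsD reporter are each just a different body for that block.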
I can't just accumulate them all and say, whenever you want to know what's going on, we'll look at the whole population, because pretty soon I'd need terabytes of RAM to track it. Also, with streaming data, the recency of results is very important. If I'm just tracking an average in memory from the start of time, that's not what I want; when I'm operating something, I'm really interested in what's happening now, so I need statistical methods that weight what's happening now, or recently, more heavily than the overall trend. So, counters are actually pretty easy, because a counter is just an absolute count. If you're consuming one, you have to think about how you'd take a derivative, and how you detect rollovers when the process restarts, but from an implementation perspective it's pretty straightforward. Meters are a little more interesting. First, we're going to track a one-second rate in our meter, because I think it's very silly for humans to discuss the throughput of computers in any unit of time larger than a second; that's one of my opinionated things. We can track a mean rate, but that has the averaging problem, so ideally we use exponentially weighted moving averages. An exponentially weighted moving average weights the more recent samples more heavily than the older ones. If you've ever looked at the one, five, and fifteen minute load averages as an admin on a UNIX machine, those are exactly this. One of my favorite examples in computing is TCP congestion control, which uses a fairly simple but powerful weighted moving average to reason about whether latency in the network is a spike it needs to temporarily ride out or a new saturation point. And if we want to, we can also pull a count out of these.
So, histograms. As I alluded to earlier, we need histograms because in a lot of cases averages really suck: they lead you to believe things that aren't true, or to miss problems that exist. The basic things we can pull out of a histogram are an average, plus our min, max, and variance; and if our standard deviation is much larger than our average, that tells us the average is useless. But we can also pull out quantiles, and this is very interesting. We can ask: what's our 75th percentile request time? Our 95th? Our 99th? With those we can reason very easily about our worst cases, about what our biggest problem is right now. The way you do this with millions and millions of samples is a technique called reservoir sampling: a fairly neat technique that uses a statistically significant subset of the overall sample stream to generate the percentiles. And in order to ensure recent samples are weighted properly, like our exponentially weighted moving averages, we use forward-decaying priority sampling. And timers. Timers time an operation, but this is where the abstractions compose: you can actually build a timer out of a histogram and a meter. Inside a timer, you just push each timing into a histogram primitive and increment a meter, so you get both the rate of the operations you're timing and the percentiles of the timing data. All right, so how do we get started? Ideally we could wave a magic wand and this would just show up in the standard library. I would love to think that, but I'm nowhere near naive enough; proper standardization requires adoption. So we're going to need some kind of gem, I think, that focuses on this.
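A sketch of the timer-as-composition idea: a timer is just a histogram of durations plus a meter of invocations. The Histogram here is a toy that keeps raw samples (a real one would use reservoir sampling so memory stays bounded), and a simple call count stands in for the EWMA meter:

```ruby
# Toy histogram: keeps every sample. Real implementations bound
# memory with (forward-decaying) reservoir sampling.
class Histogram
  def initialize
    @samples = []
  end

  def update(value)
    @samples << value
  end

  def percentile(p)
    return nil if @samples.empty?
    sorted = @samples.sort
    sorted[((p / 100.0) * (sorted.length - 1)).round]
  end
end

# Toy timer: composes a histogram of durations with an invocation
# count (a real timer would mark an EWMA-based meter here).
class Timer
  attr_reader :count

  def initialize
    @histogram = Histogram.new
    @count = 0
  end

  def time
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @histogram.update(Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
    @count += 1
    result
  end

  def percentile(p)
    @histogram.percentile(p)
  end
end

timer = Timer.new
5.times { timer.time { sleep 0.01 } }  # stand-in for real work
puts format('p95: %.1f ms over %d calls', timer.percentile(95) * 1000, timer.count)
```

Note the monotonic clock rather than the gettimeofday pattern mentioned earlier: wall-clock time can jump (NTP, leap seconds), and a monotonic source can't.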
If you Google around, there was a project called ruby-metrics started right after Coda's talk at CodeConf, which appears to have tailed off, but one sprang up alongside it that's still going strong and that I'm going to invest in, called metriks. You can find it on GitHub; it provides the instrumentation primitives here as well as all of the math. It's thread safe: it uses atomics on the individual instances of a metric, so if only one thread is incrementing a counter, it's very fast. It's performant: it uses double buffering on the registry. And if you need something to do cross-process aggregation, which is separate from this concern but handy, there's a separate gem, I believe called metriksd, which performs like a StatsD if you want to go pure Ruby. You can find it at github.com/eric/metriks. So, future work. The interesting thing about metriks is that if you actually look at it, and I'm not sure this is clarified even now, it starts upon first inspection to look like one of those "hey, I'm making choices for you, I'm bringing in dependencies for you" libraries, because it has a set of reporters embedded in it: Librato, Graphite, StatsD. It's not entirely clear that you can run just the instrumentation and the registry part and nothing else. There's ongoing discussion on GitHub, which if you're really interested I'd love to have you join, and we're going to try to clarify the purpose and distill that gem down to just what I discussed today: the instrumentation primitives and the in-process state. We're going to pull the existing reporters out into their own gems, so anyone who wants to write a reporter for whatever backend they're using can push a gem up and won't need to get permission to have it pulled in.
The biggest problem with most instrumentation libraries, and as a vendor I face this because we try to integrate with tons of them, is that you build a reporter and then you have to make a pull request, and the person managing that open source project, who understandably may not even use your particular favorite backend, has to decide whether to take on something they'll be responsible for maintaining. So a pattern of completely separating reporters out into their own gems that you can push and maintain yourself is, I think, very important. Then, namespacing practices. Hypothetically, if we reach this nirvana and every gem ships with a set of instrumentation keys, note that most storage and analysis systems have a concept of a single hierarchical namespace. So how do you prevent collisions? There are simple patterns, like prepending all of your keys with your gem name, but there are also questions like: within an app, should we then prepend each gem's measurements with the app name? If I'm running two different services that both use the same gem, how do I know which service a metric came from? So I think we need to clarify some best practices in namespacing. Testing metrics is also something I think is interesting. With the practice of just collecting the metrics in memory, you can easily see how we could start to include them in our tests: okay, I'm going to run this function, and part of the spec is that if I run this op three times, the count should say three, or the rate should say this. And for the types like rates and histograms, where there's some complicated math going on, the instrumentation framework could actually provide testing primitives, where you say: here's an op, and I want to verify that when it runs this many times a second, the numbers come out correctly.
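A sketch of what asserting on in-memory metrics from a test could look like. The Counter and Worker classes are illustrative stand-ins for whatever instrumentation gem and application code you actually have, and in a real suite you'd use your test framework's assertions instead of bare raise:

```ruby
# Toy counter, standing in for an instrumentation gem's primitive.
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

# Application code that carries its own instrumentation.
class Worker
  attr_reader :jobs_completed

  def initialize
    @jobs_completed = Counter.new
  end

  def run_job
    # ... the actual work would go here ...
    @jobs_completed.increment  # the instrumentation under test
  end
end

# The "spec": running the op three times means the count should say three.
worker = Worker.new
3.times { worker.run_job }
raise 'instrumentation regression!' unless worker.jobs_completed.count == 3
puts 'metrics spec passed'
```

Because collection is purely in-process memory with no backend dependency, this kind of assertion needs no daemons or network stubs in CI.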
And I think with the right abstractions it could be very easy to include your metrics specs in your test suite too. Then there's basic threading support, because there's a lot of interesting stuff there. If you've ever looked at the source of the New Relic gem, or worked on a gem that does instrumentation across a lot of different types of web apps, there are a lot of edge cases around background-thread handling: how do you start it up, how does it work under forking in Unicorn, as well as in other app servers? So there may be an opportunity to extract just the thread management as an optional component or a separate gem, but that's something else we're discussing. So in conclusion, if you take away nothing else, I implore you to think about whether instrumentation should be as important as testing in a future of continuous deployment and service-oriented architectures; I strongly believe it should. In your monitoring, really think about decoupling things at the right places. Zero coupling. And finally, don't underestimate the complexity of actually tracking the measurements. It's surprising, when you start looking into it, all the things you have to do to ensure you have really high-fidelity measurements, because there's nothing worse than having data come up in a dashboard and being misled by it. The last thing: I just want to say thanks; this talk was informed by discussion with a lot of different people whose time I'm thankful for, a bunch of great Rubyists, and Coda's work on Metrics, which I think is a really good path forward. And that's the talk. Are there any questions? Yeah, so the question was: web developers are very, very busy, particularly on modern teams like my intro described, so if you can just gem-install New Relic, why is something like this important?
And I'd answer that... actually, go ahead and add to it. [Audience: New Relic's fine, I just hate the price. So what's the alternative?] Yeah. So, for the purposes of this talk, the most important thing is to have some kind of instrumentation. If literally the only thing you have time to do is pick a vendor and install the monolithic thing they give you, then go for it. And I don't want to say anything negative about New Relic specifically; they have a really cool product. But I think what people are finding is that it's very important to have custom instrumentation and to have those metrics be first class. I know from when Coda was talking about this that the Yammer guys work this way, and it's true for us too: we don't ship features without adding the counts and rates specific to those features, those controller actions, including their performance, and those are first-level metrics for us. I don't really ever look at generic app-level stuff, because app-level stuff is easy to get going but the payoff is limited. So part of what I'm claiming is that the problem is not that this is hard to do; the problem is that right now all the instrumentation options force you to make decisions about almost all the other layers of the stack just to get started with instrumentation. So there's this paralysis by analysis and no way to get cohesion. If we pull instrumentation out, really distill it down to just instrumentation and push everything else out, it should be fairly easy to get done. [Audience: I guess my point would be that I agree with everything you're saying, but I'd like my Rails stack to instrument all the request timing itself, so I don't have to instrument it myself.]
Yeah, so you should look, if you haven't seen ActiveSupport::Notifications, I didn't mention it and I should have. The question was, how can Rails just instrument itself? You should check out ActiveSupport::Notifications. It's just a pub/sub event bus that emits events about every request, and you could hook it up to this with about ten lines of code and you'd be done. So that's probably how I should answer that.

Hi, this is John. John, come forward. Hello. I'm John Graham, I'm the current maintainer of the New Relic Ruby agent, and I think this stuff is pretty awesome. Re-implementing the Ruby agent to use these kinds of instrumentation primitives would be fantastic, so that framework developers could use these, or a combination of these and ActiveSupport::Notifications, and then you would just choose which reporter you want to use for your instrumentation and it would all just sort of work automatically. So it may not be exciting immediately for application developers, because you don't want to go through and instrument your framework with something that doesn't actually report anywhere yet, but it's super exciting to me and to those of us writing these instrumentation aggregation points.

Yeah, yeah, I mean, I think what's also important to note is that if there's even something that approaches standardization, you can bet that vendors such as New Relic and ourselves, as well as whoever comes up next, will be ready to consume it and give you all the tooling you need. And it's going to minimize your lock-in, because you're not writing to a proprietary agent, and it's going to increase competition, which, to your pricing question, may bring pricing down. That happens sometimes. Right, we're all hackers here, but I've heard that happens sometimes.
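To make the event-bus point concrete: ActiveSupport::Notifications is essentially a pub/sub bus where `instrument` emits named, timed events and subscribers receive the name, start, finish, and a payload. Here is a stripped-down, pure-Ruby sketch of that pattern, a hypothetical `Bus` module rather than the real library internals, together with the roughly-ten-lines subscriber that collapses request events into an aggregate:

```ruby
# Hypothetical pub/sub bus mimicking the shape of
# ActiveSupport::Notifications (not the real internals).
module Bus
  @subscribers = Hash.new { |h, k| h[k] = [] }

  def self.subscribe(name, &block)
    @subscribers[name] << block
  end

  def self.instrument(name, payload = {})
    start  = Time.now
    result = yield if block_given?
    finish = Time.now
    @subscribers[name].each { |s| s.call(name, start, finish, payload) }
    result
  end
end

# The subscriber side: collapse per-request events into a count and a
# total time, the way you would feed "process_action.action_controller"
# events into a metrics library.
STATS = { count: 0, total_ms: 0.0 }
Bus.subscribe("process_action") do |_name, start, finish, _payload|
  STATS[:count]    += 1
  STATS[:total_ms] += (finish - start) * 1000.0
end
```

With the real library you would subscribe to Rails' own event names (for example `"process_action.action_controller"`) instead of emitting your own.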
So what's the line here between the custom instrumentation that you need to more or less litter your code with, and what you can extract from things like your Rails request logs?

Yeah, so specifically for Rails: hooking up ActiveSupport::Notifications, on Rails 3 and up, which is where those logs come from, to something like this is exactly what should happen. Rails has already littered its code with instrumentation; it just needs to be subscribed to, because its instrumentation is comprehensive but very simple, it's just "this event happened." You can use this to then collapse those events: this is the distribution of those events, this is the rate of those events. Where you would add your own instrumentation, I think, is when you have a section of code you believe is going to be hot. For instance, in our service you can generate a snapshot of any graph, which, because we do JavaScript graphing, we render in a headless browser in a background job and then push up to S3. So I've got timings wrapped around: how long did it take to fire up and render the graph in the background job, how long did it take to snapshot it, and how long did it take to push it up to S3. In the areas where I think there could be problems, just like with unit tests, I'm going to selectively put in instrumentation where I think it's important. And part of that is just, as developers and domain experts, we need to make reasoned decisions, because you can't annotate every other line of code with "this line did this, this line did this." So it's somewhat of an art.

Okay, unless there are any other questions... oh, sorry, go ahead. Along those lines: if you had a string of web services, say they were Rails web services, and you had a JRuby app or something like that running on the front end.
Is there a way to kind of string all those together, maybe using the Rails notifications and some kind of broker, so you'd really be able to report on that request all the way through the back end, that kind of deal?

Yeah, so generally speaking, and we're getting a little into the weeds here, but if you've got cross-service dependencies and you want to track a given request through them, I would just start, if there are, say, four chunks, by measuring the chunks in isolation. Because if you look and say, okay, of these four chunks, these ones all average four or five milliseconds, but this middle service averages two seconds, well, that's really all you need to know: that's the part that has to be fixed. If you truly want to track a single request or transaction end-to-end, you typically need to use some type of ID as a tag on the metric; tracers work something like that. But generally speaking you can get away with measuring the services in isolation and then comparing the metrics across them.

All right, so yeah, if anyone wants to talk more about this or the GitHub repo, grab me in the hallway anytime. I'll be wearing one of these, and I have several, so don't worry if you see me in one of these, it's not the same one, but come say hi. Thank you very much.
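The two approaches in that last answer, measuring each service's chunk in isolation and tagging measurements with a shared request ID for true end-to-end tracking, can be sketched together. The header name and helper methods here are hypothetical, a sketch of the pattern rather than any tracing system's API:

```ruby
require "securerandom"

# Hypothetical header carrying the shared ID between services.
REQUEST_ID_HEADER = "X-Request-Id"
EVENTS = []

# Reuse the inbound request ID, or mint one at the edge service.
def request_id(headers)
  headers[REQUEST_ID_HEADER] ||= SecureRandom.hex(8)
end

# Each service measures its own chunk in isolation; tagging the
# measurement with the shared ID is what lets you stitch a single
# transaction back together end-to-end later, if you ever need to.
def measure(service, headers)
  start  = Time.now
  result = yield
  EVENTS << { service:    service,
              request_id: request_id(headers),
              ms:         (Time.now - start) * 1000.0 }
  result
end
```

Most of the time, comparing the per-service averages answers the question ("the middle service averages two seconds") without ever joining on the ID.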