I'm going to talk about real-world Ruby performance. It's a topic I'm very excited about, so if I seem excited up here, it's because I'm excited. And hopefully it's the last talk of the conference for all of you. I mean, it is the last talk of the conference for you, not hopefully. So I know you're all tired. It's been a long couple of days. Maybe you had a couple of drinks, maybe you stayed out late talking to people. That's great. So hopefully I can bring enough energy to keep everyone awake. That's my goal. That's my only goal for this talk: everyone stays awake. If you find yourself falling asleep, that's okay, just fall asleep. Just don't start snoring; that will insult me. But yeah, I think having the last talk on the last day is a little hard, but we'll get through it together. Okay, everybody? Cool. So first I just want to give a shout-out before I start anything. I could literally stand up here like Wu-Tang and just shout out people all day, but there are three people who worked on a number of tools, worked on the Ruby language itself, made a lot of things possible in Ruby 2.1 that weren't possible before, and are continuing to do that. So I want to give them a shout-out. One: Aman at GitHub, a good friend, and, as you can see from that picture, a very handsome gentleman. Sam, who I don't think is a cartoon in real life. I've never actually met him, but that's how I interact with him online. And Koichi, who's right there. Hey Koichi. Who's done amazing work; I hope everyone saw his talk on the incremental garbage collector the other day, which was pretty amazing. So I'm gonna skip the intro and not talk about myself first. I'm gonna go right into the topic and I'll come back to myself later. First I really want to say that this talk was actually pretty hard to write.
I've given a lot of talks, and I also just want to say: I saw a couple people mention imposter syndrome yesterday. You see people like myself, and maybe Aaron Patterson, and even Sandi get up, and you think they don't look nervous, but I'm very nervous, and I'm sure everyone else who gives a talk is still nervous. So if you ever feel like giving a talk, just go up there, give a talk, please share what you're doing. It's really important for the community. So I've really learned a lot about performance, and Ruby performance specifically, over the past couple of years in scaling a large application, and there's so much to share that I didn't know what to talk about. But when I started thinking about the tools I wanted to build and the tools I wanted to talk about, I realized that tips and tricks are like CliffsNotes for tech learning. Alex mentioned this in the previous talk in this room, but it's more important that I teach you ideals and philosophy than snippets. I've been a mentor for the past couple of years. I've been growing this team, I've been working with a lot of junior developers, and sure, I could be like, oh yeah, don't use this method, it's slow, or don't do this, it's slow. But it's more important for me that I teach them how to think about doing things, not just the individual snippets. So if you walk away today, I want you to come away with the process of what I'm talking about, not the actual tools, because really, Ruby is moving fast, the tools are moving fast, everything in the community is moving fast, as Sandi was saying. Maybe none of this stuff will be around in a couple of years, but the process and the philosophy and how you approach problems will stay with you the rest of your life. So today we're gonna talk about Ruby performance as therapy.
I'm in psychotherapy, and I have been for the last couple of years, so I know a bit about this. And if you're Jewish and you have interesting parents, you're probably going through the same thing. But I want everyone to relax and open up. Maybe close your eyes for a minute; don't close your eyes too long, you're gonna fall asleep. And we're gonna go deep, because therapy in any format is a multi-step process, and we're gonna look at a couple of those steps and how they apply not only to therapy for your code, but therapy for yourself as you move through the code. So step one of any therapy session is acceptance. It's your fault. Sorry, the Ruby performance, it's your fault. Really, yes, yes, it's your fault. And in the words of a famous philosopher: it's not you, it's me. What I mean by that is that performance is about context, and when we talk about Ruby performance, or any performance, we have to talk about the context. There was a lot of conversation a couple of years ago, and it's still haunting a lot of us in here, when someone said Rails doesn't scale. That permeated the community and there was a lot of violence around it. Emotional violence, not actual physical violence. And it's BS, it's bullshit. How can you talk about anything? I think there have been a lot of talks about this, about context and about how we communicate with each other. How can you talk about any type of performance problem when you don't talk about not only reproducibility but the actual context that we're talking in? The way I like to think about context sometimes is how I talk to my parents about performance. When I go to my parents and I say we shaved a couple milliseconds off of a page, they say, how do you shave a page, right? But then they also ask, is that good, you know? They have no concept of what a couple milliseconds means in terms of web performance, and I think it's the same when you're talking to anybody about any language. It isn't Ruby-specific.
But when we talk about Ruby being slow or Rails being slow, we're usually looking at a stack like this. You have Rails: in a single request, this probably takes about 10 milliseconds of your time just to boot up or dispatch the request. Then you have your application. And oh yeah, there's a database, your database is slow, we should probably switch to some NoSQL database, because Postgres takes 20 milliseconds. Then there's our memcache, and who knows how long that's gonna take, but probably only around 10 milliseconds. So then we're left with your application, which takes 250 milliseconds or whatever it is. This is all relative, obviously, and a gross generalization, but you can see pretty clearly that most likely, the time spent in any request, if you're doing web performance, is in your code. So everyone together now, let's let this out. It's my fault, on three. One, two, three. It's my fault. Ah, that felt so good, right? Woo, all right. So step two, once we've acknowledged that it's our fault, is diagnosing the problem. In this case, diagnosing the problem is: where did I go wrong? Not where did you go wrong. Not where did Ruby go wrong. Where did I go wrong? So in order to find out where I went wrong, we need the M's, as I like to call them, which is something that I just made up: metrics, measurements, numbers, because milliseconds matter. You're already collecting metrics for everything, right? Everyone's collecting metrics for everything? No? Okay, good. Because without metrics, without numbers, without justification for what we're doing, what we think is slow or what we think is fast is all just shooting fish in the dark, and that's dangerous. I know, I have no idea what shooting fish in the dark is like. So the point is that there are tools for every single type of performance problem you can possibly have. And in Ruby 2.1 especially, the tools are getting better.
I'll talk a little about the specifics of that later, but I just wanna say that whatever performance problem you have, there are tools for it. Step three, once we've figured out where the problem is, or what we think the problem is, is treatment: what are the steps to fix this problem? The way I like to treat problems is often like playing golf. And by this I mean like VimGolf and other golf-type things: how do you get the lowest score with the fewest strokes, or changes, or lines of code? It's a rinse-and-repeat methodology. Let's change this, measure, try it out, the scientific method basically, and then confirm that it works or doesn't work and move on. And when I talk about treatment with my team, I like to think about it as a rectangle, or sometimes a cube, but my Keynote skills aren't good enough to draw a cube. So we have a rectangle. What I mean by that is there are two ways we can optimize: vertically or horizontally. I know people use vertical and horizontal in a lot of different ways, so this may be confusing to you; this is just how I think about it. When I think about vertical, I think about a request descending through the stack. So when a request comes in, it goes through our balancers and our application and our data stores and comes back up to the user. And when I think about horizontal, those are the tiers of application or software or hardware that it touches. So if we optimize something vertically, that means fixing an individual element, like fixing a single action, fixing a single code path, or something that only gets hit every so often. If we optimize those, that makes our entire application faster. But there's also horizontal, which is going across hardware or software, or software as a cluster.
So that's something like upgrading our hardware itself, or fixing some basic method that gets called for every single request, whatever it is. It's not just hardware, it's software too. So the important themes here for this therapy: context is crucial to acceptance. Visibility, and another word I like to make up and say a lot, introspectibility (I don't think I actually made it up, but spell check didn't know what it was), meaning the ability to introspect our code, is crucial to diagnosis. And finally, knowing your tools, and working repeatedly with your tools, is crucial to treatment. So my name is Aaron Quint. My title is Chief Scientist. I work at a company called Paperless Post. We're an online invitations and stationery company, based in New York. We have one guy here from San Diego, Dan, who I'm gonna shout out right now. He's right there, say hi. And if you wanna know where to get a good burrito, ask him. We are a company that sends a lot of cards to different people, physical and digital. Over the past couple of years, we've scaled from basically zero (I've been there basically since day one) to about 80 million users, and it's growing constantly. And since day one, we've had the exact same Rails application. We've never done a big rewrite. We've done a lot of work to improve the performance of the application, and we've split a bunch of stuff out into services, but at its core, it's the same exact Rails application that I committed to five years ago. And because we're a business that deals with customers and sending cards, we're actually a very seasonal business, which is interesting for a startup. A lot of startups are not seasonally driven. Like, you don't think of Twitter as seasonal. Twitter, obviously, is driven by big events, but I wouldn't say it's seasonally driven. And obviously we're not at Twitter scale.
But still, we know when our peaks and valleys are basically gonna happen every year. So this is a graph of the past four years of how many cards get sent out daily. The actual numbers are irrelevant; the point is you can see that around February there's a giant spike, which is Valentine's Day. And then we're now in November, and you can see that year over year, the holidays go like this, whoop, up to Christmas. So everything is really stressful right now for my team and everyone involved in the company, because our traffic has basically doubled week over week over the past couple of weeks. So we're just getting ready. But even with these spikes, we're primarily a product-focused team. By product-focused, I mean we care a lot about features and shipping new features for our customers. And it became really clear to me very quickly that shipping features is basically an opposing force to optimization. That's not to say you can't write features that are fast, but the fastest way to write features is not the way they will run fastest. The speed at which you can get them out is an opposing force to the speed at which they actually run. But we realized that being fast meant being stable. The faster we could be, not just in shipping code but in optimizing it, the faster the site would be and the more stable it could be. And this is really important because our ops team is not very big. It's doubled this year, but previously it was only four people for about two and a half years.
And even though we know when these big waves of people are coming, it's still kind of like, I don't know, I'm trying to think of a good movie analogy, but you're literally hammering boards over the windows and just getting ready and waiting for the waves to come. Because that's basically all you can do, and I guess making the site fast is kind of like boarding up the windows. So I wanna talk about some case studies. These are some simple examples showing some simple tools that I used and how we actually went in and fixed some of these problems. So case one (and I like using Keynote effects, so bear with me), case one is about JSON. I know a lot of us use JSON in our applications as a communication protocol between different services, or from the server to the client, whatever it is. In our case, we have this thing that we call the paper browser. The paper browser is basically the e-commerce part of our site. It serves all the examples of cards that people can get. They click on them, then go into a WYSIWYG design tool where they design and edit them. All you really have to know is that it's a page that has a lot of different packages, or papers as we call them. Each of these has a lot of data about it: how much it costs, what the other features of the card are, if it has a photo, if it doesn't. All of this is injected into the page as JSON, and then we render the page in JavaScript. But that time to generate the page, since it's not user-specific, we were able over time to make pretty fast, because all of these pages are shared between users, so we can basically just cache them in memcache or wherever for a lot of different users.
This story is from a little over a year ago. Again, the actual numbers don't matter; this is just an illustration. We thought it was fast, and if you looked at the individual numbers, they were really fast. But once in a while, basically one out of every 10,000 requests, or a thousand requests depending on the time of day, would invalidate the cache or hit some page that wasn't cached. And that page that wasn't cached was so expensive to generate that it brought up our 90th and even 95th percentiles. And I know you're thinking, oh, well, you can pre-cache the pages and stuff like that, and we tried a lot of that, but the fact is that because we do search and a lot of other things, there are so many pages a user can hit where it may or may not be cached. In the end, this is what the data looks like. It's just a big nested JSON blob, and we use a very simple, not novel at all, but very effective way of caching these pages. We use what we call self-expiring nested cache keys. So these keys are self-expiring. Basically it's the name of the object, the ID, and then that fake UNIX timestamp at the end is when the record was updated. So each record, or at least its timestamp, has to be fetched so that we know if we have to invalidate the cache. We actually don't have to manually invalidate the cache, because if the record changes, then the timestamp changes, and it fetches or provisions a new cache entry. The cool thing about this is that all of it is just built into our JSON methodology, so our developers don't even think about it. Everything just kind of gets cached and gets busted when it needs to, or booted as we like to say.
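A minimal sketch of what a self-expiring key like that can look like. The `Record` struct, the key format, and the timestamps here are made up for illustration; the real keys are generated inside the app's JSON layer:

```ruby
# A self-expiring cache key: object name, ID, and updated_at timestamp.
# If the record changes, updated_at changes, so the key changes, and the
# old cache entry is simply never read again (it ages out via LRU).
Record = Struct.new(:id, :updated_at)

def cache_key(record)
  "#{record.class.name.downcase}/#{record.id}-#{record.updated_at.to_i}"
end

record = Record.new(8, Time.at(1_400_000_000))
cache_key(record) # => "record/8-1400000000"
```

The nice property is exactly the one described above: there is no manual invalidation step, because touching the record produces a different key.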
Nested within these structures are similar structures that are shared between multiple elements. Sometimes we were able to bring these up to the top level and hoist them up, but sometimes we actually have to keep them nested because of the way the JavaScript works, or whatever it is. But the cool thing is, even if the top level isn't cached and it's a fresh cache, it still fetches the other elements from the cache, or as many as it can. So for example, when this one turns red and is invalidated, partner 8 and this other package are still loaded from the cache. But the problem is what happens when the whole thing is invalidated. We have designers constantly working on our site, designers meaning people who make our actual product, not product designers but illustrators designing these cards, and they're constantly adding new material or updating it. And we got to a place where, even though we were only booting the cache every night, we would still see spikes, basically because our uncached performance was just such a big problem. I'm not exaggerating, it was really, really, really bad. So even if we booted infrequently, or not at all, or tried not to, or even if a cache key just got flushed out because of LRU, we would see these spikes in performance. So we have a tool internally which I've talked about before. I'll break to say really quickly that I gave a talk at GoRuCo, the Gotham Ruby Conference, this year that went over, in detail, a lot of different performance tools for Ruby 2.1. So if you're interested in the details of how each of these things works, I'm gonna kind of glance over them here, so please refer back to that talk or just talk to me. So PP Profiler is basically just a meta tool.
It's a tool that we have internally, and all it does is run a bunch of existing performance checks or tools over a method or over some piece of code and then collate them all together into one output. The tools that we have: there's auto cache toggling, so it runs the same code with the cache turned on and with it turned off. It uses benchmark, just to get a general idea of how many operations we can run. It uses rblineprof, which I would love to spend 45 minutes on, or actually probably have Aman come up and talk to you for 45 minutes about, but we can't. It's a really basic but really, really, really good line profiler for Ruby. There are ActiveSupport::Notifications counts, like how many SQL calls were made. And now, thanks to Sam Saffron, we use this gem called memory_profiler, which in Ruby 2.1 will show actual allocations per line in your code. In the end, it produces this big Markdown-formatted output, and the code for this is probably like 200 lines. I have it up on GitHub, but it's specific to us; it'd be really easy to adapt or rewrite this for your needs. Basically it just outputs this thing: you run the code and it shows you how long it took with the cache on, how long with it off, what events happened, and the line profiler output. When I ran this against this new paper generation, it showed me that it took about 162 milliseconds per paper that I was generating. So for a page that had 100 of these, or in some cases a thousand of them, obviously that's not sustainable. It took 16 seconds to generate 100 of these, which is crazy. But it was really cool, because instantly I just scanned down the file and looked at the rblineprof section. This is rblineprof output, and this is what it looks like.
The one thing I'll say really, really importantly about this, as with a lot of profiling tools, and if there's one thing you take away from this talk, this is the most important thing: the numbers are relative. These are not real numbers. This is not how it's gonna perform in production. Nothing that you run locally will ever work the same in production. That's the number one rule of programming. And they're not only relative because it's local: rblineprof and a lot of these other profiling tools actually spend a lot of time calculating and collecting metrics about the code, so they suffer from a classic computer science problem, the probe effect, which is that by measuring the code, we're actually slowing it down. But that's okay, because we're running this locally and all we really care about are the relative numbers. So in these two lines that I've highlighted, you can see it took about 38 milliseconds for both of them, and you can see that one is actually just calling the other one. To generate the list of colors, we're iterating over a Ruby array and doing some expensive calculation in there that we definitely shouldn't be doing. It's not memoized, and it happens every single time. So obviously that was a pretty easy win, and if it wasn't for rblineprof, it probably would have taken us a lot of time to figure out exactly what part of the code was slow. And the cool thing is, it's a really, really simple methodology once we can get down to that level. You just make the slowest lines faster, or as fast as you possibly can, and you rinse and repeat. So this session looked kind of like this. I ended up catting different runs of it together over time. You probably can't see the numbers, but that's kind of unimportant.
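The kind of fix this points at is plain memoization: compute once, reuse on every later call. A hypothetical sketch (the `Paper#colors` name, the counter, and the stand-in computation are mine, not the actual application code):

```ruby
class Paper
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # Memoized: the expensive work runs once per object, not on every call.
  def colors
    @colors ||= begin
      @computations += 1   # counts how often the slow path actually runs
      %w[red gold ivory]   # stand-in for the expensive calculation
    end
  end
end

paper = Paper.new
3.times { paper.colors }
paper.computations # => 1
```

One caveat with `||=`: if the computation can legitimately return `nil` or `false`, guard with `defined?(@colors)` instead, or the "memoized" work will rerun on every call.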
The point is that this is what the output looks like. I've just catted the runs onto each other: made a change, ran it, made a change, ran it, made a change, ran it. Eventually we got to the point where we submitted a pull request, and after caching a bunch of this stuff, we got from 1200 milliseconds per one of these packages down to 30 milliseconds. And a lot of this was, like I said, our fault. It was really, really stupid, simple things: not memoizing something, iterating over a value repeatedly, doing some really complicated SQL for things that we only used once. Things like that eventually got us down to this kind of safe level. In this case, even though 30 milliseconds is great (depending on what you look at), this is important because it was about multiplication. This one thing was running as many times as there were packages or papers on the page. So 30 milliseconds times 100, that's a lot of seconds. Okay, so that was an example of vertical optimization, like I was talking about before. Now I want to talk a little bit about horizontal optimization in code. So before V-Day (and yes, we call Valentine's Day V-Day, because it's that disturbing to us), we were looking for basically any wins we could find across the code base to just speed up the performance of the site. We knew that a lot of users were basically just gonna pummel us at 7 a.m. on February 14th. So what could we do? Luckily, right before Valentine's Day this year, Ruby 2.1 came out, or maybe it was the end of last year, but either way, Aman started working on these tools, and I started collaborating with him on this tool called StackProf, and StackProf Remote. The cool thing about StackProf and StackProf Remote is that whereas if you're using ActiveSupport::Notifications or looking at graphs, those are really only showing you what's happening inside a run of ActionController dispatch, if you're using Rails, or whatever it is.
It's showing you that this action started here and stopped there, and here are all the things that happened in between. But maybe, and very probably, your Ruby process is doing a bunch of stuff in between those actions that you're not even tracking. StackProf and StackProf Remote let us read between the lines, as I like to say. So really, really quickly, how does StackProf work? StackProf is a stack profiler based on Google's pprof, from perftools, which is a core part of how Google does profiling. Basically, left to right is time. We have what looks like a stack: ActionController dispatch calls MyController#create, which calls template render, which calls an ActiveRecord find, and we go back and up and down this stack of executing code. One thing calls another thing calls another thing calls another thing. The way StackProf works is it uses a method that's in Ruby 2.1 to sample, every x milliseconds, what's on the stack at that given point. So you start it, the process runs, you sample, sample, sample, and at the end you stop it and you get a collected dump. The cool thing about this is that because it's just sampling, and rb_profile_frames doesn't allocate memory on the heap, it allows you to do this in production, which is just an amazing thing. You can run these profiling tools in production against real production code, and it will not affect the performance of your app, to a degree. Obviously, if you're constantly running it, it might, but it doesn't suffer from the probe effect the same way rblineprof and other tools do. So at the end you get a dump that looks like this, and StackProf Remote just makes it a little easier by doing the work of going to the server, fetching the dump for you, pulling it down, and letting you navigate it in a Pry session. Really it's just a wrapper around StackProf to do that.
What it gives you is basically what's at the top, what's in each of those sampled frames, what's in the stack at all of those points. Then you sum them up, just like the perftools family. If you've ever used the Go pprof or the C++ Google pprof, it's the exact same output, or relatively the same output. And what it gives you is: what are the functions that are appearing every time I look at my stack? If we add those up, it gives you a pretty good idea of what your process is doing. Even though it's not every single thing that's happening, if a thing shows up a lot, we can guess that it's probably something that's slow, or happening so frequently that it's probably not a good thing. So if you look at the top of this (maybe you can't read it), it's two calls to StatsD. And we were like, huh, that's weird, why is StatsD slow? StatsD, for people who don't know, is the way we collect metrics: you send a UDP message to a StatsD server, which is typically Node.js, and that sends it to Graphite. So it's UDP, and a UDP send should be really fast. So yeah, why is that slow? So we pulled out good old benchmark and started looking around, and I started looking at the Ruby source code and the UDPSocket class. It turns out that if you supply UDPSocket with a hostname instead of an IP address, and you don't have DNS caching configured on one of your devices, it will look up the hostname every time and then try to send the message to it. That's slow, and if it happens every time you're sending a hundred or a thousand metrics per call, it can add up. It turns out that there's a connect method on UDPSocket, and that connects the socket up front, so each send goes to the already-resolved address and doesn't have to do a DNS lookup. Though actually, this would have been just as fast, or almost as fast, if we'd just used an IP address too. So again, it is our fault.
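Here's the difference in miniature, using Ruby's stdlib `UDPSocket` (the metric string is a made-up StatsD-style counter, and the local receiver stands in for a real StatsD server). Calling `connect` up front means the destination is resolved once, not on every send:

```ruby
require 'socket'

# A local UDP "server" standing in for StatsD.
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)
port = receiver.addr[1]

# Connected socket: the address is resolved once, here, not per send.
sender = UDPSocket.new
sender.connect('127.0.0.1', port)
sender.send('cards.sent:1|c', 0) # no destination args needed now

# The unconnected alternative (what bit us when given a hostname)
# resolves the destination on every call:
#   sender.send('cards.sent:1|c', 0, 'stats.example.internal', port)

message, _addr = receiver.recvfrom(64)
message # => "cards.sent:1|c"
```

With a hostname and no local DNS cache, that per-send resolution is exactly the invisible between-requests time the StackProf dump surfaced.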
This wasn't Ruby's fault; it wasn't even really StatsD's fault. This was our fault. But just looking at these numbers and running the benchmarks, we noticed that just by using connect, we were able to save a ton of time across our processes. And this was time that was completely invisible to us before, because it was between the lines, between each of those requests. Maybe a request couldn't start, or we were queuing a request, because StatsD was doing all this DNS lookup. So StackProf is amazing, and that was a big win for us. We actually saw some graphs go down in the end. And remember when I said probe effect? This is the worst case of a probe effect, where we're trying to measure stuff with StatsD and it's actually causing our code, every single request, to be slower than it should be. So, on a completely different note, I wanna talk about holiday scale. Like I said, we're about to ramp up into the holidays. This Friday is code freeze; we're frozen until after Thanksgiving, so everyone's in a race to ship stuff. It's pretty crazy. I deployed to production five minutes before this talk, which felt really good, and I'm sure my ops team is really happy about that too. But every year we grow, and we know we're growing, and we have a vision of what that looks like, and we can do some really, really rough back-of-the-envelope numbers to see what we think we're gonna grow to. So we use EC2 for some services, but we're happy to say that most of our nodes are in a hardware cluster in a data center in Virginia. We use vSphere to virtualize these nodes, but in general it's actual physical hardware. Our Postgres is on physical hardware, and it's really, really fast. But it wasn't fast enough, and we didn't think we were gonna be able to handle the load of the holidays, so guess what? We bought new hardware.
What a novel idea: we added resources to our pool. You can see that this is the date we added a bunch of new vSphere nodes, and then we upgraded our PG server. And you can see that last week our traffic started going up tremendously, so things got a little slower again, but if we hadn't added that new hardware, I don't know where we would have been. So this is just a short message to say: sometimes it is your fault, but it's your fault for being cheap. If you want, you can just spend a little more money and get a lot more performance. I mean, obviously everyone knows we're starting to hit the diminishing returns of CPU power, but at the same time, we can buy more cores, we can add more nodes, and it makes things faster, especially if you can scale your stuff out concurrently in some way. That makes a big deal. So sometimes you can throw money at the problem; that's the message there. And as an engineer and operator, it's important to recognize that we have to consistently be playing this balance between time, effort, and speed, and effort sometimes equals money. So finally, I wanna talk about shrinking the gap. This is another view into the approach that we've been using for the past while at Paperless. When we start thinking about vertical optimization, which is optimizing a single request, we wanna make sure that we're actually optimizing the right thing. Because I can make the slowest action on our site really fast, but if only one person hits it a day, who cares? It's not a big deal. But if thousands of people are hitting a not-so-slow action, maybe that is a big deal. Not so slow, meaning: if 10,000 people hit a 100 millisecond action versus one person hitting a 10,000 millisecond action, then maybe those two things are not actually equal.
Maybe we should optimize the 100 millisecond one first, even though it's fast. So we developed a simple way of visualizing this, which we call a hit list. It's not really a novel idea, but basically we start with some really bullshit math. The bullshit math is: we take numbers from Graphite every two hours, we take how many requests have happened for a specific controller action, and we multiply it by the 90th (or soon 95th) percentile response time of that action, and we get this completely made-up number called total time. But if you think about it a little bit, and maybe stretch your mind and take some hallucinogens, maybe you can visualize that if there are thousands of these 100 millisecond requests, they're taking up a thousand times that in terms of CPU power. If you can imagine a single Unicorn, a single worker, trying to process those, the smaller that time is, the less time it'll take up across our cluster, the more requests we can get per second, and the more throughput we can have.
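That back-of-the-envelope math is easy to sketch. The action names and numbers below are invented for illustration; the idea is just requests times p90 equals "total time", sorted descending:

```ruby
# Made-up samples: (requests in the window, 90th percentile in ms).
actions = [
  { name: 'private_messages#index', requests: 120_000, p90_ms: 40  },
  { name: 'papers#show',            requests: 9_000,   p90_ms: 700 },
  { name: 'calendars#events',       requests: 250_000, p90_ms: 20  },
]

# total_ms is a completely made-up number, but it approximates how much
# worker time each action eats across the whole cluster.
hit_list = actions
  .map     { |a| a.merge(total_ms: a[:requests] * a[:p90_ms]) }
  .sort_by { |a| -a[:total_ms] }

hit_list.first[:name] # => "papers#show"
```

Note that in this made-up data the most-requested action (`calendars#events`) sorts below a slower, less-requested one, which is exactly the judgment the sort is meant to encode.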
So we've visualized this as a thing called the hit list. The hit list literally takes those numbers, multiplies them, sorts them, and gives you a visualization: these are the top requests, the hit list. If I wanted to work top to bottom on the things that could save us the most time and improve performance the most, I would start at the top of this view. But it's actually kind of interesting, because the top one is this private messages controller, which has more than an order of magnitude more requests than the one below it, because it's used by our iOS app, and the iOS app is constantly polling to see if there are new messages. The one below it is actually used a lot too, but it's way slower, an order of magnitude slower, so if I was going to pick which one to optimize, I'd probably pick the second one. As you go down, there may be even bigger ones, but they've only been requested a thousand or 4,000 times. And there's this events or calendars controller, which gets requested the most out of any action on our site, but it's 20 milliseconds, so it gets sorted down. So I went to the top and I saw this API notifications index thing. API notifications gets hit on almost every page; it checks whether you have new events in your notifications view. You can see this is kind of the detail view, in two parts: we have some graphs on the bottom, and on the top we have individual log lines and the raw numbers. When I looked at this, it was weird, because there's this big gap. We have our time in the database, and then there's this empty space that's not view time, because it's a JSON API; it's not JSON generation time, and it's not time spent in the cache, which is where the time goes in most of our other actions. So there's this gap, and I wanted to figure out what this controller was doing. So here's (okay, gratuitous effects too) where we can use flame graphs, or StackProf flame graphs, to get a better view of what's going on. This is the same tool as before, StackProf, but StackProf has a bunch of different output formats, and one of them is this beautiful flame graph. It looks really cool; you can show it to your mom and she still won't love you. But (we're going way too deep with this, sorry, sorry, sorry) it's the same data we were looking at before, and maybe a better visualization of it than just the list. What it shows you, for a single request, is vertically how deep your call stack is, and horizontally is time, but it's not actual time, it's samples: each vertical stripe is a new sample. This is a concept not invented by anybody in the Ruby world; Brendan Gregg and a bunch of other amazing researchers at Sun came up with it. The idea is that, by looking at this, you optimize the very tall repeated stacks and the very wide stacks, and if there's something wide at the top of a stack, that's something taking up a lot of time in your code. On the right you can kind of zoom in (I wish I had a video of this, but I don't); you can zoom in on different parts and see what each individual call is, and what type of class or object these methods live in. So if we look at this holistically, we can see that it's divided into a few different vertical sections. We have our first part, which is just authentication and our before filters; then a bunch of time in ActiveRecord looking up records; then we had to do some sorting; then we spent a bunch of time reading and writing from the cache; and then generating a bunch of JSON. It's interesting, because even though this is ActiveRecord time, it looks
like it's actually time spent not just querying, but building and collecting nodes, and then this time at the end just shoving it into JSON. So I was still kind of confused, but I wanted to look deeper, and I had an inkling that all of this blue here might have been unnecessary; the blue is time spent in the cache. So I started playing with it. I went back to the same PP profiler tool, found a way to replicate the data and run it locally, and I noticed really quickly that the time with the cache on was slower than the time with the cache off. That's interesting, because what was actually happening was that the individual nodes of JSON were really, really cheap to generate (they were just a timestamp and a string), but the time spent writing each of those nodes to the cache, since we have that nested caching structure, was actually more expensive than the time spent generating them. So maybe we shouldn't cache them. So I disabled that, and I also saw that we were iterating over a bunch of nodes a bunch of different times, which we didn't have to, and joining these objects in Ruby instead of in SQL, and we could have done a better job of that. So I made a couple of changes, but since this was only like a week ago and we didn't want to ship anything huge to prod, I just tried small things at first and shipped it. So we shipped it, yay, and... wait, nothing changed. Actually, something did change: overall we saw about a two or three millisecond dip, and if you look at other graphs, we started seeing a lot fewer cache reads and writes, especially from this controller. Since there are so many requests to this controller, that actually added up to a lot of time. But it didn't do what I wanted it to do, and that was kind of sad. But it's okay, because in the end this is what I want to leave you with: I wasn't really sad that this wasn't a big win, even though I would have loved to see that thing just drop and be like, woo, high five everybody, and peace out, get a drink. That's not how it happened. Sure, it's awesome to see big cliffs, but at the same time it's these small changes that we've made over time. Maybe there isn't low-hanging fruit to just pick off the tree for a big performance win anymore; we've done a pretty good job of optimizing as far as we've gone. But the point is that I want everyone to come away and think about the skill and craft and the treatment steps of performance, because that's really what it's about. It's not about getting big wins every time, or the tips and tricks; it's about honing it into a craft and making it a practice. And to me, making something a practice and honing a craft is 100% about failing. I've been baking bread for the past couple of years, very heavily over the past couple of months, and (if you saw my arm, I have a giant burn on it because I burnt myself on my oven) I've failed, dropped bread on the ground, done a lot of stupid things, but I feel like I'm getting better step by step. So I didn't make a big impact on that graph, but I used the tools and the methodology, and I'll just keep doing it over and over again until I get better. Finally, the last thing: I want better tools, and I want not only to make them but to learn how to use them. I think we're in an amazing time for operators (not programmers so much as operators) in the Ruby community, because we're finally at a place where a bunch of people on the Ruby core team are saying: hey, there are a lot of gigantic apps running on Ruby; not only should we make Ruby faster, but let's make tools for Rubyists to use to improve the performance of their applications. And the great thing is that, as Sandy Metz said too, we're
standing on the shoulders of giants. There are 30 or 40 years of people who have worked on performance tuning for Linux, for POSIX systems, for programming languages, who have done a great job of showing us the way to think about these problems. Just like Brendan Gregg and the flame graph: we didn't have flame graphs in Ruby until this year, but that idea is almost ten years old. So there's a lot for us to learn, but I really, really urge everyone to go out and try to figure this out: try to make better tools, but also try to learn the ones that are already there. So thanks, everybody. I'm AQ on Twitter, these are my githubs, but thank you for listening, and... woo, RubyConf, yeah!