Today we're going to talk about how to performance. This talk is going to focus on tools and techniques to help speed up your Ruby code. The things we're going to talk about today came out of work I did with Aaron Patterson to make integration tests faster in Rails. I'm Eileen Uchitelle and I'm a programmer at Basecamp, formerly known as 37signals. I'm also on the Rails committers team, which means I'm allowed to push directly to master on Rails. I'm not on core, which apparently means I'm not allowed to tweet from the Rails Twitter account. You can find me anywhere online at eileencodes. This is my dog, Arya. She is, of course, named after Arya Stark from Game of Thrones. She has her own Twitter account at AryaDog and would love it if you followed her. You should be warned, though: she loves to talk about poop. I'm from upstate New York, from a city called Kingston. It's about two hours north of New York City. It was the first capital of New York State in 1777, but then the British burned it down twice and we moved the capital to Albany, because we couldn't be trusted with it. It's a wonderfully walkable city near some of the best hiking in New York. You can experience some of that hiking paired with inspiring technical talks if you come to Catskills Conf on October 23rd to 25th. It's going to be a wonderful weekend full of inspiration and learning in the Catskills. I'm super happy to be speaking at GoGaRuCo this year, because this was the first Ruby conference I ever attended, back in 2012. I like the new logo a lot better; it's more appropriate. I'm not going to go into details, but those of you who were here in 2012 will understand why I can't wear the t-shirt from that year. So, you'll remember I said I work at Basecamp. Well, DHH knows that performance is one of my favorite things, which is why I'm on the performance and infrastructure team at Basecamp. 
Back in January, David said to me: hey, I like fast things, like race cars, and I really hate it when things are slow. I want to move Basecamp's controller tests to integration tests, but right now they're 20 to 40 percent slower than controller tests, and that's just not going to do. Can you see what you can do about speeding up integration tests in Rails so they're as fast as race cars? Of course I said yes, because this was a good opportunity to improve Rails for everyone. If we're going to improve performance, what's the first thing we need to do? We need a baseline. A baseline gives you something to measure against, so you know you're actually improving speed rather than just thinking or hoping you are. To get a good baseline, I decided it would be a lot easier not to change Basecamp's core code, and instead to build an application specifically for testing this. Building a new application has many benefits. One, it's on GitHub, so you can see all of the scripts I wrote to test this. And two, it removes anything that could get in the way of building that baseline, like other gems and weird configuration. Of course, hello-world examples aren't going to show where all of the slowdown is, but you need to start with the bare minimum and build up from there, so that you're not changing things that aren't the real problem. Controller tests and integration tests don't look that different. Both define a test, process a route, and assert something about the response. Here, the index tests for both are getting the index route and asserting the response was a success. To build a baseline, I started with Ruby 2.1.5 and Rails 4.2.0, because that's what Basecamp was using at the time. I wanted to be sure that integration tests were slow in isolation, using the same technology as Basecamp. First, I double-checked that David's assertion that integration tests were 20 to 40% slower was true. 
David was basing this off the time the tests took to execute, so I first looked at that, knowing it wasn't going to be my real baseline. I ran the single controller test we looked at earlier and found that it took 1.625 seconds of real time. I then ran the corresponding integration test, and it took 2.120 seconds of real time. That calculated to integration tests being about 23% slower than controller tests. That's actually a huge difference, and if you ran your entire suite as integration tests, it would take a lot longer. I then ran each of those tests five more times and got some interesting results. They varied a lot, and after a few runs it appeared there was no difference between controller tests and integration tests. I knew this to be completely false, and it pointed to the fact that time is not a good baseline for the type of performance work that needed to be done. This is because using time only runs one iteration, and there are a lot of factors at play, like garbage collection, caching, and memory. To use time, we'd have to calculate the average of all the runs and figure out how many runs gives a good average. Time was not a good baseline for me because it varied a lot and would require a lot more work to prove that I'd made anything faster. So if time isn't a good baseline, how do we get one? We can't start improving our code without one. A good gem for benchmarking your Ruby code is benchmark-ips. It was written by Evan Phoenix, and it measures the number of iterations of the specified code that can run in one second. That means the more iterations, the faster your code. It completely takes the guesswork out of figuring out how many runs you need to get a good average, and it also provides the percent of standard deviation, so you know how spread out your results are from that average. Benchmark-ips lets you focus on the results rather than the data points when benchmarking your code. 
Here's an example of the benchmark-ips script I used to measure the difference between integration and controller tests. First, we define the controller test: it's a simple get, and we assert the response was a success. Then we do the same for the integration test: get the index route and assert the response was a success. Then we invoke the benchmark-ips gem. Inside the benchmark block, we report on each of the tests. Using Minitest's run_one_method, we can call each of the tests individually and label them integration and functional tests so they're easy to recognize after the run. Then we use the benchmark-ips compare! method to compare how many times slower integration tests are than controller tests. This method is great because it does the math for you. We can then run this like any other test, and it tells us we were able to run 393 iterations of integration tests per second, give or take 8%. In comparison, we were able to run 920 iterations of controller tests per second, give or take 7%. The data shows that integration tests were 2.34 times slower than controller tests. That's a huge difference, and it's definitely going to be noticeable when you're running an entire test suite. Now that I had proof that integration tests were in fact slower than controller tests, and that it had nothing to do specifically with Basecamp's code, I needed to double-check that it was still true on Rails master and the latest Ruby, which at the time was Ruby 2.2.0. Improvements to speed are being made all the time, so it's a really good idea to be sure it's still a problem on the latest versions, because you don't want to redo work someone has already done. So I ran the same benchmark script with Ruby 2.2.0 and Rails master. The results were about the same: a little bit slower, but nothing to be concerned about. If you check the iterations, they're basically the same, with a slightly higher percent of standard deviation. 
With Ruby 2.2.0 and Rails master, integration tests were 2.53 times slower. Here, the integration tests are in pink and the controller tests are in black. When we compare the difference between Rails 4.2 and master, they're pretty close in number of iterations, but the newest Ruby and Rails was a tad slower. On Rails master we are now getting 395 iterations of integration tests in one second, and 998 iterations of controller tests. This demonstrates how much slower integration tests are than controller tests, which we weren't seeing when we were using time as a baseline. So now we have a solid baseline, and we can start improving our code. Having this benchmark script means we can be confident that we've improved the speed of our code, and prove it to others. The next step is to find what is slowing down our code. We can't just go changing things without knowing what is actually slow. The first gem I used to profile the integration tests was ruby-prof. Ruby-prof requires Ruby 1.9.3 or higher, and it's faster than other Ruby profilers because it's a C extension. It has also been around a lot longer than the other profilers, having first been released eight years ago. The first script I wrote to profile the integration tests and find where the slowdowns in Rails were was this one. To write it, I followed along with the first ruby-prof example on GitHub. First I set up the index integration test so I could specifically profile the integration test stack. Here I'm using the ruby-prof API to run the index test in a block, which captures the integration test stack trace. Finally, I set the printer to the default printer, which is flat, and output to standard out. When you run the ruby-prof script in the terminal, it returns the following output. Ruby-prof has given us quite a bit of information here. I don't know about you, but when I first saw this I was pretty confused. 
I wasn't really even sure what I was looking at, and had no idea how to use this to actually speed anything up or figure out what was going on. It felt like a secret code telling me where my Ruby slowness was. So let's break down what we're looking at. First we have the percent of time spent in this method, calculated from the time spent in the method divided by the total time. Then we have the total time spent in this method and its children. Then the time spent in this particular method alone. Then the wait time spent in this method. Then the time spent in this method's children. Then the number of times this method was called. And finally, the name of the method that was called. Unfortunately, even after breaking this down, I wasn't entirely sure what to do with it. It was really overwhelming. I started reading the documentation more closely and playing with the available options and printers. First I found a new type of printer, the GraphHtmlPrinter. There are no screenshots of how these work, so you have to do a lot of trial and error to figure out which output gives you data you can understand. Instead of printing to standard out, I printed to a specified HTML file and opened it in the browser. It's a little more readable, but I still had no idea what it was saying. At first glance it almost seems like the Global#[No method] entry is your problem, because you're spending 100% of your time there. That's definitely not the case, and the next printer will show you why. A better way to visualize the time spent was the CallStackPrinter. This was by far the easiest output to understand. It creates a visual display of your stack and where your code is spending the most time. What you want to focus on are the parts of the stack that are not part of the main straight cascade. If you look closer at the top part, where you have that straight line, you can see that the 100% in Global#[No method] is just the first method call. 
So you're always going to spend 100% of your time at the beginning of your stack. What you need to focus on are the parts that are not in the main stack and actually jut out farther. If we look closer at this one, we can see that we're spending 12.56% of our time in ActionDispatch's integration runner reset, and that could potentially be a good place to start looking for problems in your code. Ruby-prof is really great for showing the big picture, but at the same time the data can feel incredibly overwhelming and frustrating if you're not familiar with how to use it. There are a lot of options that come with ruby-prof, so it's worth trying them out. Overall, ruby-prof was most useful for finding the differences between the code stacks for integration tests and controller tests, but in the end we didn't use it to improve the speed of the code. Once I had these results, I posted to the Rails project on Basecamp about my findings, hoping members of the Rails core team would give me some insight into the data I had collected. I got a lot of feedback, and at this point Aaron Patterson chimed in and said: hey, that sounds like a really great idea for my RailsConf talk, let's work on this together so I can use your code in my slides. The first tool Aaron introduced me to was StackProf. StackProf was written by Aman Gupta and is a Ruby profiler like ruby-prof, but it takes a sampling of the call stack, so you can focus on the biggest problems slowing down your code. Let's take a look at how to set it up. Here's an example of the StackProf code we used. Again, we add the integration test code like we did in the previous examples. Then we call StackProf and set a file for it to output to. Inside the block, we run the Minitest test 3,000 times; we run it that many times so that we get a real sample of our Ruby code. We can run this like we did before, and since it outputs to a file, we just need to open the file we specified in the script. 
You can do that with StackProf by running stackprof --text with the path to the dump file. This shows you the sampled profile of your Ruby code, and it seems kind of confusing at first. It's similar to what we learned in the ruby-prof section, but the stack is sorted by time spent in each frame. Let's take a look at what each column represents. The first column is the total number of samples where this frame was in the stack. Next is the percent of time all samples spent in this frame. Then StackProf shows the number of samples where this was the topmost frame, then the percent of time this method was the topmost frame. And lastly, it shows the method that was called in this frame. StackProf output is sorted with the most time spent at the top, so we can focus on the top line, where it says we're spending 53% of our time in Minitest's on_signal. Those of you who saw Aaron's RailsConf talk know where I'm going with this. Let's take a look at the on_signal method. Basically, this method is just yielding. So if the integration tests really are spending 53% of their time in on_signal, we can just delete all the code and make the method only yield. There's no way that only a yield could be the source of slowness in our code. We run StackProf again, and Minitest's on_signal is no longer in our stack; the new slowest frame comes in at 14.7%. So we're done. This is Minitest's problem now. Let's open an issue, grab a beer, and go work on something else. Oh, but we forgot to do something super, super important when doing performance work. You need to follow the number one rule of verifying that you improved performance. I call it ABB: always be benchmarking. Every time you change something to improve performance, you have to benchmark your code. Otherwise you have no idea whether you're actually making it faster; it's just like unicorns running around, and they're not real. 
Just to be sure, let's run that benchmark script again with the change we made to on_signal in Minitest. We should see a massive improvement now that we're not spending 53% of our time there. We run it again... and we're still at 2.53 times slower, which is exactly the same. It hasn't improved at all. At this time I was pairing remotely with Aaron, and he was pretty upset too. He said it was impossible, that there was no way this could be happening, and he was right. After a little research and consulting the creator of StackProf, he found there's a bug in OS X that causes CPU time to be completely inaccurate, so it doesn't correctly profile the code. This is something to remember if you use StackProf or any profiler that defaults to CPU time. Linux does not have this issue, so you're free to use CPU time with StackProf on that OS. We can simply resolve this by using wall time instead. If we change the mode in our StackProf script from CPU to wall time, we get accurate results that tell us where the real problems are, and the problem wasn't in Minitest; it's not even represented in our stack at all. StackProf with wall time actually pointed to a regression in the delegate method: we were spending 28.3% of our time there. Aaron changed the delegated methods in the route set to be plain method definitions, and if we run StackProf again we can see that Module#delegate is no longer in the list; the slowest sample we have now is 4%. If we follow our number one rule, always be benchmarking, and run the benchmark script again, integration tests went from being 2.53 times slower to 1.4 times slower, and that's a huge improvement. Here we can see that we actually lost some iterations in the controller tests after this change, but the integration tests have gained 200 more iterations per second. So we're definitely starting to see some improvement in integration test speed, and we're closing the gap between the two. 
StackProf is really great at pointing out slowdowns in your code quickly, because it sorts by the slowest profile rather than making you search through all of the information at once. But one issue we did have with StackProf was that it was really hard to pinpoint problems when methods were dynamically created or modules were anonymous. One such method was url_helpers, and we found that it was creating an extra anonymous module. I looked back in the git history and found that this method used to be memoized, so I just added the cache back. The diff is much larger than this, but you get the idea: url_helpers is now cached, and we should see awesome speed improvements and everyone will be happy. After caching the url_helpers, we ran the benchmark script again and saw a minor improvement. Integration tests went from being 1.4 times slower to 1.36 times slower, and we saw an improvement in the controller tests as well. But it turned out that there's a better way to fix this than caching url_helpers. When working with performance, it's really easy to think you should cache all the things, because caching is going to fix all your problems somehow, but it's actually kind of dangerous. Caching comes with a cost, and if you're able to speed up your code without caching, you should absolutely do that first. The problem was that url_helpers was being called when it wasn't necessary. The integration tests didn't always need url_helpers when the session was being initialized. So the better path is to move the call to where it's needed, rather than caching it. The diff is large, but if we look closer, we can see this change takes the url_helpers include out of the integration session's initialize so that it's only called when the session is reset. Now that we've done this, we need to remember to follow our number one rule: always be benchmarking. 
If we run the script again, we can see that we've improved the tests even more than when we cached them. When we cached url_helpers we were at 1.36 times slower, but after this change we're at 1.12 times slower than controller tests. The chart makes it clear that we've gained a lot of iterations on integration tests and gained back the ones we lost on controller tests. StackProf was great for helping us find the largest problems in the Rails source that were causing the slowdowns in integration tests. Of course, we made a lot more improvements than I've demonstrated today, but we were starting to get to the point where we were making micro-improvements to performance, which is super tedious. One thing we noticed after removing the largest culprits was that we were spending a lot of time in garbage collection: 11.13%. This is because we were creating a lot of objects that needed to be garbage collected. To figure out why, we turned to allocation_tracer. Allocation_tracer was written by Koichi Sasada and is a tool for finding how many objects are being allocated by Ruby and where they are in your code. Allocation_tracer focuses on a specific problem in your Ruby code and helps narrow down unnecessary allocations of objects. Here's an example of the allocation_tracer script we used to track allocations in integration tests. To use allocation_tracer, we needed to require objspace. Then we defined the index integration test as we've done in past scripts. We then set up allocation_tracer to collect path, line, and type; this defines the information we want from allocation_tracer's output. Then we assign the tracer's result. Again, we run the Minitest test 3,000 times to get an accurate sample of allocations. Finally, we sort the results by count, so that we have the highest ones; reverse puts the highest counts at the top, and then we take just the first five, because we don't need all the allocated objects, just the top five biggest offenders. 
Then we run the allocation_tracer script like we've done with past scripts. The output is a little confusing at first, but once you get the hang of the data and know what you're looking for, it makes a lot of sense. First it shows us the path to the file that allocated objects. Then we have the line number where the objects were allocated. Next is the type of object that was allocated; this can be a string, a hash, a struct, an array, and so on. The next array gives us the total count of objects created, which is really the number we care about. Then comes the old count of objects, which are objects that haven't been garbage collected yet, followed by the total age of objects, the minimum age, the maximum age, and the total memory size. Specifically, we found that a lot of the object allocation problems were in Rack, not Rails; at least, the top five biggest offenders were at the time we ran this script. One of the ways we reduced time spent in garbage collection in Rack was to reduce the number of strings being created. A simple way to do this is to freeze certain strings. After each of these improvements, we would check our allocations. Here we don't need to benchmark, because we're not specifically trying to improve speed, but rather to reduce time in garbage collection, which should eventually speed up the code, because GC is expensive. In this run we can see that Rails has replaced three of the Rack allocations in our top five. This is from changes where we mostly froze strings. The reason we froze the strings is that when you assign a string key to a hash, the string is copied, which creates more objects. Freezing prevents this, but makes the string immutable. Now, before you declare that winter is coming and you need to freeze all the strings, you must absolutely first prove that string allocations are a problem. You don't want to prematurely optimize your code by making strings immutable. 
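To see what freezing buys you here, a minimal stdlib-only example (the key names are just illustrative): Hash#[]= duplicates and freezes an unfrozen string key, allocating a copy, while an already-frozen key is stored as-is.

```ruby
h = {}

# An unfrozen string key gets duplicated (and frozen) on insert,
# allocating a new String every time this pattern runs.
unfrozen = "content-type"
h[unfrozen] = "text/html"
stored = h.keys.last
puts stored.equal?(unfrozen)  # false: a copy was allocated
puts stored.frozen?           # true: the copy is frozen

# A frozen key is stored without copying -- no extra allocation.
frozen = "accept".freeze
h[frozen] = "*/*"
puts h.keys.last.equal?(frozen)  # true: same object, no copy
```

In hot paths that build the same hash keys over and over, that per-insert copy is exactly the kind of allocation that showed up in the tracer output.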
On top of that, freezing isn't necessary if the string isn't slowing down your code, and it's dangerous if the string ever needs to change. It should only be done if it's absolutely necessary, so I hope we don't see any of you send a giant pull request to Rails freezing all the strings and saying I told you to do it. You absolutely must first prove that a string is a bottleneck before freezing it. Allocation_tracer was a great tool for finding where objects were being unnecessarily allocated in Rails and where GC was wasting our time. We unfortunately didn't see much improvement from this, but that's also because these are micro-improvements, which can be very tedious and get boring after a while. Comparing integration tests from where we started to where we ended up, they're now almost as fast as the original controller tests. We closed that gap quite a bit. Originally we could only run 393 iterations of integration tests per second; now we can run 828. That's a huge difference, and it calculates to integration tests being almost three times faster in current Rails than they were in Rails 4.2.0. In fact, most of these changes didn't actually alter the behavior of the code, so they're available in the newest Rails 4.2 releases, and you don't have to wait for Rails 5 to get these improvements. I hope this helps you better understand how to use performance tools. Be sure you know the advantages and disadvantages of the tools you're using; you need to understand their limitations so you don't end up frustrated with them. No tool can do everything, so be sure to use multiple tools. We were only able to speed up integration tests because we combined benchmark-ips, ruby-prof, StackProf, and allocation_tracer. And of course: always be benchmarking. It will help you be confident that you're actually speeding up your code, and able to defend it. It may even help you find a bug in your profiler. 
If you do ever end up sending a pull request to Rails, or any other project, that improves performance, make sure you include those benchmark-ips results in your commit message. I want to say thank you to a few people. Thanks to Aaron for helping me with this work; I know there was some concern that he was serious when he said he stole my code for his RailsConf talk, but it was a joint effort, I promise. Thank you to the creators of all the gems we used, thanks to GoGaRuCo for having me speak here today, and thank you to all of you for listening.