All right, so today we're going to talk about how to performance. This talk is going to focus on tools and techniques to help speed up your Ruby code. The things we're going to talk about today came out of work that I did to make integration tests faster.

I'm Eileen Uchitelle, and I'm a programmer at Basecamp, formerly known as 37signals. I'm on the security and performance team, which means I make sure that things run fast, smoothly, and safely. You can find me anywhere online as eileencodes. I'm also on the Rails committers team, which means I'm allowed to push directly to master, and on the Rails security team. I'm not on Rails core, which seems to mean that I'm not allowed to tweet from the Twitter account. I'm not sure if that makes you feel more or less secure.

This is my dog, Aria. She is named after Arya Stark from Game of Thrones. She has her own Twitter account at ariadog, and she'd love it if you followed her. Be warned, though: she likes to talk about poop a lot.

I'm from upstate New York, from a city called Kingston. It's about two hours north of New York City. It was the first capital of New York in 1777, but then the British burned it down twice, so the capital was moved to Albany, because we couldn't be trusted with it. A fun fact: I'm not actually the only speaker from Kingston, New York. Aaron Quint literally lives on the same street as me, and he's speaking after lunch. Kingston's representin' hard.

I'm really happy to be speaking at Full Stack Fest this year. This is actually the first European conference that I've spoken at. Barcelona is a lovely city, and when I told my coworkers that I was going to speak at Full Stack, they asked me if I was going to be talking about pancakes. They were pretty disappointed that I was not. And it's a good thing I'm on before lunch, so you can all be really hungry after looking at this picture.

So before we can talk about speeding up integration tests, we need to go over the difference between benchmarking and profiling. Benchmarking measures how slow our code is. It's all about collecting data on where your code currently stands, and benchmarking tools are also instrumental in proving that you made your code faster. Profiling tells us what parts of our code are slow; we can use it to pinpoint the exact parts of our code that are slow. Improving performance is a lot harder, if not impossible, if you don't use these two types of tools together.

So let's get started. Back in January, DHH said to me: hey, I like fast things like race cars, and I really hate it when things are slow. I want to move Basecamp's controller tests to integration tests, but right now they're 20% to 40% slower than controller tests, and that's just not going to do. Can you see what you can do about speeding up integration tests so they're as fast as race cars? Of course I said yes, because this was a good opportunity to make improvements for everybody.

If we're going to improve performance, what's the first thing we have to do? We need a baseline. This gives us something to measure against, so we know we're actually improving our code's speed instead of just thinking or hoping that we are. To get a good baseline, I decided it would be a lot easier not to change the Basecamp code, and instead to build an application specifically for testing integration test speed. Building an application has a few benefits. It's public and it's on GitHub, so all of you can download it, play with it, and see what we did to make integration tests faster.
And it removes anything that could get in the way of building that baseline, like setup code, authentication, Capybara, and all that crazy stuff. Of course a bare test, like a hello world example, isn't going to show you where everything is slow, but you have to start with the bare minimum and build up from there.

Controller tests and integration tests don't look that different. Both define a test, process a route, and assert something about the response. Here, the index tests for both get the index and assert that the response was a success.

To build a baseline, I started with Ruby 2.1.5 and Rails 4.2.0, because that's what Basecamp was using at the time. I wanted to be sure integration tests were slow in isolation, using the same technology as Basecamp. First, I double-checked that DHH's assertion that integration tests were 20 to 40% slower than controller tests was actually true. He was basing this off the time they took to execute, so I first looked at that, knowing it wasn't going to be my real baseline. I ran the single controller test that we looked at earlier and found that it took 1.625 seconds of real time. I then ran the corresponding integration test, which took 2.12 seconds of real time. That calculated to integration tests being about 23% slower than controller tests. It's actually a huge difference. It would make running your entire test suite on integration tests quite a lot slower.

I decided to run each of the tests five more times and got sort of interesting results. The time a single test took to run varied a lot, and after a few runs it looked like there was no slowdown at all and they ran at the same speed, but I knew that this wasn't true. It points to the fact that time is not a good baseline for this type of performance work. It's good for getting the initial "is it slower?", but it's not great for actually benchmarking. This is because using time only runs one iteration, and there are a lot of factors at play: garbage collection, memory, caching, and all that stuff affects each individual run. To use time, we'd have to calculate the average of all the runs and also figure out how many runs we'd need to get a good average. Time wasn't a good baseline because it varies a lot, and it would have required a lot more work than I was willing to do to prove I was making anything faster.

So if time isn't a good baseline, how do we get a baseline? We can't start improving our code without one, because we don't know where we currently stand. A good gem for benchmarking your Ruby code is benchmark-ips. It was written by Evan Phoenix, and the gem measures the number of times the specified code can run in one second. This means that the more iterations, the faster your code is. It completely takes the guesswork out of figuring out... there are notes for me on the screen telling me to move over, but then I can't see my notes, so that's not going to work. Sorry. So, with benchmark-ips you can tell your code is faster because you can run more iterations of it, like running a block of code 20 times versus five times. It really takes the guesswork out of figuring out the number of runs it takes to get a good average. It also provides the percent of standard deviation, so you know how far spread out your results are from that average. benchmark-ips lets you focus on the results rather than the data points when benchmarking your code.
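Before walking through it, here's a minimal sketch of the shape such a benchmark-ips comparison takes. It's a sketch under assumptions, not the exact script from the benchmark app: the class names, the "/documents" route, and the test helper path are illustrative stand-ins.

    # A minimal sketch, assuming a typical Rails test environment; class
    # names, the "/documents" route, and the helper path are illustrative.
    require_relative "test_helper"   # loads the Rails test environment
    require "benchmark/ips"

    class DocumentsControllerTest < ActionController::TestCase
      def test_index
        get :index
        assert_response :success
      end
    end

    class DocumentsIntegrationTest < ActionDispatch::IntegrationTest
      def test_index
        get "/documents"
        assert_response :success
      end
    end

    Benchmark.ips do |x|
      # Minitest.run_one_method runs a single test method and returns its result
      x.report("integration test") do
        Minitest.run_one_method(DocumentsIntegrationTest, :test_index)
      end

      x.report("controller test") do
        Minitest.run_one_method(DocumentsControllerTest, :test_index)
      end

      # compare! does the "x times slower" math for you
      x.compare!
    end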
Here's an example of the benchmark-ips script I used to measure the difference between an integration test and a controller test. First we define the controller test: it's a simple get, and we assert that the response was a success. Then we do the same for the integration test: get the index route and assert the response was a success. Then we invoke the benchmark-ips gem. Inside the benchmark block, we report on each of the tests. Using Minitest's run_one_method, we can call each of the tests inside the block and label them "integration test" and "controller test" so they're easy to recognize. And then we use the compare! method from the benchmark-ips gem to compare how many times slower integration tests are than controller tests. This method is great because it does the comparison math for you.

We can then run this like any other test, and it tells us that we were able to get 393 iterations of the integration test, give or take 8%, and 920 iterations of the controller test, give or take 7%, in the same one second. So, a lot more. The data shows that integration tests are 2.34 times slower than controller tests. That's a huge difference, and running your whole suite that way would make you sad.

Now that I had proved that integration tests were in fact definitely slower than controller tests, and that this had nothing to do specifically with Basecamp's code, I wanted to be sure it was still true with Rails master and the latest Ruby, which at the time was Ruby 2.2.0. Improvements to speed are being made all the time, so it's a really good idea to be sure your problems still exist on the latest versions, because you don't want to redo work that other people have already done. I ran the benchmark script again with Ruby 2.2.0 and Rails master, and the results were about the same, a little bit slower, but nothing we really need to be concerned about. If you check the iterations, they're basically the same, with a slightly higher percent of standard deviation. With Ruby 2.2.0 and Rails master, integration tests were 2.53 times slower.

Here, the integration tests are in pink and the controller tests are in black. When we compare the difference between Rails 4.2 and Rails master, the two versions were really close in the number of iterations, but the newest Ruby and Rails was a little bit slower. For integration tests we're getting about 395 iterations in one second, and 998 iterations of controller tests on Rails master. This demonstrates just how much slower integration tests are than controller tests. We weren't seeing this when we were using time as a baseline.

So now we have a solid baseline and we can start improving our code. Having this benchmark script means we can be confident that we've improved the speed of our code, and prove it to others as well. The next step is to find out what's slowing down our code. We can't just go changing things without knowing what the culprits are.

The first gem that I used to profile the integration test stack trace was ruby-prof. ruby-prof requires Ruby 1.9.3 or higher, and it's faster than other Ruby profilers because it's a C extension. ruby-prof has also been around a lot longer than other profilers, first being released eight years ago. This is the first script I wrote to profile the integration tests and find what was slowing down Rails. To write it, I followed along with the first ruby-prof example on GitHub, as one does when first using a new gem.
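For reference, that first ruby-prof script looks roughly like the sketch below; the walkthrough of each piece follows. The test class is the illustrative DocumentsIntegrationTest from the earlier benchmark sketch, not necessarily the real one.

    # A sketch of a ruby-prof run over a single integration test; the test
    # class is the illustrative stand-in from the benchmark-ips sketch.
    require "ruby-prof"

    result = RubyProf.profile do
      Minitest.run_one_method(DocumentsIntegrationTest, :test_index)
    end

    # Default flat printer, written to standard out
    RubyProf::FlatPrinter.new(result).print(STDOUT)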
First I set up the documents integration test, so I could profile specifically the integration test stack trace. Here I'm using the ruby-prof API to run the index test in a block. This captures the integration test stack trace, and then finally we set the printer to the default flat printer and output to standard out. You can then just run the ruby-prof script in the terminal, and it returns the following output.

ruby-prof has given us a lot of information, and I'm not sure about you, but when I first saw this I was pretty confused. I wasn't sure what I was even looking at, and I had no idea how to use it to speed up integration tests. It felt like some kind of secret code telling me where the slowest methods were. Let's break down what we're looking at. First we have the percentage of time spent in this method, calculated as time spent in the method divided by total time. Then we have the total time spent in this method and its children. Next we have the time spent in this particular method. Then we have the wait time spent in this method, the time spent in this method's children, the number of times this method was called, and finally the name of the method that was called.

Unfortunately, even after breaking this down, I wasn't entirely sure what the data was telling me. I felt overwhelmed and frustrated. So I started reading the documentation more closely and playing with the available options. There are a lot of different printers that you can use with ruby-prof, so I encourage you to experiment with all of them.

First I found a new type of printer called GraphHtmlPrinter. There are no screenshots of these printers or how they work, so I was just doing trial and error. Instead of printing to standard out, this prints to a specified file, and if we open that in a browser we can see what's happening. It's a bit more readable, but it's still kind of confusing. And if you just look at it without really reading it, it kind of seems like Global#[No method] is the problem, because we're spending 100% of our time there. But that's probably not the case.

A better way to demonstrate why that's not the case is the CallStackPrinter. This was the easiest output to understand. When it runs, it creates a visual display of the stack showing where your code is spending the most time. And if you look closer, you can see that it starts at the top with 100% in Global#[No method], which makes it clear that the reason we see that in the other output is simply that it's the first thing that's called. You're always going to spend 100% of your time in the first thing that's called. What we want to focus on are the parts of the call stack that jut out with higher percentages, like this one. It shows that we're spending 12.56% of our time in the ActionDispatch integration runner's reset method. That's not part of the first main cascade; it sticks out farther, which means we're spending more time there than we need to be. That's a good place to start looking for problems in your code.

So ruby-prof is a great tool for showing you the big picture, but at the same time the data can feel incredibly overwhelming and frustrating if you're not familiar with how to use it. There are a lot of options that come with ruby-prof, so it's worth trying them all out. Overall, ruby-prof was most useful for finding the differences between the stacks for controller and integration tests. In the end, we didn't actually use it to improve the speed of the code.
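Switching printers is just a matter of swapping the printer class on the same result object. A small sketch, assuming the result variable from the previous ruby-prof example and illustrative file names:

    # Same profiling result, different printers
    File.open("profile_graph.html", "w") do |f|
      RubyProf::GraphHtmlPrinter.new(result).print(f)
    end

    File.open("profile_call_stack.html", "w") do |f|
      RubyProf::CallStackPrinter.new(result).print(f)
    end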
Once I had these results, I reached out to members of the Rails core team to get some insight into the data I had collected. I got a lot of feedback, and at this point Aaron Patterson chimed in and said: oh hey, speeding up integration tests sounds like a really great idea for my RailsConf talk. Let's work together on this so I can use your code in my slides.

So the first tool that Aaron introduced me to was StackProf. StackProf was written by Aman Gupta, and it's a Ruby profiler like ruby-prof, but it takes a sampling of the call stacks, so you can focus on the biggest problems and slowdowns in your code. Let's take a look at how to set this up. Here's an example of the StackProf profiler code we used. Again, we add the integration test code like we did in previous examples. Then we call StackProf and set a file for it to output to. Inside the block we run the Minitest run-one-method call 3,000 times. We need to run it that many times so we get a real sample of our Ruby code. We can run this like we did before, and since it outputs to a file, we need to open the specified file after the script runs. To view StackProf's output, run "stackprof --text" followed by the path to the dump file named in your script. This shows you the sampled profile of your Ruby code.

This seems a bit confusing at first. It's similar to what we saw in ruby-prof: all this text. But the stack is sorted by time spent in each frame, so you know the top is going to be your biggest problem method. Looking at the columns: the first is the total number of samples where this frame was in the stack. Next is the percent of time all samples spent in this frame. Then StackProf shows the number of samples where this was the topmost frame, then the percent of time this method was the topmost frame, and lastly the method that was called in this frame.

Since StackProf's output is sorted with the most time spent at the top, we can focus on the top line, which says we're spending 53% of our time in Minitest::Runnable.on_signal. Let's take a look at that on_signal method. Basically, this method is just yielding. So if we want to be sure this method really is slowing us down that much, we can delete all of its code so that it does nothing but yield. If the slowness were really in another method that on_signal was yielding to, that's the method we would see spending the most time. So we know that this is the problem. And if we run the StackProf script again, we can see that Minitest's on_signal is no longer in our stack, and the new slowest sample is at 14.7%. So, we're done. This is Minitest's problem now. Let's open an issue, grab a beer, and go work on something else.

Oh, but we forgot to do something. We forgot to follow the number one rule of verifying that we improved performance. I call it ABB: always be benchmarking. Every time you make a change to improve performance, you have to benchmark your code to make sure you actually made it faster. So, just to be sure, let's run that benchmark script again with the change we made to Minitest's on_signal. We should see a massive improvement, now that we're not spending 53% of our time in Minitest. And when we run it we see that... huh. It's still 2.53 times slower. It hasn't improved at all.
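The StackProf script we just walked through looked roughly like this sketch. The file name and test class are illustrative assumptions; note the mode, which comes up in a moment.

    # A sketch of the StackProf sampling script described above
    require "stackprof"

    StackProf.run(mode: :cpu, out: "tmp/integration_test.dump") do
      # Run the test many times so the sampler sees a representative stack
      3000.times do
        Minitest.run_one_method(DocumentsIntegrationTest, :test_index)
      end
    end

You then read the dump back with "stackprof --text tmp/integration_test.dump", as described above.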
So after a little bit of research and consulting the creator of StackProf, we found out that there's a bug in OS X that causes CPU time to be completely inaccurate, so the code isn't profiled correctly. This is something to remember if you're using StackProf or any other profiler that defaults to CPU time. Linux doesn't have this issue, so you're free to use CPU mode with StackProf on that OS; just don't use it on OS X. We can simply resolve this by using wall time instead. If we change the mode in our StackProf script from CPU to wall time, we get accurate results that tell us where the real problems are.

And that problem was not in Minitest. It's not even represented in this stack sample. StackProf with wall time actually pointed to a regression in the delegate method, where we were spending 28.3% of our time. Aaron changed the delegated methods in the route set to be written out by hand, and if we make that change and run the StackProf script again, we can see that Module#delegate is no longer in the list, and the slowest sample we have now is only 4% of time, spent in PerThreadRegistry.

Now we need to follow our number one rule of always be benchmarking. If we run the benchmark script again, we can see that integration tests went from being 2.53 times slower to 1.4 times slower. That's quite an improvement; we're doing a lot better now. Here we can see that we lost some iterations in the controller tests, but the integration tests gained about 200 more iterations per second. We're definitely improving the speed of integration tests and slowly closing the gap between the two.

StackProf is really great at pointing out slowdowns in your code quickly, because it sorts by the slowest profile rather than giving you all of the information at once. But an issue that we had with StackProf was that it was hard to pinpoint problems when methods were dynamically created or modules were anonymous. In fact, we wouldn't even have known there was a problem with an anonymous module if it weren't for RubyVM.stat.

RubyVM.stat is a method in the Ruby virtual machine that provides information about the current method cache. This is useful for seeing whether your method cache is being busted. The method cache is important to the performance of Ruby code because resolving constants and methods is expensive, so Ruby caches them globally; busting that cache means doing the work all over again.

Here's an example of the script that we used to track the global method and constant state in our tests. First, we add the same integration test for the index method, like we've done with previous examples. Then we output the RubyVM stats before we run the test, run the test, and output the RubyVM stats after the test runs. And if we run the test with the RubyVM code, we get the following output. What we want to focus on is the global constant state, which tells us that before the test ran we had 7,823 constants, but afterwards we had 7,824. That means we're creating a new constant while running this single integration test. So if we ran the test five more times, we'd have five more new constants. Creating constants while running tests is expensive, because each time we bust that cache, Ruby has to resolve the constants all over again. We found that the method creating the extra constant was url_helpers. Now, finding that method took a lot of digging. We couldn't rely on StackProf or standard debugging tools to find it, because the module is anonymous. We had to go spelunking in the Rails internals to find it.
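For reference, the RubyVM.stat check described a moment ago can be sketched like this. The keys shown are the ones MRI exposed around Ruby 2.2; newer Rubies report a different set, and the test class is again the illustrative stand-in.

    # Print the VM's cache counters before and after a single test run.
    # If :global_constant_state grows, something in the run is defining a
    # constant (or otherwise busting the constant cache).
    puts RubyVM.stat   # e.g. {:global_method_state=>..., :global_constant_state=>..., :class_serial=>...}

    Minitest.run_one_method(DocumentsIntegrationTest, :test_index)

    puts RubyVM.stat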
And there it is: Module.new do. I looked back in the git history and found that this method used to be memoized, so I thought, just add the cache back. Why would you remove it? The diff is much larger than this, but you get the idea: url_helpers is memoized again, and we should see speed improvements in our benchmarks. After caching url_helpers, we ran the benchmark script again and saw a minor improvement: integration tests went from being 1.4 times slower to 1.36 times slower, and we saw an improvement in the controller tests as well.

But Aaron pointed out that there's a better way to fix this than just caching url_helpers like that. When you're working on performance, it's really easy to think that you should just cache all the things, because caching is going to fix all your problems. But caching isn't a free solution to all your performance problems. It comes with a cost, and if you're able to speed up your code without caching, you should do that first and cache later. You don't want to put a band-aid on problems in your code that you could fix properly. Optimize first, cache second.

The problem was that url_helpers was being called when it wasn't necessary. The integration test didn't always need url_helpers when the session was initialized. So the better way to fix the problem is to move the call to where it's needed, rather than caching it. This diff is large, but if we look closer we can see that it removes the url_helpers include from when the integration session is initialized, so it's only called when the session is reset.

Now that we've done that, let's run our RubyVM script again to check the global constant state, and we can see it remains the same: 7,822 before and after the test runs. And if we go back to our benchmark script, we can see that this improved integration tests even more than caching did. When we cached url_helpers we were at 1.36 times slower, but after this change we were at 1.12 times slower than controller tests. This proves how expensive busting the global constant cache can be. It's a massive improvement for a fix that seems kind of minor.

ruby-prof, StackProf, and RubyVM.stat were instrumental in helping us find the largest problems in the Rails source that were causing slowdowns in integration tests. Of course we made a lot more improvements that I haven't demonstrated today, but we were starting to get to the point where we were making micro-improvements to performance, and we weren't seeing those big jumps from, say, 2.53 to 1.4.

One thing we noticed after removing the largest culprits was that we were spending a lot of time in garbage collection: 11.13%. This was because we were creating a lot of objects that then needed to be garbage collected, and to figure out why, we turned to allocation_tracer. allocation_tracer was written by Koichi Sasada, and it's a tool for finding how many objects are being allocated by Ruby and where they are in your code. allocation_tracer focuses on a specific problem in your Ruby code and helps narrow down unnecessary allocations of objects.

Here's an example of the allocation_tracer script we used to track allocations in the integration tests. To use allocation_tracer, we require it (it hooks into ObjectSpace), then we define the integration test as we've done in the past scripts. We then set up allocation_tracer with path, line, and type. This lets us define the information that we want from allocation_tracer's output. Then we assign the tracer result.
Again, inside the trace block we run the test 3,000 times to get an accurate sample of allocations, and finally we sort the result by the number of allocations. Reversing puts the highest counts at the top, and we return only the first five, because we don't need all the allocated objects, just the top five highest offenders (a sketch of a script in this shape appears at the end of this section). And then we run the allocation_tracer script like we've done with the past scripts. The output is a little bit confusing at first, but once you get the hang of the data you're looking at, it makes a lot of sense. It shows us the path to the file that allocated the objects and the line number where the objects were allocated. Next is the type of object that was allocated: a string, array, struct, hash, et cetera. The next array gives us the total count of objects created, the old count of objects (objects that haven't been garbage collected yet), the total age of the objects, the minimum age, the maximum age, and the total memory size.

Specifically, we found that a lot of the object allocation problems were in Rack and not Rails; at least, the top five biggest offenders were, when we first ran this. So one of the ways we reduced time spent in garbage collection in Rack was to reduce the number of string objects being created. A simple way to do this is to freeze certain strings. After each one of these improvements we checked our allocations again. Here we don't need to benchmark each change, because we're not specifically trying to improve the speed of our code, but rather to reduce the time spent in garbage collection, which should in turn reduce the time our code takes to run, because we're no longer wasting that time in GC.

In this run we can see that Rails has replaced three of the Rack allocations in our top five. This was from the changes where we froze the strings, and the reason we froze them is that when you use a string as a hash key, the string is copied, which creates more objects. Freezing prevents this, but it also makes the string immutable. Now, before you declare that winter is coming and that you need to freeze all the strings, you must first prove that string allocations are a problem in your code. You don't want to prematurely optimize, because you're making those strings immutable. It isn't necessary if it's not slowing down your code, and it's not good if the string needs to change. This should only be done if absolutely necessary, so I hope I don't see any of you sending a giant pull request to Rails with all the strings frozen, telling them I said to do it. You have to first prove that a string is a bottleneck before freezing it.

allocation_tracer was a great tool for finding where objects were being unnecessarily allocated in Rails and where GC was wasting our time. We unfortunately didn't see much improvement from this, but that's also because these were micro-improvements to performance, which is tedious work.

Comparing integration tests from before we started to where we ended up, they're now almost as fast as the original controller tests. Originally we could run 393 iterations of the integration test; now we can run 828. It's a huge difference, and it calculates to integration tests being almost three times faster on current Rails than they were on Rails 4.2. In fact, a lot of these changes didn't alter behavior and were backported, so you can actually get them in the newest Rails 4.2 release; you don't have to wait for Rails 5 to take advantage of these optimizations.
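Pulling together the allocation_tracer steps described earlier, a minimal sketch might look like this; as before, the test class is an illustrative stand-in rather than the exact one from the benchmark app.

    # A sketch of the allocation_tracer workflow: record path, line, and type
    # for every allocation made while the test runs, then show the top five.
    require "allocation_tracer"
    require "pp"

    ObjectSpace::AllocationTracer.setup(%i[path line type])

    result = ObjectSpace::AllocationTracer.trace do
      3000.times do
        Minitest.run_one_method(DocumentsIntegrationTest, :test_index)
      end
    end

    # Each value is [count, old_count, total_age, min_age, max_age, total_memsize];
    # sort by count and keep the five biggest offenders.
    pp result.sort_by { |_location, counts| counts.first }.reverse.first(5)

From there, if string allocations dominate a hot spot, freezing the literal (the kind of change made in Rack) avoids the extra copy Ruby makes when an unfrozen string is used as a hash key.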
I hope this helped you better understand your tools and how to improve performance. Be sure to know the advantages and disadvantages of the tools that you're using. You need to understand their limitations so you don't end up frustrated with your tools. No tool can do everything, so be sure to use multiple tools. We were only able to speed up integration tests by combining benchmark-ips, ruby-prof, StackProf, RubyVM.stat, and allocation_tracer. And of course, always be benchmarking. It will help you be confident that you're actually speeding up your code and able to defend the changes that you're making. It may even help you find a bug in your profiler.

I want to say thank you to a few people who helped me with this work. First, thank you to Aaron Patterson for working with me on speeding up integration tests. I know there was some concern that I was serious when I said he stole my code for his RailsConf talk, but it was truly a joint effort. And thank you to all the creators of the gems we used, thank you to Full Stack Fest for having me speak here today, and thanks to all of you for listening.