I hope you enjoyed what just happened before this. Today we're gonna talk about how to performance. This talk is going to focus on tools and techniques to help speed up your Ruby code. The things we're gonna talk about today came out of work that I did with Aaron to speed up integration tests in Rails. I'm Eileen Uchitelle and I'm a programmer at Basecamp, formerly known as 37signals. I'm on the security and performance team, so I make sure things run fast and safely. And you can find me anywhere online at EileenCodes. I'm also on the Rails committers team, which means I'm allowed to push directly to master, and on the Rails security team. I'm not on Rails core, which seems to mean I'm not allowed to tweet from the Rails Twitter account, and I'm not sure if that makes you feel more or less secure. This is my dog, Aria. She is, of course, named after Arya Stark from Game of Thrones. She has her own Twitter account at AriaDog and she would love it if you followed her. Be warned though, she likes to talk about poop a lot. I'm from upstate New York, from a city called Kingston. It was the first capital of New York in 1777, but then the British burned it down twice, and so they moved the capital to Albany because we couldn't be trusted with it. Where's my British? I'm really happy to be speaking at RubyConf this year. Not only is this my first time speaking at RubyConf, this is my first RubyConf ever. I always like to mention something that I find interesting about the city I'm currently speaking in, and since this is my first time in San Antonio, I think it's really interesting that you have this river walk thing where you have a really, really narrow walkway with a river in the middle, and you have to cross bridges to get to the other side to go to the restaurants you actually wanna go to. Do people fall in a lot? So perhaps you're wondering why we wanted to make integration tests faster, other than the obvious, faster is better, and why I'm giving this talk.
While integration tests and controller tests both process a route and make assertions about the response, they are two completely separate code bases in Rails. We figured that if we could make integration tests as fast as controller tests, then we could just combine the two code bases. If we didn't speed them up first, everybody would notice their tests suddenly got slower in Rails 5, and then they would complain to me. This has the added bonus that when we consolidate the code bases, nobody has to change their code. There's no deprecation notice; you can just continue writing tests as you were, but for us, it's faster. The purpose of this talk is to show you the techniques and tools that Aaron and I used to speed up integration tests. We're going to go over what worked and what didn't, and ultimately how successful we were. Before we can talk about speeding up integration tests, we first need to go over the difference between benchmarking and profiling. Benchmarking measures how slow our code is. This is all about collecting data on where your code currently stands. Benchmarking tools are also instrumental in proving we made our code faster. We can use benchmarking tools to measure our code before and after we make changes so that we can be sure that we've improved the speed. Profiling tells us what parts of our code are slow. We can use profiling to pinpoint the exact methods that are slowing down our code. Improving performance is a lot harder, if not impossible, if you don't use these tools together. Let's get started. So back in January, David said to me, hey, I like fast things, like race cars, and I don't like when things are slow. I want to move Basecamp's controller tests to integration tests, but right now they're 20 to 40% slower than controller tests and that's just not going to do. Can you see what you can do about speeding up integration tests in Rails, so they're as fast as race cars?
Of course I said yes, because this was a good opportunity to improve Rails for everyone. So if we're going to improve performance, what's the first thing we need to do? We need to get a baseline. This gives us something to measure against so that we know we're actually improving the speed of our code instead of just thinking or hoping we are. To get a good baseline, I decided that it would be a lot easier not to change Basecamp code, and rather to build an application specifically for testing the speed of integration tests. Building a new application has many benefits. It's public on GitHub, so you can see all the different kinds of tests I wrote to measure integration test performance, and it removes anything that could get in the way of building a solid baseline, like login code and all that other stuff, because that's part of your application, not part of the integration test code. Of course, hello world examples won't show me where all of the slowdown is, but you need to start with the bare minimum and build up from there to find the underlying issues. Controller tests and integration tests don't look that different. They both define a test, process a route, and assert something about the response. Here the index tests for both are getting the index route and asserting the response was a 200. That's it. To build a baseline, I started with Ruby 2.1.5 and Rails 4.2.0, because that's what Basecamp was using at the time. I wanted to be sure that integration tests were slow in isolation, using the same technologies as Basecamp. First I double-checked that David's assertion that integration tests were 20 to 40% slower than controller tests was true. He was basing this off the time they took to execute, so I first looked at that, knowing that wasn't going to be my real baseline. I ran the single controller test that we looked at earlier and found that it took 1.625 seconds of real time.
I ran the corresponding integration test and it took 2.12 seconds of real time. That calculates to integration tests being about 23% slower than controller tests. That's actually a huge difference that would make running your entire test suite on integration tests only quite a bit slower. I then decided to run each of those tests five more times, and that gave some interesting results. The time a single test took to run varied a lot, and after a few runs it appeared that there was no difference in speed between controller tests and integration tests. But I knew that this was false, and it points to the fact that time is not a good baseline for the type of performance work that needed to be done here. This is because using time only runs one iteration. There are a lot of factors at play, like garbage collection, memory, and caching. To use time we'd have to calculate the average of all of the runs and figure out how many runs we'd need to get a good average. Time's not a good baseline because it varies a lot and would have required a lot more work than I was willing to do to prove that I'd made things faster. If time isn't a good baseline, then how do we get a baseline? We can't start improving our code without one, because we don't know where we stand. A good gem for benchmarking Ruby code is benchmark-ips. It was written by Evan Phoenix, and the gem measures the number of times the specified code can run in one second. That means the more iterations, the faster your code is. It completely takes the guesswork out of figuring out the number of runs you need to do to get a good average. It also provides the percent of standard deviation, so you know how far your results are spread out from the average. benchmark-ips allows you to focus on the results rather than the data points when benchmarking your code. Here's an example of the benchmark-ips script I used to measure the difference in speed between integration tests and controller tests. First we define the controller test.
It's a simple get, and we assert the response was a success. Then we do the same for the integration test: get the index route and assert the response was a success. Then we invoke the benchmark-ips gem. Inside the benchmark block we report on each of the tests. Using Minitest's run_one_method, we can call each of the tests inside the blocks and label them integration test and functional test so they're easy to recognize. We then use the compare! method from the benchmark-ips gem to compare how many times slower the integration tests are than the controller tests. I really like this method because it means I don't have to do math. We can run this like any other test, and it tells us we were able to get 393 iterations of integration tests per second, give or take 8%. In comparison, we were able to run 920 controller tests in one second, give or take 7%. The data shows the integration tests were 2.4 times slower than controller tests. That's a huge difference, and it's definitely going to be noticeable if you're running your test suite on integration tests. So now that I had proved that integration tests were in fact slower than controller tests, and that it had nothing to do specifically with Basecamp's code, I wanted to double-check that this was still true on Rails master and the latest Ruby, which at this time was Ruby 2.2.0. Improvements to speed are being made all the time, so it's a good idea to be sure there's still a problem on the latest versions. You don't want to be redoing work that someone has already done on master. I ran the same benchmark script again with Ruby 2.2.0 and Rails master, and the results were about the same, maybe a little bit slower. If you check the iterations they're basically the same, with a slightly higher percent of standard deviation. With Ruby 2.2.0 and Rails master, integration tests were 2.53 times slower than controller tests. Here integration tests are in pink and controller tests are in black.
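As an aside, the core idea behind benchmark-ips is simple: run the block repeatedly for a fixed window and count completed iterations. Here's a stdlib-only toy sketch of that idea (my own illustration, not the gem's real implementation, which also warms up, runs multiple trials, and reports the standard deviation):

```ruby
# Toy iterations-per-second counter, sketching what benchmark-ips measures.
def rough_ips(seconds: 0.2)
  iterations = 0
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < finish
    yield
    iterations += 1
  end
  (iterations / seconds).round
end

fast = rough_ips { [3, 1, 2].sort }                   # cheap block
slow = rough_ips { 1_000.times { [3, 1, 2].sort } }   # ~1000x the work

puts "fast: #{fast} i/s, slow: #{slow} i/s"
puts format("slow block is %.1fx slower", fast.to_f / slow)
```

More iterations per second means faster code, and comparing the two counts is exactly the math that compare! does for you.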
When we compare the difference between Rails 4.2 and Rails master, the two versions were really close in the number of iterations, but the newest Ruby and Rails was slightly slower. For integration tests we were getting about 395 iterations in one second, while we were getting 998 iterations of controller tests on Rails master. This demonstrates just how much slower integration tests are than controller tests. We weren't seeing that when we were using time as a baseline. So now we have a solid baseline from which we can start improving our code. Having the benchmark script means that we can be confident that we've improved the speed of our code, and prove that to others as well. The next step is to find what is slowing down our code. We can't just go changing things without knowing where the culprits are. That's just gonna end up with us being really frustrated. The first gem I used to profile the integration tests and find out what was slowing down Rails was RubyProf. RubyProf requires Ruby 1.9.3 or higher and is faster than other Ruby profilers because it's a C extension, and it's been around a long time, first being released eight years ago. This is the first script I wrote to profile the integration tests and find where they were slowing down Rails. To write this, I followed along with the first RubyProf example on GitHub, as one does when they use a brand new gem. First I set up the index integration test so I could specifically profile the integration test stack. Here I'm using the RubyProf API to run the index test in a block. This will capture the integration test stack trace, and finally I set the printer to the default flat printer and output to standard out. I then ran the RubyProf script in the terminal and it returned the following output. RubyProf has given us a lot of information here. I'm not sure about you, but when I first saw this I was pretty confused. I wasn't sure what I was even looking at, and I had no idea how to use this to speed up integration tests.
It felt like some kind of secret code telling me where the slowest methods were. Let's break down what we're looking at. First we have the percentage of time spent in this method. This is calculated from time spent in the method divided by total time. Then we have the total time spent in this method and its children. Next we have the time spent in this particular method, the wait time spent in this method, and the time spent in this method's children. Then the number of times this method was called, and finally the name of the method that was called. Even after breaking this down, I wasn't entirely sure what this data was telling me to do. I felt overwhelmed and frustrated. I started reading the documentation more closely and playing with the available options and printers. There are a lot of options, so if you decide to try out RubyProf, I encourage you to just play with it for a while until you get data that you like. First I found a new type of printer called the GraphHtmlPrinter. There are no screenshots of how these work, so you have to play around a little bit with them. Instead of printing to standard out, this code will print to a specified file, and when I opened that file in my browser this is what I saw. It's a bit more readable, but still super confusing if you want to understand where the real problems are. At first glance, if you don't know how to read this, it seems like Global#[No method] is the problem, because we're spending 100% of our time there, but that's definitely not the case, and I'll show you why. A better way to demonstrate time spent in the stack was with the call stack printer. This is by far the easiest output to understand. When run, it creates a visual display of the stack showing where your code is spending the most time. And if we look closer at the top, we can see that it starts with 100% in Global#[No method]. This is just the beginning of our stack. You will always spend 100% of your time in the top of your stack.
Always, because that's where you started. You can't spend less than 100% of your time there. What we wanna focus on, though, are the parts of the call stack graph that jut out with higher percentages. One of them is this one, ActionDispatch's runner reset, where we're spending 12.56% of our time. This could be a good place to start looking for problems in our code. RubyProf is a great tool for showing you the big picture, but at the same time the data can feel really overwhelming and frustrating if you're not familiar with how to use it. There are a lot of options that come with RubyProf, so it's worth trying them all out. Overall, RubyProf was most useful for finding the differences between the code stacks for integration tests and controller tests, but in the end we didn't really use it to improve the speed of our code. Once I had these results, I reached out to members of the Rails core team to give me some insight into the data I had collected. I got a lot of feedback, and at this point Aaron Patterson chimes in and says, oh, speeding up integration tests sounds like a really great idea for my RailsConf talk. Let's work together on this so that I can use your code in my slides. Now, the first tool that Aaron introduced me to was StackProf. StackProf is written by Aman Gupta and is a Ruby profiler like RubyProf, but it takes a sampling of the call stack so you can focus on the biggest problems and slowdowns in your code. Let's take a look at how to set this up. Here is an example of the StackProf profiler code we used. Again, we add the integration test code like we did in previous examples. Then we call StackProf and set a file for it to output to. Inside the block we run the test 3,000 times. We need to run it this many times so we can get a real sample of our Ruby code. We can run this like we did before, and since it's outputting to a file, we're gonna need to open that specific file after the script runs.
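To give a feel for what "sampling" means here, this is a toy, stdlib-only sketch of the idea behind a sampling profiler (StackProf does this natively with signal-based timers, far more accurately and cheaply): a background thread periodically snapshots the main thread's call stack and tallies which frame was on top most often.

```ruby
# Toy sampling profiler: snapshot the main thread's backtrace on an
# interval and count the topmost frames.
def busy_work
  20_000.times { |i| Math.sqrt(i) }
end

samples = Hash.new(0)
main = Thread.current

sampler = Thread.new do
  loop do
    if (snapshot = main.backtrace) && !snapshot.empty?
      samples[snapshot.first] += 1
    end
    sleep 0.001
  end
end

300.times { busy_work }
sampler.kill
sampler.join

# Frames that take the most wall time show up in the most samples.
samples.sort_by { |_, count| -count }.first(3).each do |frame, count|
  puts "#{count} samples: #{frame}"
end
```

The tally is statistical, which is why you need to run the code under test many times, like the 3,000 iterations in our script, to get a representative sample.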
So to view StackProf output, you can type `stackprof --text` and the path of the dump file that you named in your script, and this will show you the sampled profile of your Ruby code. This seems confusing at first, but it's similar to what we learned in the RubyProf section, except the stack is sorted by time spent in each frame. The first column is the total number of samples where this frame was in the stack. Next is the percent of time all samples spent in this frame. Then StackProf shows the number of samples where this was the topmost frame, then the percent of time where this method was the topmost frame, and lastly the method that was called in this frame. Since StackProf output is sorted with the most time spent at the top, we can just focus on the first line, which says we're spending 53% of our time in Minitest::Runnable's on_signal. Let's take a look at that on_signal method. Basically, this method is just yielding. There's a lot of other stuff, but in the end we just yield. To test whether integration tests really are spending 53% of their time here, we can delete all the other code so that all the method does is yield. There's no way that just this yield could be the source of slowness in our code, because it's yielding to something else, and that would be where the source of slowness is. When we run the StackProf script again, we can see that Minitest's on_signal is no longer in our stack, and the new slowest code is search_for_file at 14.7%. So we're done. This is Minitest's problem now. Let's open an issue, grab a beer, and go work on something else. But we forgot to do something. Something very, very, very important. We need to follow the number one rule of verifying that we improved performance. I call it ABB: always be benchmarking. Every time you make a change to improve performance, you have to benchmark your code, because otherwise you have no way to prove that you actually made it faster.
So just to be sure, let's run that benchmark script again. With the change that we made to on_signal in Minitest, we should see a massive improvement now that we're not spending 53% of our time there. When we run it again we can see that, huh, it's still 2.53 times slower and hasn't improved at all. After a little bit of research and consulting the creator of StackProf, we found out that there is a little bug in OS X that causes CPU time to be completely inaccurate and not correctly profile the code. This is something to remember if you use StackProf or any other profiler that defaults to CPU time on OS X. Linux does not have this problem, so you're free to use CPU mode with StackProf on that OS. We can simply resolve this by using wall time instead. If we change the mode in our StackProf script from CPU to wall time, we get accurate results that tell us where the real problems are. And the problem wasn't in Minitest; it's not even represented in this stack sample. StackProf with wall time actually pointed to a regression in the delegate method, where we were spending 28.3% of our time. We changed the delegated methods in the route set to be written out by hand. If we run the StackProf script again, we can see that module delegation is no longer on the list, and the slowest sample we now have is 4% of time spent in the per-thread registry. Now we need to follow our number one rule: always be benchmarking. If we run the benchmark script again, we'll see the integration tests went from being 2.53 times slower to 1.4 times slower. That's quite an improvement. Here we can see that we actually lost some iterations in the controller tests, but integration tests have gained about 200 more iterations per second from where we started. We can see that we were definitely improving the speed and slowly closing the gap between the two types of tests.
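The CPU-time versus wall-time distinction that bit us here is easy to see in plain Ruby: while a process sleeps or waits, the wall clock keeps ticking but almost no CPU time accumulates. A small stdlib sketch:

```ruby
# Compare the two clocks across a sleep: wall time advances, CPU time barely does.
wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_before  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

sleep 0.2 # waiting, not computing

wall_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before
cpu_elapsed  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before

puts format("wall: %.3fs, cpu: %.3fs", wall_elapsed, cpu_elapsed)
```

A profiler in CPU mode attributes time using the CPU clock, so anything that waits (and, with the OS X timer bug we hit, much more than that) can be misattributed; wall mode counts elapsed real time.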
StackProf is really great at helping point out the slowdowns in your code quickly, because it sorts the profile by the slowest parts rather than giving you all the information at once. But an issue that we had with StackProf was that it was hard to pinpoint problems in methods that were dynamically created or in modules that were anonymous. In fact, you wouldn't even have known there was a problem in an anonymous module if it wasn't for the RubyVM.stat method. RubyVM.stat is a method on the Ruby virtual machine that provides information about the current method cache. This is useful for being able to see if your method cache is getting busted between runs. The method cache is important to the performance of Ruby because creating new globals and constants is expensive. Here's an example of the script we used to track the global and constant state of the objects in our test. First we add the same integration test and index method that we've used in every previous example. Then we output the RubyVM stats before we run the test, run the test, and output the RubyVM stats after. When we run the test with the RubyVM code, we get the following output. What we wanna focus on is global_constant_state, which tells us that before we ran the test we had 7,823 constants, but afterwards we had 7,824. That means that we created a new constant while running a single integration test. And if we ran five more tests, we'd have five more constants. That's not good. Creating constants while running tests is expensive, because each time we're busting the method cache and then Ruby needs to resolve those methods again. We found that the method that was creating an extra constant was url_helpers. Now, finding this method took a lot of digging, because we couldn't rely on StackProf to find it, or on standard debugging tools, because the module is anonymous. We had to go spelunking in the Rails internals to find it.
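You can poke at RubyVM.stat yourself in a couple of lines. One caveat: the exact keys are version-dependent. global_constant_state existed on the Rubies we were using, while newer Rubies track constant-cache invalidation under different names, so this sketch just prints whatever your VM exposes before and after we define a constant:

```ruby
before = RubyVM.stat

# Defining a brand-new constant invalidates Ruby's constant caches,
# which is exactly what we were accidentally doing once per test.
Object.const_set(:SomeBrandNewConstant, Module.new)

after = RubyVM.stat

p before
p after
```

On a pre-3.0 Ruby you'd see global_constant_state tick up by one between the two hashes, just like the 7,823 to 7,824 jump we saw per test.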
I looked back at the git history and found this method used to be memoized, so I figured I'd just add the cache back. Whatever, how hard can it be? The diff is much larger than this, but you get the idea. url_helpers is now memoized, and we should see some improvement in our benchmarks. So we ran the benchmark script again, and we saw a minor improvement. Integration tests went from being 1.4 times slower to 1.36 times slower, and we saw an improvement in the controller tests as well. But I haven't yet pointed out that there was a better way to fix this besides just caching the url_helpers. When you're working with performance, it's really easy to think that you should just cache all of the things, because it will magically fix all of your problems, but caching isn't a free solution to all of your performance problems. Caching comes with a cost, and if you're able to speed up your code without caching, you should do that first. You don't want to put a band-aid on real problems in your code. Optimize first, cache second. The problem was that url_helpers was being called when it wasn't necessary. The integration test didn't always need url_helpers when the session was being created and initialized, so a better way to fix this was to move it to where it was needed rather than caching it. The diff is quite large, but if we look closer we can see that this moves url_helpers from when the integration session is created to only being called when the session is reset. Now that we've done that, we can run the RubyVM script again to check the global method cache. And here we can see that the global constant state remains the same before and after we run the integration test. So if we run our benchmark code again, we can see that this improved integration tests even more than when we cached them. When we cached url_helpers we were at 1.36 times slower, but after this change integration tests are now 1.12 times slower than controller tests.
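For what it's worth, the memoization version of the fix is a tiny pattern. Here's a hypothetical sketch (names made up, not the actual Rails diff) showing how `||=` makes the expensive anonymous module get built at most once per session:

```ruby
# Hypothetical stand-in for the integration session: the helpers module
# is expensive to build, so memoize it with ||=.
class FakeSession
  attr_reader :build_count

  def initialize
    @build_count = 0
  end

  # Memoized: the anonymous helpers module is created on first use and
  # reused afterwards, instead of being rebuilt on every call.
  def url_helpers
    @url_helpers ||= begin
      @build_count += 1
      Module.new
    end
  end
end

session = FakeSession.new
3.times { session.url_helpers }
puts session.build_count # => 1
```

The better fix we ended up with avoids even that first build unless the session actually needs the helpers, which is why moving the call beat caching it.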
This proves how expensive busting the global method cache can be. That's a massive improvement for a fix that on the surface seems kind of minor. RubyProf, StackProf, and RubyVM.stat were instrumental in helping us find the largest problems in the Rails source that were causing slowdowns in integration tests. Of course, we made a lot more improvements that I didn't demonstrate today, but we were starting to get to the point where we were making micro improvements to performance. One thing that we noticed after removing the largest culprits was that we were spending a lot of time in garbage collection: 11.13%. That's because we were creating a lot of objects that needed to be cleaned up afterwards. To figure out why, we turned to AllocationTracer. AllocationTracer was written by Koichi Sasada and is a tool for finding how many objects are being allocated by Ruby and where they are in your code. AllocationTracer focuses on a very specific problem in your Ruby code and helps narrow down unnecessary allocations of objects. Here's an example of the AllocationTracer script we used to track allocations in integration tests. To use AllocationTracer we need to include ObjectSpace. Then we define the index integration test as we've done in the past scripts. We then set up AllocationTracer and call it with path, line, and type. This defines the information that we want from AllocationTracer's output: the path, the line, and the type of the object that was allocated. Then we assign the tracer's result. Again, we run the test 3,000 times to get an accurate sample of allocations. And finally we sort the result by the number of allocations; reverse puts the highest counts at the top, and then we take the first five, because we only want to find the top five offenders, not all of the allocated objects. Then we run the AllocationTracer script like we've done with the past scripts.
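If you want to experiment without the gem, Ruby's stdlib objspace extension exposes the same underlying allocation-tracing hooks that AllocationTracer builds on. A minimal sketch:

```ruby
require "objspace"

allocated = nil
ObjectSpace.trace_object_allocations do
  # dup forces a fresh string object even under frozen string literals
  allocated = "a brand new string".dup
end

# With tracing enabled, Ruby records where each object was allocated.
puts ObjectSpace.allocation_sourcefile(allocated)
puts ObjectSpace.allocation_sourceline(allocated)
```

AllocationTracer layers counting and aging on top of these hooks, which is what gives you the path/line/type tallies in the output we're about to look at.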
The output is a bit confusing at first, but once you get the hang of the data you're looking at, it makes a lot of sense. This shows us the path to the file that allocated the objects, then the line number where the objects were allocated. Next is the type of object that was allocated. This could be a string, array, struct, hash, all sorts of things. The next array gives us the total count of objects created, the old count of objects (which are objects that haven't been garbage collected yet), the total age of the objects, the minimum age of the objects, the maximum age of the objects, and the total memory size. Specifically, we found that a lot of the object allocation problems were in Rack and not Rails, at least among the top five biggest offenders. One of the ways in which we reduced the time spent in garbage collection in Rack was to reduce the number of string objects being created. A simple way to do that was to freeze certain strings. After each one of these improvements we checked our allocations. Now, here we don't need to benchmark, because we're not trying to specifically improve the speed of our code, but rather reduce the time spent in garbage collection, which speeds things up because we're no longer wasting time in GC. In this run we can see that Rails has replaced three of the Rack allocations in our top five. This is from changes where we froze strings. The reason that we needed to freeze the strings was that when you use a string as a hash key, the string is copied, which creates more objects. Freezing prevents this, but makes your string immutable. Now, before you declare that winter is coming and that we need to freeze all of the strings, you must first prove that string allocations are a problem. You don't want to prematurely optimize this code, because you're making the strings immutable. That isn't necessary if it's not slowing down your code, and it's dangerous if the string needs to change.
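You can see the key-copying behavior that freezing avoids in a couple of lines of plain Ruby:

```ruby
h = {}

mutable = "content-type"
h[mutable] = 1
# Ruby dups and freezes an unfrozen string key, so the stored key is a
# different object from the one we passed in: one extra allocation.
puts h.keys.first.equal?(mutable)

frozen = "accept".freeze
h[frozen] = 2
# An already-frozen key can be stored as-is: no copy is made.
puts h.keys.last.equal?(frozen)
```

The first puts prints false and the second prints true on the Rubies I've tried, which is exactly the per-key allocation we were trimming out of Rack's hot path.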
This should only be done if absolutely necessary, so I hope I don't see any of you send a giant pull request to Rails with all the strings frozen saying that I told you to do it. You absolutely must first prove the string allocations are a bottleneck before freezing them. AllocationTracer was a great tool for finding where objects were being unnecessarily allocated in Rails and where GC was wasting our time. We unfortunately didn't really see much speed improvement from this, but that's also because these are micro performance improvements that can be kind of tedious. Comparing integration tests from before we started to where we ended up, we are almost faster than the original controller tests. Originally we could only run 393 iterations of integration tests; now we can run 828 iterations. That's a huge difference, and it calculates to integration tests being more than twice as fast in current Rails as they were in Rails 4.2.0. In fact, most of these changes didn't actually alter behavior and were backported, so they're available in the newest Rails 4.2 releases, and you don't need to wait for Rails 5.0 to take advantage of these optimizations. I hope this helps you better understand your tools and how to improve performance. Be sure to know the advantages and disadvantages of the tools that you're using. You need to understand their limitations so you don't end up frustrated with your tools. No tool can do everything, so use multiple tools. We were only able to speed up integration tests this much because we combined the usage of benchmark-ips, RubyProf, StackProf, RubyVM.stat, and AllocationTracer. And of course, always be benchmarking. It will help you be confident that you're actually speeding up your code and able to defend your changes. It may even help you find a bug in your Ruby profiler. I wanna say thank you to a few people who helped me with this work. First, thank you to Aaron Patterson for working with me on speeding up integration tests.
I know that there was some concern that he was serious when he said he stole my code in his RailsConf talk, but it was truly a joint effort. Thank you to all of the creators of the gems that we used, and thank you to RubyConf for having me speak here today. And thanks to all of you for listening. I've got Basecamp logo and Happy Camper stickers, so if you want one, come find me. What? Oh, sorry. Oh, yeah. He asked if there are any tools that automatically include benchmarks in your commit message or pull request. There aren't any that I know of, so if you wanna make one, that might be kind of cool. But that's also because, a lot of times when you're sending a pull request, you might not be able to pinpoint the part of the code that you want to benchmark. Say you send a pull request with 10 changes and only one of them really improves or breaks performance. You would need to pinpoint each of those and measure each of those, and then measure them together. And so that's why a lot of times when you send performance pull requests, they're specifically performance pull requests. And when you do that, you should include your benchmarks in your commit message, so that when someone is going through and looking at why it was merged, you don't have to go back and ask for those benchmarks. Oh, the question was, did I use other tools that I didn't introduce in the talk? No, not for this one. That doesn't mean that there aren't other tools that I would use for performance work. If I was gonna profile an application's MySQL calls, I would use MiniProfiler. That's a really good tool, but it's not really relevant to this. So there are lots of other tools that I would use depending on the type of performance that I'm measuring, but for this, we didn't use anything else.