I can be behind this podium here because I'm not wearing any pants. Anyway, welcome. I'm really honored to be here. I'm happy to be the first person to welcome you to Red Dot RubyConf. I want to say thank you to everybody for coming. Thanks to the organizers. Thank you, Winston. Thank you, all of you. Please give yourselves a round of applause. Very happy to be here. One interesting thing about being the last speaker is that the only thing between you and the party is me. So I was thinking that I would just do everything as slow as I can. That's actually the longest transition Keynote will do. I was also thinking maybe we would just stand around and watch animated GIFs all day. I like this one. It's pretty funny. Standing there dancing. When is that party? Sorry, my name is Aaron Patterson. I've come to you from the United States to bring freedom, whether you like it or not. Do we have any Germans here tonight? Anyone from Germany? Yeah, I heard we won. I couldn't figure out how to make any jets fly across. Anyway, I'm on the Ruby core team. I'm also on the Rails core team. And this is my first time giving a talk. I'm just kidding. So you can find me on Twitter as tenderlove. I'm also on GitHub as tenderlove. Instagram as tenderlove. And also on Yo as tenderlove. So you can yo me there. I think about weird stuff all the time for some reason. I really don't know why. Just weird stuff always comes to my mind. For example, people always say separation of concerns. I heard that a lot at this conference. I hear it a lot all the time as a programmer. And the only thing that I can think of is basically this. Just... That's all I can think of. I can't take it seriously. I'm like, okay. Anyway, I recently became the number one committer on Rails. I'm at the top. I'm at the top. A ton of internet points. So many internet points. I think this is out of date. I think I actually have more commits than that now. 
But let me tell you, I'm going to give you all the secrets of getting internet points. I'm going to give you... This is the secret. This is my secret. I'm giving it away here. It's that revert commits count too. So more mistakes equals more points. So don't be afraid to make mistakes. You get more internet points for it. I'm a short stack engineer. I think that's actually a full stack, but I don't know. I'm a short stack engineer. I enjoy pair programming. That's me pair programming. The interface is a little sticky on that. Getting the TTY to work was difficult. I guess I'm also a dad joke programmer too. I tell a lot of bad dad jokes. Anyway, I have a cat. This is my cat. His name is Gorbachev Puff Puff Thunder Horse. Actually I have two cats. This is my other cat. Her name is SeaTac Airport YouTube. Facebook too is her name also. That's her. Actually, yeah, my wife told me the reason we got two cats is so that I would stop looking at pictures of cats online. And I was like... That's not how this works. You see, now we have two cats and I look at cat pictures online. Anyway, I'm also a very shy person. You might think, hey, Aaron, how are you getting up in front of all these people and talking to them if you're very shy? Well, it's actually because I'm really excited about my work. I love my job and I want people to know what I do. So I use that to give myself the courage to be up here. But also I brought a whole bunch of stickers of my cat, and I am really terrible at starting conversations. I'm very shy and awkward. So if you want to come talk to me, come up and say, hey, Aaron, I would like a sticker. And then I'll say, here's a sticker of my cat, and then we can talk about cats or programming or whatever. So it's like a little icebreaker there. Anyway, so I was thinking about this conference, and I thought about the stuff that I did here at Red Dot RubyConf last year. 
Last year I went to hawker centres and there's amazing food. I ate some really, really awesome food. This is one of the things I ate. I also thought about this. So this is a map of the world. I came from here. I'm from Seattle in the United States there, and we're down here in Singapore. And the problem is that I love food a lot. I eat all the time and it's very, very bad for my weight. But I was thinking that, well, I am very far north typically, right? And we are now pretty far south, and if we take a look at the earth from the top, like there on the right, that's where I am in Seattle, and down there on the bottom right, that's us in Singapore. And actually we're rotating faster down here than we are up in Seattle, so I should be being flung away from the earth faster, like if we move that up there, right? So that means that down here in Singapore I weigh less because I'm being pushed away from the earth, all right? So I was trying to calculate this. I was trying to figure this out and I thought, okay, there's all that. And I'm like, okay, let's plug in some numbers here, and I plug in these numbers and try to figure it out, and then I'm like, I have no idea what I'm doing. I couldn't figure this out. But I was trying to figure this out last year too. I couldn't figure it out last year, and actually one of my fond memories of this conference from last year was talking to Jim Weirich about this. He actually calculated all of this for me and presented it in his slides and told me exactly how much weight I lost when I was down here. I was digging through old photos that I took from last year and I found this one where he was comparing himself to my cat. So anyway, I wanted to have a slide in here that just said we miss you, Jim. He is... 
So he was a huge inspiration to me, and one of the reasons I try to do my best presenting is because I know that he enjoyed watching me speak, so I'm doing my best all the time now to try and remember him. Another thing that I did last year is I ate durian fruit for the first time, and this is a picture of the durian fruit that I ate, and I actually really liked it. I enjoyed it. At first I thought it smelled incredibly bad, and then I tried it and I got it and I thought it was great. So I had it, and my wife came with me too and she tried it as well, and so I was taking some reaction shot photos. So this is her before eating durian fruit, and then after. So she didn't really enjoy it as much as I did, but I really like it. Anyway, I've been studying Node.js a lot lately. Yeah. And the reason I'm studying Node is to get close to the metal, and I think that I've accomplished that. Like... pretty close. Pretty close to the metal now. Anyway, so my talk is called Speed Up Rails, Speed Up Your Code, but I guess really it should be called rails.inspects or maybe this is not magenta. Something. So I'm going to talk a lot about performance in Rails and performance in your own code: basically benchmarking stuff, how to benchmark stuff, how to measure the performance of your applications. I have an ulterior motive for teaching you how to measure the performance of your applications, and I will talk about that at the very, very end of this talk. But basically we're going to look at how to measure performance of code, and then how I measure performance of Rails and how we've increased the performance of Rails, and hopefully you'll be able to take these tools that I show you home and apply them to your code at home, be it Rails or not. 
So the first thing I want to talk about is performance trade-offs. Typically when we're talking about performance we have to make trade-offs; we have to think about two things, speed versus memory. I typically care about runtime speed, like how fast my program runs, and it's easy for me to buy more RAM and throw it in a machine. It's getting cheaper. RAM is pretty cheap, I think. We also have to talk about time versus space, so when I say speed or memory I'm actually talking about time and space. I think those are really interesting terms because I just think about Star Trek then, so we have Q representing time and Picard representing space, for some reason. I don't know, whatever. Anyway, the point is that space is not free, because we may be able to add more memory to our machines but it still costs something, it costs some RAM. But time is also not free, because maybe we want to serve up requests quickly. Really the point of this is that nothing is actually free. We have to make a decision when we're trying to improve the performance of our systems, whether we choose Q or whether we choose Picard. Or there's actually one other thing that we can do: we can find a better algorithm. There are certain cases where we're able to improve both of these, but I find that to be pretty rare. There's going to be an example of that in my slides, we'll find a mystical unicorn in these slides, but I think that this is a pretty rare thing that we can do. So most of the time we're making trade-offs. Most of this talk will be making trade-offs, and most of the time we're going to talk about giving up RAM to improve the speed of our system. RAM is cheap for us web developers (sorry, Heroku), so we're going to focus on runtime speed, on speed at the cost of RAM. And the point that I'm trying to make here is that time and space are related: we can 
typically trade one for the other when we're talking about the performance of our applications. So the first thing I'm going to look at is performance tools. This is basically the advertising section of my presentation, where I'm advertising other people's tools, because I have no idea how to build this stuff; I am just using it to make our code better. So yes, advertisement time. First we're going to look at raw performance. This is when we have two bits of code and we just want to know how they compare to each other on speed: which one of these is faster? My go-to tool for this is a gem called benchmark-ips, and I'm going to show you how to use that with your code, but first we're going to compare it to Benchmark, the benchmarking tool that comes in Ruby's standard library. It looks exactly like this: you create a new benchmark and run some test n times, right? You may have written a benchmark like this before. Now, if you've written a benchmark like this before, you might notice one of the problems is, well, how big do I make n? You'll sit there and say, okay, well, maybe I need to make it 10, or maybe I need to make it a thousand, I'm not exactly sure. So you run it, and this may look pretty familiar to you: you run it and it's like, wow, that's super fast, it took zero time. Well, obviously it did not take zero time. It took more than that, and your n was not large enough for you to study the performance of this method. The other problem is that we have to deal with noise on our machines. You might be sitting there running your benchmarks, but you've also got iTunes going and Twitter running and that YouTube video as well, and you have the Nyan Cat going, trying to get that high score, and this is causing a lot of noise in your benchmarks. So it would be nice if you had some sort of standard deviation to tell you, hey, it took at 
longest this amount of time and at least this amount of time, so you can say it's kind of in the middle here. This is where benchmark-ips comes in. What benchmark-ips says is, okay, let's figure out the iterations per second (that's where IPS comes from): how many times can I run this code in five seconds? So we have an example here where we're comparing how fast we can access a set versus how fast we can access an array, like, is this a member of an array or a member of a set? Accessing the set is much faster. Over there on the right-hand side we have iterations per second, so the higher this number is, the better it is: how many times can we do that per second? So Set#include? is about, I don't know, some big number, I'm not sure, 30 bajillion (is that a number? I don't know), and then Array#include? is like this, which is smaller. I can tell it's smaller because it's shorter; fortunately I'm using the same size font. Since the Set#include? number is higher, that means that using a set in this particular example is better than using an array. So the point to take home from this is that for iterations per second, higher is better. Remember that: higher is better. The other important thing that this provides is a standard deviation, so we can say, well, this could be 12% slower or 12% faster on either side of this particular number. The reason that this is important is that maybe you've implemented some algorithm two different ways and you're not sure if one is faster than the other. Maybe you run it one way and it seems faster, but if you don't have this standard deviation it could just be noise in your system, like maybe this time you're not watching YouTube videos and listening to iTunes. I want to drive this home by comparing this to the same benchmark using the Benchmark library from Ruby's standard lib. Well, excuse me, not the same benchmark: we're 
doing set inclusion versus hash inclusion and comparing those two. We would think that they should be roughly the same, and if we run this with standard lib's Benchmark we might get an output like this, and we'll see that they look almost exactly the same, or that set access is faster. And you say, well, okay, set access is faster than hash access, how can that be possible? Because if you look under the hood, Set is actually wrapping up a Hash, so you should have a little bit of overhead in there. If we run the same benchmark using benchmark-ips, we'll see the output looks like this, and the hash access is actually faster than the set access in this particular benchmark. So we get a better idea of what our numbers are using this, and we can graph this and see, okay, set access is up there at whatever number and hash access is even higher. We know that for iterations per second, higher is better. We plot our standard deviation, and the bottom of the standard deviation on the right side doesn't even touch the one on the left side, so we know for sure that the hash access is faster in this particular case. The other thing I like to use these benchmarking tools for is black box testing. Many times when I'm dealing with Rails source code, I don't know how something is implemented. I have no idea. This might be a surprise to you, but even though I'm on the Rails core team, much of the time when I'm looking at Rails I have no idea what's going on. I really don't. I try to figure this out, and one of the tools I use for figuring out what is going on is benchmarking tools. I'm going to show you a short example of this. Let's say we have two cache implementations, cache one and cache two, and we're just measuring how fast it is to access an element from our cache. I want to study how these caches perform as the size of the cache increases. What we can do is say, let's look at how big 
the cache is, at 10, 100, 1,000, and then 100,000, and measure at each of those particular sizes. We can actually collect a report: IPS will return to us a report object that collects this particular information, so I can take this information and compile it down. This is the code to compile that down. What I do is say, okay, I want to know how long it takes me to access this particular cache 10,000 times. So I have a fixed number: how long does it take to do a get 10,000 times? Well, IPS gives us iterations per second, so we take 1 over iterations per second to convert that to seconds per iteration, then we multiply by 10,000 to get the time for 10,000 calls. Then we plot that for each of those particular tests. We plot that out and we say, okay, as we grow the cache, cache one seems to stay the same the entire time: it's constant time. The other one is growing. It may be linear growth, it may be exponential growth, we're not sure exactly, but one of them is definitely constant time and the other one is growing. And if we look at the cache implementations, it's pretty obvious why we get these performance differences: one of them is using a hash as the internal data structure, where the other one is using an array, and we know that with an array it's going to take a linear amount of time to scan that array. So I'm going to show you a real world example of using this when I'm trying to figure out what is going on with Rails. One of the things that I was trying to study is the performance of the routing system. Let's study how long it takes to generate a link_to: when you say link_to, how long does it take to generate an a tag? So what I did is I said, okay, let's create a small route set here, we'll add one resource to it, and then we'll time how long it takes to generate an a tag 
then we'll do that again for ten routes, so we'll add ten routes to this. This is actually ten resources. Then we'll do it again for a hundred, and the reason I said resources is because resources actually adds like four, or a million, routes; I'm not sure how many, it adds more than one. Then we'll do it for a thousand, and then we'll plot that and see how this changes: how many iterations per second can I do as we grow the number of resources in our system? If we plot that out, it looks like this. Along the x-axis there is the number of resources that we have in the system, and along the y-axis is the number of seconds per a hundred thousand calls. If we plot this as we grow the number of routes, it stays about flat. It's a flat line; this doesn't actually grow as we get larger. Now, the next thing I said is, okay, so we know that with the Rails router it doesn't matter how many routes we add, it'll always take the same amount of time to generate an a tag. Well, what about the size of the URL? How long can the URL be? So I wrote another benchmark here that said, okay, let's say I want to generate a slash a, slash id, I think is what this is doing. Or no, this is just slash a. So this one is slash a, and then this one says slash a, a, a, a, like ten a's, then the next one is a hundred a's, and we keep growing that. And we ask, how long does it take to generate a URL that's of length ten, a hundred, a thousand, etc.? Then we plot that using our iterations per second, calculating how many seconds it takes to do a hundred thousand calls. Along the x-axis, again, is how many segments are in that URL (on the very right is the 24 segments), and along the y-axis is the number of seconds per hundred thousand calls. And we see that the longer the URL gets, the longer it takes to generate the a tag. So now we kind of know what the algorithms are going 
on inside the routing system. We probably have some sort of hash table that keeps track of the particular routes in the system so that we can come up and generate a tags, and we have maybe some sort of array that's actually calculating the URL part, the href part, of the a tag. So we kind of know, from a high level, what's going on under the hood. The next thing I want to know is, where is time spent exactly? This is great, we're able to use IPS to figure out how fast two particular algorithms are, but where is the time actually being spent? For that I like to use this tool called stackprof. You should check this out. It's a sampling profiler, so it says, okay, at every clock tick, at a constant interval, what is our stack frame? The idea is that functions that are slower have a higher probability of being there when we hit that clock tick, right? The other implication of this is, let's say you have a method that's called 100 times: it may not show up in your profile 100 times. It'll show up less; it'll show up the number of times it was actually sampled. So if we run profiling for url_for, similar to the link_to, this dumps out that profile to a file called url_for.dump, and we can view those results using the stackprof binary in the terminal and say, just give me a report of this. This is what the report looks like. You don't need to read that too closely, but what it's saying is, at the very top there, we're spending about 26% of our time in this method called url_for. So that's where we want to focus our attention when we're trying to improve performance: we have this one function that takes 26% of our time, so let's go look at that function and focus on that in order to make our code faster. 
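To make the sampling idea concrete, here is a toy sketch in plain Ruby. It is not how stackprof works internally (stackprof is a C extension driven by timers); the names `sample_profile`, `slow_method`, and `fast_method` are made up for this illustration. A background thread looks at the main thread's stack at a fixed interval and counts which methods it finds there, so a method that runs longer is seen more often.

```ruby
# Toy sampling profiler: illustrates the idea only, not stackprof's implementation.
def sample_profile(interval: 0.001)
  counts  = Hash.new(0)
  target  = Thread.current
  running = true

  sampler = Thread.new do
    while running
      frames = target.backtrace_locations || []
      # Count each method at most once per sample, however deep the stack is.
      frames.map(&:base_label).uniq.each { |name| counts[name] += 1 }
      sleep interval
    end
  end

  yield                # run the code being profiled on the current thread
  running = false
  sampler.join
  counts
end

# Hypothetical workloads: one spends far more wall-clock time than the other.
def slow_method
  20.times { sleep 0.01 }
end

def fast_method
  20.times { sleep 0.0005 }
end

profile = sample_profile do
  slow_method
  fast_method
end

puts "slow_method samples: #{profile['slow_method']}"
puts "fast_method samples: #{profile['fast_method']}"
```

Both methods are called exactly once, yet their sample counts differ widely, which is the point made above: the profile reports how often a method was on the stack when sampled, not how many times it was called. For real profiling, the stackprof gem's README documents `StackProf.run` and the `stackprof` command-line reporter.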
The next thing I like to look at is GC.stat. I think Koichi wrote this. Koichi, did you write this? Okay, Koichi wrote this. He should advertise this more; it's really awesome. This method gives you statistics about the garbage collector. One thing that I need to know when I'm profiling Rails or making it faster is how many objects we create. With GC.stat I can ask how many objects we have allocated in this process so far. This is a total, so this number is always incrementing: if you allocate a new object, this will go up by one. So what I can do is ask how many objects have been allocated now, run some code, ask how many objects were allocated again later, and then just subtract those two to calculate how many objects a particular chunk of code allocated. I'm doing that in this particular example: I wanted to find out how many Ruby objects we allocate when loading Active Record objects. The first thing I did is warm up the system, and the reason I did this is because we have a bunch of caches and we don't want to take these caches into account. So I warm up the system, then I count the total number of objects that have been allocated in the system, then we run a benchmark and say, okay, let's look up this person object n times, then count the number of objects in the system again, and then subtract and divide by n, and then we know the number of object allocations it took per call. A real world example of this is benchmarking views. I was very interested in how many objects were allocated every time we rendered a view, so this is the benchmark I used for that. We're creating a new Rails application, and what I did here is this particular benchmark benchmarks the 
application but cuts out the middleware, excuse me, not the middleware, the web server. The reason I'm doing this, and not using just a Rack middleware or one of the, oh, what is it, the performance stuff that's included with Rails, is basically because that stuff is lying to you, to be frank. The way that I'm doing this benchmark is I said, okay, I'm going to create a real application, I'm going to send one request through that, I'm going to record the environment, the Rack env hash, and I'm going to later replay that. The reason I'm saying the other ones are lying to you is because they're not actually instantiating all of the Rack middlewares that are in the Rails stack. Also, they're mocking out certain things like the session, so you're testing the performance of a mock session rather than your real session. You don't know for sure what your real numbers are in this thing that maybe kind of acts like your real system but doesn't. I wanted to get something that was as close to a real system as possible. The reason I'm cutting out the web server is because I'm not responsible for the web server. I can't improve the performance of the web server; I'm improving the performance of Rails. And also, maybe Puma performs better than Unicorn, or doesn't; you can choose a different web server and plug it in depending on which one performs better. So what this benchmark does is benchmark books#new. This app just has a books resource. What we do is first instantiate the application, then run one request through it in order to warm up any caches, then we run our actual benchmark, which is very similar to what we were looking at previously, only this is with a Rails application, and then we plot that. So this says, how many objects does it allocate per request? What I want to show you is how Rails has improved over time. If we look at this graph, this is a graph of the objects 
allocated per request. Along the x-axis there is the branch, so on the left is 4-0-stable, then 4-1-stable, then master, and you can see that we're dropping there. One thing that I want to point out about this graph is that we're actually starting at 2000 right there. The other thing that I want to teach you in this talk is how to lie with graphs. If we actually make that very bottom zero, the graph looks like this, and you're like, ah, that doesn't feel very good. But the good news is that we have about a 19% reduction in objects since 4-0-stable and a 14% reduction in objects since 4-1-stable. Later in the talk we're going to look at why that is, or how we got those performance improvements, and we're also going to talk about why, well, it does matter, but it doesn't actually matter. It does matter, but there are many caveats, and we'll go over some of those. The next tool that I like to use is this gem called allocation_tracer, also written by Koichi. This is a tool for finding where objects are allocated in your system. You give it a chunk of code and you can say, tell me what type of objects have been allocated, where they have been allocated, and how many times they have been allocated. So here's a very short example. We give ObjectSpace::AllocationTracer a block of code and run some stuff in there: a thousand times we allocate an array, a string, and a hash. Then we say, okay, tell me the total number of allocations that happened inside that block and give me a breakdown by type of object. If we run this code we'll see the output looks like this. I've greatly reduced the amount of output; it's actually a lot more than this, but it really wouldn't fit on a slide. You can see we have a thousand strings, a thousand arrays, and a thousand hashes, like you would expect from that code. So we've looked at a bunch of benchmarking tools, and these are most of the tools that I use for benchmarking Rails and Rails 
applications, and these are also all the tools that I use for making the performance improvements that we're going to talk about. The first one I want to talk about is speeding up Active Record. We, me, I have done a lot of work to speed up Active Record. The stuff I'm going to talk to you about today is about three years of my work; I spent about three years on this particular project. It actually sucks, because I have about 30 minutes left and I'm trying to compress three years of my blood, sweat, and tears down into less than 30 minutes. So first, in order to talk to you about these performance improvements, we need to talk about how Active Record works, from a high level. Let's say you're doing a Post.find. What we do is we take that Post.find and turn it into a tree of objects called Active Record relations. This has nothing to do with Arel, except that we take those Active Record relations and then turn them again into a different tree, this time using Arel. Arel is actually a representation of the SQL statement that we're going to execute; it's an AST of the SQL statement. So we take those Active Record relations and construct the AST, and from that AST we compile it down into a string. That string is a SELECT statement that actually gets sent down to the database. The database responds with some records, and then we return those back to you. This is, from a high level, how Active Record works. So part one of these performance improvements was basically bind parameter introduction, and you may have noticed this, I think in 3.1 or 3.2. Before 3.1 or 3.2 you'd see in your logs stuff like this: if you did Post.find(1), you'd see a SQL statement that looked like this. And what's interesting about this is that the only thing that changes in these SQL statements is actually these numbers, the IDs that were passed in. 
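The pipeline just described (relation, then AST, then SQL string) can be sketched with stand-ins. Everything below is hypothetical: `ToyRelation`, `to_ast`, `to_sql`, and `find_sql` are invented for this illustration and are not Active Record or Arel APIs. The sketch only shows that all three stages run on every call, while the resulting strings differ in nothing but the ID.

```ruby
# Hypothetical stand-ins for the three stages; not real Active Record or Arel classes.
ToyRelation = Struct.new(:table, :conditions, :limit)

# Stage 2: turn the relation into a small AST (nested arrays in this sketch).
def to_ast(relation)
  where = relation.conditions.map { |column, value| [:eq, column, value] }
  [:select, relation.table, [:where, *where], [:limit, relation.limit]]
end

# Stage 3: compile the AST down to a SQL string with the values embedded.
def to_sql(ast)
  _, table, (_, *conditions), (_, limit) = ast
  clauses = conditions.map { |_, column, value| "#{column} = #{Integer(value)}" }
  "SELECT * FROM #{table} WHERE #{clauses.join(' AND ')} LIMIT #{limit}"
end

# Stage 1 plus the rest: every call rebuilds the relation, the AST, and the string.
def find_sql(id)
  relation = ToyRelation.new("posts", { "id" => id }, 1)
  to_sql(to_ast(relation))
end

puts find_sql(1)
puts find_sql(2)
```

Comparing the two printed statements shows the observation from the logs: the text is identical apart from the embedded ID, even though the whole chain of work was repeated to produce each one.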
Now, in later versions of Rails, I think 3.1 or 3.2 and up, you'd see something that looks like this, where we'd say select star from posts where ID is question mark, and you'd see some square brace, ID, and some number. What this was is we were introducing bind parameters. The point of introducing bind parameters was to separate static and dynamic content. We have this SQL statement that's mostly static, and then we have a little bit of dynamic content. The theory is this: if we look at this view of how Active Record works, every time we execute, we're building this Active Record relation and these Arel statements, all to compile down to the SQL statement. If that SQL statement is the same every single time we run this, why are we doing the calculations over and over again? We should be able to cache that particular computation. That was the theory. It did not exist in part one; it was just a sparkle in my eye. So part two is about code decoupling. We have very complex code. I'm going to show you some code, and I want you to read this code example very closely. It's very important. It's all readable; you can read that in the back. So obviously don't read this. What's important here is that this is one method. That is one method. And what those arrows are pointing at is places where we're embedding dynamic content into our SQL statement. If we ever hope to cache these calculations, we need to get rid of that dynamic content. So the only way that we could do this is by refactoring. Why do I keep saying we? The only way to do this was to refactor this so that we could extract those dynamic values. Every time I stared at this function I just thought of Lord of the Rings: one method to rule them all in 
legacy code. So the way to reduce the complexity of this particular method was that I removed has_and_belongs_to_many. I'm sorry, it's gone. But it's not actually gone, don't worry. If you upgrade your Rails applications, the method has_and_belongs_to_many actually still exists, but what we do is translate that under the hood to has_many :through. The reason we can get away with this is because has_and_belongs_to_many actually is has_many :through, if you think about it. has_many :through has three tables, but so does has_and_belongs_to_many; it's just that in the has_and_belongs_to_many case we don't have a middle model there. So what we do now is, whenever you call has_and_belongs_to_many, we generate an anonymous middle model. If you dig around in the core of Rails and you find that middle model, don't touch it. We generate that model, and then your has_and_belongs_to_many associations are translated into has_many :through. And actually this is one of my favorite things: it's just all red. We're deleting stuff, deleting stuff. It made me super duper happy. The reason it made me really, really happy is because we're deleting a whole bunch of conditionals. These conditionals just went away. Now we don't have to think about has_and_belongs_to_many anymore in our code; we're able to get rid of that, and it made extracting that dynamic content much easier. So part three of this was, now that we've got that static and dynamic content separated, we can introduce a cache. I'm going to show you an example of this cache code. What it is, is we generate an Active Record relation object here: we say Person.where name with some bind parameter, and we limit it to one. This creates an Active Record relation object. What we do down here is execute that same relation object multiple times but using different values, so we're able to execute it with Aaron and with Evi. It'll go against the database, but it actually only executes that block once, 
calculates that relation only once. So we generate a cache object; we calculate a cache object from this relation. I'm going to show you what the cache object internals are, because this is fairly important to people upgrading, to people who are watching their memory, and also to us on the Rails team. The cache object internals look like this. We keep a list of the bind parameters. The reason we keep a list of these bind parameters along with columns is that we need to be able to typecast things. So let's say when you do Person.find you'll typically pass in params[:id], right? And that ID, when it's coming in from the URL, is actually a string, and we need to know how to typecast that before we send it off to the database. So we look up the ID column and we say, ah, the ID column is an integer, and you've bound this particular place to an integer. So whenever you pass in an object in that particular place, we're going to cast it to an integer before we send it to the database.
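A minimal sketch of that typecasting step, in plain Ruby rather than the actual Rails internals. `Column`, `BindCaster`, and the `CASTS` table are hypothetical names made up for illustration; the only idea taken from the talk is that each bind position remembers its column so the incoming value can be cast before it goes to the database.

```ruby
# Hypothetical stand-in for a database column: just a name and a type.
Column = Struct.new(:name, :type)

# Hypothetical helper that remembers which column each bind position
# belongs to, so values can be cast before being sent to the database.
class BindCaster
  CASTS = {
    integer: ->(value) { Integer(value) },
    string:  ->(value) { value.to_s }
  }.freeze

  def initialize(columns)
    @columns = columns
  end

  # Cast each incoming value according to the column bound at that position.
  def cast(values)
    @columns.zip(values).map { |column, value| CASTS.fetch(column.type).call(value) }
  end
end

caster = BindCaster.new([Column.new("id", :integer)])

# An ID taken from the URL arrives as a String and leaves as an Integer.
p caster.cast(["42"])
```

The cast table here only knows two types; a real adapter would need one entry per column type it supports.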
The other thing that we cache is the compiled SQL. This is actually the SQL statement that's sent to the database, and that's it, it's just a string. So we take this cache object and we have a higher level cache. And by the way, that code that I showed you on the previous slide, this cache stuff, does exist internally in Rails. Don't touch it. You can use it if you want to, but it's absolutely not supported. We may change the API at any moment. But you can do exactly the same things I'm showing you. Don't tell anybody I said you could do that, please. This isn't being recorded, is it? All right, so what I did next is I updated the internals to cache these particular relation objects. So where can we use this cache? Let's look at things that use relation objects: Post.find, Post.find_by_*, has_many, has_many :through, has_and_belongs_to_many, belongs_to. Actually I'm missing one from the list, which is Post.find_by where you pass in a hash. That syntax uses relations as well. I omitted that from the list for some reason. But all of these are cacheable. So I'm going to show you an example implementation. This is very similar to what we have inside of Rails. This is part of the implementation for Person.find. If you look at this, you'll see right there in the middle we have a relation object, and you can see we're saying where the primary key is equal to some bind parameter, and we limit that to one. So this code looks almost exactly like what you would write in your controller. You might say Post.
where ID, limit one. We're actually using this in our internals. So the next part that you'll see down here is the ID parameter. This is our dynamic parameter; this is the ID that you passed in to Post.find. So we pass that in and we execute it. And just as a fun side note, there are many different ways that you can call Post.find, and I'm going to show you them. You can call it with an integer, you can call it with an array, with an array of arrays, with a hash of integers, you can call it with another Active Record object, with an integer with a block, you can call it with a scoping inside of a block, and those last two you can combine with any of the ones above them as well. I just didn't want to put that many bullet points in my slides. And the fun thing is that only that top one is cacheable. That's the only one that's cacheable. The rest of these, I have no idea what they're even used for. I don't even know what they mean. Anyway, what's very annoying about this is that if you go look at the implementation of find today in Active Record, you'll find this giant conditional that's like, okay, if we got a block, if we got an array, if we got all this other stuff, then we can't cache it. If we slimmed down our API and got rid of those things that I don't even know what they're for, we could just delete this code. It would just go away. It would be gone. I'm just watching all the things all the time. So let's look at the performance of Active Record now, after these changes have been applied. These changes are in master today and they'll be coming out with Rails 4.2, whenever we release that, which I think is this year sometime. So let's look at Post.find and find_by_name. What I did is I benchmarked these two together, and the reason I did that is because they're essentially the same thing. One is finding by ID, one is finding by name. The queries should be mostly the same. And I also chose
these particular forms because these two lines actually work all the way back to Rails 2.3. So I was able to take this code and benchmark it against Rails 2.3 and every single version of Rails all the way until today, and this is what it looked like. If we go from 2.3 stable all the way over to the right-hand side, which is the experimental branch, you'll see that for Post.find the number of calls per second actually decreased from 2.3 stable. This is iterations per second, like we talked about earlier, so higher is better. We started out higher with 2.3, and we actually went down through the 3.0 series, and we came up a little bit through 4.1 stable, and then we increased here on the experimental branch by a lot. And if we look at the same thing for Post.find_by_name, we'll see that the calls per second go way down in the 3.0 series, come back up a little bit, and on master they just increased a lot. One thing that I thought was actually very interesting is if we look at calls per second with MySQL. I'm calling out MySQL in particular. There was a strong bias towards MySQL in the Rails 2.3 days. MySQL was much faster than the other databases, and I think this is just because most people used MySQL back then. But now Postgres is a million times better. Use Postgres, please, because it is awesome. If we look at MySQL, you can see way down here, super sadness in the 3.1 and 3.2 series. It's just super duper sad. I'm really sorry. Seriously, like, what? Anyway, so we went way down, then came way back up here, and you'll see that it's actually faster today on master than it was in 2.3, but only slightly faster. This is what it looks like as percentage faster than 4.1 stable. You can see we're faster across the board, for every single database, than 4.1 stable. If we look against 2.3 stable, again we're faster across the board, but not much faster for
MySQL. The other thing that we looked at is object allocations per call. If we look at object allocations per call, you'll see they went way up in the 3.1 series. We can't necessarily say that allocating more objects caused the speed to go down. It doesn't have a one-to-one mapping, because maybe the garbage collector is faster. We can't say that these are absolutely correlated, but it does cause your process to do more work. So you can see the objects allocated went way up and then came way back down. The same thing with find_by_name. And when we look at these numbers, we find that master today creates 70% fewer objects than 4.1 stable when you do a Post.find, and 55% fewer objects than 2.3 stable. So I think that this is a really good achievement. The next thing I want to look at is belongs_to. We can see exactly the same performance with belongs_to: it goes down for 3.0 stable and then it spikes way up. And if we look at belongs_to percentage faster than 2.3 and 4.1 stable, we're just faster across the board. There's not a number for mysql2 because mysql2 didn't exist on 2.3 stable. Again, we'll look at has_many and has_many :through, exactly the same things here, where we've improved so much. This is has_many call speed over time. Again, the same sort of graphs with has_many :through; it looks nearly the same. Percent faster than 2.3 stable, again much faster; much faster than 4.1 stable. The next thing I really want to look at is has_many :through growth. What I mean by has_many :through growth is, let's say we have a has_many :through relationship here, right? What happens if we add another has_many :through relationship, and another one, and another one, and we measure that performance as the number of has_many :through relationships grows? What does that look like? So if we look at the time it takes to make 100,000 calls to a particular has_many :through relationship and
we plot that, we'll see that it looks like this. So 4.1 stable is going across, increasing linearly, whereas the bottom one, master, is just constant time. This makes me extremely happy because it's my golden unicorn of constant time improvements. So I'm really excited about this. The TL;DR to take home from this is that we're about 100% faster. 9,000, over 9,000, 9,001% better. So, challenges. The challenge with this is that we had to trade memory for speed. It's pretty clear, because we are keeping these cache objects around, and so I wanted to think about how much memory we are trading. So I'm gonna give you a very simple formula for calculating how much memory we're trading, and unfortunately I can't tell you exactly. This is the formula: we can calculate our total cache size to be about the number of find_by calls that you do in your system, or find_by with a hash, multiplied by the size of the cache object. I can't tell you exactly what the size of the cache object is, because it depends on the size of your SQL query, and I can't tell you how many find_bys you have in your system, but the important thing is that it should be bounded. If it's not bounded, then you have a problem. Not bounded would mean that a user can control what you query against, for example that somebody could pass in a column from the URL, and that might be bad. You probably don't want users to query by whatever they want to. This cache size should be bounded. Now the next question is, can we cache raw relations? Lots of people ask me, hey Aaron, you were able to make Post.find fast and find_by fast and all that fast. Why can't you just make Post.where.where.where.where fast? And my question is, why don't you make it faster? Anyway, so let's talk about that a little bit. Let's look at an example controller.
Here's an example controller where we say Person.where, finding by name, and then get a list back of people by name. So why can't we make this faster? People say, we wanna know, can we make this faster? Can we cache this particular thing? And the answer is, we can cache this. We can make that faster. So I put together an experimental branch that attempts this, and I'm gonna show you the performance difference between the two and talk about the complexities with both of them. So let's say we have this experimental branch. This is again iterations per second, so higher is faster. The experimental branch is faster. We cache that, it's faster than master, we're happy, it's about 30% faster, okay? The problem is that each time you call a method on a relation object, that impacts the cache key. So if you say .where.select.includes.whatever.whatever.whatever, we have to take all of those particular values into account when we calculate the cache key. So let's look at the code on the experimental branch. Up there in blue is the cache key calculation, and down there in red is the actual query execution. What I found is that there are at least 11 variables that impact the cache key. There are 11 different ways that you can change a relation object, and I'm being conservative. I'm not 100% sure that that's correct. And that code that I showed you only handles two of those variables. Imagine if it handled all 11 of those. It may be that calculating the cache key is more expensive than what it saves. I mean, why bother if it's so expensive to calculate the cache key? So I did one more experiment, a final comparison. What this is, is a benchmark that runs where name versus find_by_name.
And this is against the experimental branch that has both the performance improvements that we talked about earlier and the performance improvements for Post.where, or Person.where. I compared those two. And if we run this, this is again iterations per second, so higher is better. On the right side is find_by_name. On the left side is doing Person.where. This is even with our new caching code put into place. And still, find_by_name is three times faster than the where version. The reason that it's so much faster is that even though we can cache that stuff, every time we execute that particular block we have to create a new Active Record relation object. We always have to do that. So the question is, should we even cache these relations? Should we bother with this code? And I'm not sure. I personally don't think we should. I don't think the complexity is worth it. What I would prefer is that we introduce a new API. And the new API would look something like this, where we say, okay, let's have a cached query. We'll create a relation object inside that and we'll cache that, and then we can execute it multiple times. Many of you might be looking at this and saying, hey, that looks very similar to a scope. Why don't we just use scopes? And we can't do that, because people expect that scopes are executed multiple times, whereas this block is only executed once. For example, you might have a scope that's like, give me all of the comments for today. Well, you probably wouldn't be happy if today was the same day every single day. That wouldn't be very fun. The other thing is that the cache key is extremely easy. So I think if we introduce a new API like that, the cache key is extremely easy. Plus, you get to keep using the nice relation API. The relation API is maintained.
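To make the "block runs once, query runs many times" idea concrete, here is a rough sketch in plain Ruby. It is not the API proposed in the talk and not anything that exists in Rails: `CachedQuery`, `execute`, and `builds` are hypothetical names, and the block returns a SQL string as a stand-in for building and compiling a relation.

```ruby
# Hypothetical sketch: the block that builds the query runs at most once;
# after that only the bind values change between executions.
class CachedQuery
  attr_reader :builds

  def initialize(&block)
    @block  = block
    @sql    = nil
    @builds = 0
  end

  # Returns the cached SQL together with the values bound for this call.
  # A real implementation would hand both to the database adapter.
  def execute(*binds)
    @sql ||= begin
      @builds += 1
      @block.call
    end
    [@sql, binds]
  end
end

find_by_name = CachedQuery.new do
  # Stand-in for building a relation and compiling it to SQL.
  "SELECT * FROM people WHERE name = ? LIMIT 1"
end

p find_by_name.execute("Aaron")
p find_by_name.execute("Evi")
p find_by_name.builds
```

This also shows why such a block cannot behave like a scope: anything computed inside it, such as today's date, would be captured on the first call and reused on every later one.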
So the next thing I wanna talk about is speeding up helpers. That was all Active Record stuff. And after three years of doing that stuff, I'm pretty burnt out on Active Record. So I'm looking at speeding up helpers and more of the view side. This is stuff I've been working on very recently. One of the things that I wanted to do is reduce object allocations. So I was profiling request and response time like I was showing earlier, and I wanna look at a little bit of that. What I did is I said, okay, here's our test code. Again, we're creating a new application here, running one request through it, and running StackProf against that, the tool that we looked at earlier. I wanted to look at bottlenecks in our rendering system. And what I found is our results look like this. At the very top there, we have ActiveSupport::SafeBuffer#initialize. That is the very top line. It's chewing up about 9% of our processing time, so we're calling initialize a lot. So I said, okay, where are we calling ActiveSupport::SafeBuffer#initialize? I need to be able to find this. How did I find this? One thing I use is a tool called TracePoint that's in Ruby 2.1. I'm not sure when it was introduced, but it's in later versions of Ruby. And I use this code to say, okay, every time we get a call, I check to see if the call is on the ActiveSupport::SafeBuffer class and if the method is initialize. If it's that class and initialize, I'm gonna say, give me the call stack. Tell me what the call stack is. I get the binding and eval caller. I'm sure Koichi is cringing at my code. It was really super terrible. But I say, okay, give me the caller, and down here below those two lines, at the very bottom, are our tests, so we can see what the actual output is from this. And if we look at the output, we'll see, okay, we have two calls from that.
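The TracePoint technique can be tried without Rails. The following is a sketch under assumptions, not the code from the slide: `SafeBuffer` here is a small stand-in for `ActiveSupport::SafeBuffer`, `html_safe` is a hypothetical helper, and the hook records `caller` directly instead of going through the binding.

```ruby
# Stand-in for ActiveSupport::SafeBuffer, so the sketch runs without Rails.
class SafeBuffer < String
  def initialize(str = "")
    super
  end
end

# Hypothetical helper that wraps a string, similar in spirit to String#html_safe.
def html_safe(str)
  SafeBuffer.new(str)
end

call_stacks = []

# Fire on every Ruby-level method call and keep only SafeBuffer#initialize.
trace = TracePoint.new(:call) do |tp|
  if tp.defined_class == SafeBuffer && tp.method_id == :initialize
    call_stacks << caller
  end
end

# Tracing is only active for the duration of this block.
trace.enable do
  html_safe("<b>hi</b>")     # one call site inside a helper
  SafeBuffer.new("direct")   # one call site directly in the script
end

call_stacks.each do |stack|
  puts stack.first(3)
  puts "--"
end
```

Each recorded stack includes the frames above `initialize`, so the call site can be read from the first few entries. Tracing every method call is slow, which is why the sketch enables it only around the code being investigated.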
One is inside the html_safe method and the other one is actually inside our test script on line 13. So this is actually our test script, and where I found these inside of Rails was in a method called tag_options. What the tag_options method is, is a helper method that generates the attributes inside your tags. So, I don't know, any of the attributes that are inside your tags. And it actually comes from inside this ERB::Util.h method. In order to talk about this, I should hurry along. I have a million, billion slides left. Okay, so let's talk about HTML sanitization in Rails very quickly. HTML sanitization works with an ActiveSupport::SafeBuffer. So if we look at an ordinary string in Rails, we check the string type, we ask if it's HTML safe, and it says false. If we use a safe buffer, we say, okay, give me an HTML safe version of this string, we get back a safe buffer, we ask is that HTML safe, and it says true. So an important thing to note here is that html_safe just tags the string. If you say string.html_safe, what you're saying is, I am okay for these bytes to be output across the wire without being escaped. Okay, it does not mean that it is safe. Despite the name, it does not mean that it is safe; it just means that Rails will not escape it when we send it out across the wire. So what does ERB::Util.h do? What it does is it says, okay, if the string is not HTML safe, then we'll gsub it. We'll say, okay, let's escape everything, and then we'll call html_safe on that and return to you an ActiveSupport::SafeBuffer. But what happens here is we're actually creating two different strings. The safe buffer is a subclass of a Ruby string. So we do a gsub, that creates a new string object with the escaped code in it, and then we create an ActiveSupport::SafeBuffer here. So it actually creates two strings.
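A simplified sketch of that behaviour in plain Ruby, assuming nothing about the real Active Support source beyond what is described above. `SafeBuffer`, `html_safe?`, `h`, and `ESCAPES` are stand-ins defined here for illustration.

```ruby
# Characters to escape, and their replacements.
ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&#39;"
}.freeze

# Stand-in for ActiveSupport::SafeBuffer: a String subclass whose only job
# in this sketch is to carry the "do not escape me" tag.
class SafeBuffer < String
  def html_safe?
    true
  end
end

# Plain strings do not respond to html_safe? here, so they count as unsafe.
def html_safe?(str)
  str.respond_to?(:html_safe?) && str.html_safe?
end

# Simplified version of the escaping helper described in the talk.
def h(str)
  return str if html_safe?(str)

  escaped = str.gsub(/[&<>"']/, ESCAPES) # first allocation: a plain String
  SafeBuffer.new(escaped)                # second allocation: the SafeBuffer
end

puts h("<b>bold</b>")
puts h(SafeBuffer.new("<script>alert(1)</script>"))
```

The second call passes the markup through untouched, which illustrates the point that the tag only means "do not escape", not "this is safe".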
So how does this relate to tag_options? If we go back and look at tag_options, we assign the return value of that. We say, okay, escape it, we get it back here into value, and then we actually interpolate it down here into another string. So what's happening is we're taking a string, we're generating a string, we're generating a safe buffer, then we're generating another string. It's like, so many objects everywhere. So I was wondering to myself, what is the point of generating the safe buffer if we're just gonna interpolate it back into a straight up Ruby string? We can remove the safe buffer completely. So we ended up extracting that and saying, okay, let's just do string to string to string. I guess it's better. I guess it's one better. Anyway, so we extracted a method, unwrapped_html_escape. We extracted the inside of html_escape to say, okay, let's just have a method that escapes the string but doesn't return a safe buffer. So we just do the gsub, and we're still backwards compatible: the original method calls this unwrapped_html_escape and then calls html_safe on it. Then we update our callers to just call the unwrapped version. It gets assigned to value, value gets interpolated, and now we're just doing string to string to string. So this saved about 200 string allocations per request for books/new. This is just a scaffold page. What I'm testing here is a scaffold: if you had generated a book scaffold, then you're looking at that particular form. So that alone saved 200 allocations per request. And if we look at our request benchmark, it looks like this. The way that I benchmarked this is I said, okay, we do exactly the same warmup, we're requesting books/new, warmup here, using allocation tracer here to get the number of allocations per request.
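The refactoring can be sketched in plain Ruby as follows. The method names mirror the ones mentioned in the talk, but the bodies are simplified assumptions for illustration, and `SafeBuffer` is again a stand-in rather than the Active Support class.

```ruby
ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&#39;"
}.freeze

# Stand-in for ActiveSupport::SafeBuffer.
class SafeBuffer < String
end

# Escapes and returns a plain String: one allocation.
def unwrapped_html_escape(str)
  str.gsub(/[&<>"']/, ESCAPES)
end

# Backwards-compatible wrapper: same escaping, then wrapped in a SafeBuffer.
def html_escape(str)
  SafeBuffer.new(unwrapped_html_escape(str))
end

# The caller only interpolates the value into another String, so the
# SafeBuffer would be thrown away immediately. Use the unwrapped version.
def tag_options(options)
  options.map { |key, value| %(#{key}="#{unwrapped_html_escape(value.to_s)}") }.join(" ")
end

puts tag_options(class: "a&b", id: 1)
```

Callers that really need the tagged result keep using `html_escape`; only call sites that interpolate straight into another string switch to the unwrapped version.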
And if we break this down by object type, we can see this is what our object type allocations look like across versions. So blue is 4.0 stable, green is 4.1 stable, and then whatever that orangey thing is, is master. You can see our number one problem is T_STRINGs, and we're reducing these objects as time goes on. So we're improving Rails here, and this graph doesn't look super impressive, but remember it's a 19% reduction since 4.0 stable. Remember that, that's very large. And it's a 14% reduction since 4.1 stable. I'm very happy about this. Now unfortunately, with this particular change, your mileage may vary. What we were talking about here is string buffer allocations inside of tag_options, and obviously this depends on what your HTML looks like. So I can tell you that this saved us 19% of the strings for this particular page, but what it does for your application, I'm not sure. If you're just generating JSON, this will save you nothing. But if you're actually generating an HTML page, it may save you a lot, depending on what your HTML looks like. So the next thing I wanna talk about is string object reduction. What happened between 4.0 and 4.1? We didn't talk about that. Now, oh man, so many slides. Let's go through this quickly. All right, so I get extremely nervous before my talks and I just think, okay, if I add more slides everything will be okay. I'm extremely paranoid that I'll get up in front of everybody, take like 10 minutes, and be like, I'm sorry. But now I have millions of slides. Okay, so in Ruby we have mutable strings, and what that means is that every time we evaluate this block we get a new object allocated. This is very apparent. Look at these object IDs. They're different every time that block executes.
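This is easy to check in plain Ruby. The sketch below compares a bare string literal with the same literal followed by `.freeze`, which Ruby 2.1 and later compile so that the same object is returned on every evaluation. It assumes the file is not run with frozen string literals enabled, which is why the magic comment is spelled out.

```ruby
# frozen_string_literal: false

# A bare literal allocates a fresh String each time the block runs.
unfrozen_ids = 3.times.map { "hello".object_id }

# A literal with .freeze on it is handed back as the same object every time.
frozen_ids = 3.times.map { "hello".freeze.object_id }

puts "distinct objects without freeze: #{unfrozen_ids.uniq.size}"
puts "distinct objects with freeze:    #{frozen_ids.uniq.size}"
```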
It shouldn't be a surprise to anybody, but what's really cool is that in later versions of Ruby, I think it was introduced in 2.1, there is now a compiler trick in there that says if we freeze the strings, we actually get just one object back. You can't mutate that string anyway. So what we said is, okay, every time the compiler comes across a string literal with a .freeze on it, we're just gonna return the same object to you every time. We won't allocate a new object every time that's evaluated, and you can see that the object IDs are the same every time. So the way we took advantage of this in Rails is with ERB templates. This is an example ERB template. Here is a compiled version of the template that I want you to read extremely closely. That is a joke. So this is the compiled version of the template. We took the ERB and compiled it down into Ruby code, and if we zoom in on it we'll see something that looks like this. We have a string literal there that's the actual HTML, that TR or that TD that we had in the ERB. It's the HTML literal, and what's interesting about this is you can't mutate it. It's never mutated. Template literals can't change. So what we did for 4.1, we, me, I'm not sure if anybody noticed I did this. I noticed I did it. Anyway, we added freeze in there. So these string literals are now frozen. The HTML literal is frozen, and that's where these savings came from, going from 4.0 to 4.1. That's how we reduced that there. The next thing that I'm looking at is speeding up output, and I want to warn you, this is a work in progress. The code I'm going to talk about does not exist on master today; it exists on my machine. So hopefully my machine does not die, or maybe I should push a branch somewhere, I'm not sure. Anyway, so I'm going to talk about how the Law of Demeter is helping us speed up our code, and really, I think this is just a Suggestion of Demeter.
I don't like that it's called Law of Demeter because it's like, if you screw that up, is somebody going to come along and arrest you for breaking the law? Well, it turns out in the United States they will. And then I wondered to myself, so if you get arrested for violating the Law of Demeter, does that mean you're doing arrested development? So to me, I'm not sure exactly what the definition of the Law of Demeter is. I'm totally not sure about it. I know it's like you don't want to have too many dots or something. If you're calling too many methods down, then you're doing something wrong, which is probably true. But the way I look at it is, it's not about the dots, it's about the types that your function handles. What I think is that the fewer types your function actually handles, the faster and easier your code is going to end up being. So what I mean by that is, we'll look at this compiled template again, that you can totally read very well. And we have this HTML literal here again. What's interesting about this is if we go look at the safe_append= method. Why there's an equals sign there, I have no idea. Super stupid, we should probably remove that. But if we look at this, it has a line there. Look at that line, it says return self if value.nil?. Why? Guess what value can never ever be? It can never be nil. The ERB compiler guarantees that this will never be nil. So why are we doing this? Why is that there? We don't need to handle nil. This is a type that we don't need to deal with. So we just say, I don't care about nil, I don't handle it. If you pass a nil to me, whatever, I don't care. So we just remove that. We don't handle nils, just delete that. Why are we doing this runtime check? We're checking that every single time. In your ERB templates, every time you have that HTML literal, we're checking whether or not that's nil. What a waste, it's a huge waste.
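A toy illustration of the change, with heavily simplified stand-ins for the output buffer. `GuardedBuffer` and `TrustingBuffer` are hypothetical classes; the real `safe_append=` in Rails does more than append, so treat this only as a sketch of removing a check the caller already guarantees.

```ruby
# Before: every append pays for a nil check, even though the caller
# (the compiled template) only ever passes string literals.
class GuardedBuffer < String
  def safe_append=(value)
    return self if value.nil?
    self << value
  end
end

# After: the method handles one type only. Passing nil is now the
# caller's bug and fails loudly instead of being silently ignored.
class TrustingBuffer < String
  def safe_append=(value)
    self << value
  end
end

guarded  = GuardedBuffer.new
trusting = TrustingBuffer.new

["<tr>", "<td>", "</td>", "</tr>"].each do |literal|
  guarded.safe_append  = literal
  trusting.safe_append = literal
end

puts guarded
puts trusting
```

For the inputs the template compiler produces, both buffers end up with identical content; the difference is one fewer branch per appended literal.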
So I have to admit that that probably wasn't actually the Law of Demeter. I think that was actually more defensive programming, but my joke would not have worked with defensive programming. So I used Law of Demeter. Now the other thing that we're working on with this is this output buffer, this thing that it's appending to, which happens to be a subclass of, God, this kills me. It happens to be a subclass of the safe buffer that we were looking at earlier, which is also a subclass of string. Now if you look at safe_append in the superclass, it's saying, hey, if this safe buffer is not HTML safe, then do something special. But this conditional only returns true if the safe buffer has ever been mutated. Well, it turns out that you can't actually get access to the output buffer. We don't give you access to that, which means that you can never mutate the output buffer. This conditional will never return true. It'll always be HTML safe. So why do we have that? It kills me. I look at this and I'm like, how can you even get to the output buffer? I look through Rails and I'm like, how can you get access? Who mutates it? And I think the answer is no one. So we should be able to just reduce this conditional to that. So if I could sum this up, here's what I think you should be doing in your applications in order to increase speed. Cache invariants: any time you can find a calculation that you're doing over and over again that always ends up being the same, you should cache it. Eliminate objects, and where this comes in is, no code is faster than no code. If we can delete code, it's gonna be faster. Limit types: limit the types that your functions handle. If you can reduce the number of types of objects that your functions handle, you'll have less code. And just as we learned a few slides earlier, less code is equal to faster code.
And the other thing that I want you to do, now that I've taught you about all of these performance tools, all of these tools that you can use to test the performance of your Rails application, is report performance issues to us, please. That dip, when we were looking at those graphs, that dip from 2.3 down to 3.0 and 3.1, that should never have happened, in my opinion. And I think the reason is that nobody was measuring their applications. People were like, ah, Rails seems slow. It's gotten slower. And I hate it when people do that. The reason I hate it is because saying something like Rails is slower is not helpful to me. So you say to me, Aaron, Rails is slower. I'm like, cool. I'll just go speed up Rails. Okay. So, if you can tell me what is slower, we can make it faster. So I actually heard this, it warms my heart. I heard yesterday that GitHub was finally upgrading to Rails 3.0. And I know that they had tried this a long time ago. They were upgrading to Rails 3. And I talked to my friends that worked there and I said, well, why couldn't you upgrade? And they said, well, it's slower. And I'm like, what's slower? No response. Thanks, I'll get right on that. Anyway, Rails 4.2 will be the fastest Rails ever, I think. I'm confident in saying that. So, thank you. If you have any questions about this stuff, I don't know if we have time. I'm a little bit over an hour. Otherwise, I'm gonna be at the party later. So please come say hello to me. I will give you a sticker. We can talk about performance issues or whatever. You can just say Rails is slower and I'll still laugh. That's fine. Thank you. Thanks, thanks, Aaron. Any questions for him? Don't be shy. I am not always in Singapore to answer questions. Yes. Can I talk? Oh yeah. So you showed a few things you think are bad DSL or API design choices in Rails.
So if in Rails 5 you could pull your veto card and get rid of something, what would it be? You think I have a veto card? Yeah, I don't have that. If I could pull something from Rails 5, what would it be? I have no idea. I can't even tell you. Well, actually, we have some methods for doing introspection into Active Record. I wish that we could remove those, because some of those things just don't make sense. For example, one of the things I broke is that somebody was iterating through all the reflections on Active Record, looking for has_and_belongs_to_many ones, and now that has_and_belongs_to_many is implemented in terms of has_many :through, these didn't show up anymore. So I broke their code. If I could remove some of this introspection stuff, I think that might help. The other things that I really wanna get rid of are things that impact thread safety, really. I can't give you anything super specific now, but anything that impacts thread safety I would like to remove. So globals. I know that's not super specific, but I hope that answers your question. I had a question, yeah. So on those performance graphs, was that all with the same version of Ruby? Yes, it was. Have you tried it with a new version of Ruby just to see what the garbage collector does? That was with the newest, that was with trunk Ruby. So all those tests were performed with trunk Ruby. The graphs should probably be similar on any particular version of Ruby, maybe not exactly the same numbers, but still the same percentage increase. Aaron? Hey. All your slides spoke about performance improvement, but alongside master you mentioned Adequate. Yes. Could you talk a little more about that? So I originally named this project AdequateRecord after my consulting company, Adequate. We do everything adequately. We have just enough clients, so don't ask me to do work.
No, so I named it after that. And now that it's been merged back into master, it's dead, so master is it. All those numbers are from master. So most of the performance improvements are actually in AdequateRecord, and is that a separate gem anymore? No, no, no. Adequate is just the branch name. All of it was just merged back into master. So there is no such thing as Adequate anymore. It's not Adequate. It's just Rails. And this is the first time I'm seeing you in informal dress. What happened to your suit? What happened to my suit? I'll tell you what happened to my suit: extremely hot temperatures and high humidity is what happened to my suit. I would die up here if I was wearing that. So does the Rails core team, is there any discussion around automated performance regression tests? Yeah, so we've talked about that, but there are some problems. Theoretically we have performance tests. We do have performance tests, but if I remember correctly they only test ActiveRecord stuff and we never look at the output. So it doesn't help very much, and we don't have historical data for them either. So we don't know how things change across particular branches. As far as I know, the stuff that I've shown you all here is the most comprehensive set of performance tests going backwards in time. Another problem with doing performance tests like that is API stability. So when I was showing you controller performance tests from 4.0, 4.1, and master, I actually couldn't show them to you for 3.2 and 3.1 because the API changes in Rails made it such that I would have to do major work on my benchmarks in order to show you those tests over time. So the answer is yes, we've talked about it and it is hard, but if someone can come up with a good idea, please, we'd love to have that.
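As a rough illustration of the kind of automated performance regression check discussed in that answer, here is a minimal sketch using only Ruby's standard `benchmark` library. It is not what the Rails core team uses; the helper names `median_runtime` and `regression?`, the 10% tolerance, and the sample workload are all assumptions made up for this example.

```ruby
require "benchmark"

# Hypothetical helper: run the block several times and return the
# median wall-clock time in seconds, to smooth out noisy runs.
def median_runtime(runs: 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sort[times.size / 2]
end

# Hypothetical check: true when `current` is slower than `baseline`
# by more than `tolerance` (a fraction, so 0.10 means 10%).
def regression?(baseline, current, tolerance: 0.10)
  current > baseline * (1.0 + tolerance)
end

# Stand-in workload; a real suite would exercise application code.
current = median_runtime { 10_000.times.map { |i| i.to_s }.join }

# Assumption: in a real setup the baseline would be loaded from stored
# historical data for a previous release or branch.
baseline = current

puts format("median: %.6fs, regression: %s", current, regression?(baseline, current))
```

The missing piece named in the answer is the historical data: a check like this only helps if the baseline numbers are recorded per branch and someone actually looks at the output.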
I saw a talk of yours at some point and you were talking about actually submitting prepared statements so that the database could cache the query plan, and I was wondering, I think at the time, you guys decided not to do that in ActiveRecord, is that right? That is not true, we do that. That is not true, okay, cool. Yes, we do that. We do prepared statements and you can shut them off if you don't want to use them, but we do them by default, except on MySQL. Yeah, we can talk about that. We can talk about how awesome MySQL is at the party. Hello. I know you're a little shy, but I think your performance is fantastic. Oh, thank you. I appreciate that. Can we give it up for Aaron again? Thank you.