This talk is titled "3x Rails," but what does that title mean? This talk is about speeding up the Rails framework, but I'm so sorry, I kind of failed to bring something like, "Hi everyone, I brought a magical patch that makes Ruby on Rails three times faster, so let's just merge this and release Rails 5 now." I kind of planned to do that on stage, but I'm sorry, I failed. So instead, I'd like to discuss some possibilities and points of view. So again, what does the title "3x" mean? This title is inspired by Matz's keynote at RubyKaigi last year, and at RubyConf, I think. In that keynote, Matz promised that Ruby 3 is going to be three times faster than Ruby 2. So what happens then? It actually becomes very easy to make Ruby on Rails three times faster, so easy because everything we need to do is introduce no more performance regressions on the Rails side, wait for Ruby 3, and then run our Rails applications on Ruby 3. That obviously should give us a three times faster Rails. Win.

Anyway, my name is Akira Matsuda; I'm on the internet as @a_matsuda. I work on some open source projects like the Ruby language and the Rails framework. I also authored and maintain some gem libraries like Kaminari, the pagination library, ActiveDecorator, Motorhead, stateful_enum, et cetera. I run a local Ruby user group in Tokyo called Asakusa.rb. Asakusa.rb was established in, I think, 2008. We meet up every Tuesday and have held 356 meetups so far. We have many Ruby core committers among our members, more than 30 people, and we have had attendees from about 20 different countries all over the world. So it's quite a global local group, right? We welcome visitors from any country, including countries that are not listed here. So if you're interested in visiting our user group and you have a chance to visit Tokyo, please contact me and come to our meetup.
I also organize a Ruby conference in Japan named RubyKaigi. RubyKaigi aims to be the most technical Ruby conference, focusing on the Ruby language itself. Last year's RubyKaigi was like this. And this year we're having another Kaigi in September, in Kyoto. Please note that the conference is not in Tokyo this year. Kyoto is an ancient capital of Japan, and there remain many historical temples, shrines, gardens, and so on, as shown in these pictures. I just googled "Kyoto"; this is the result. I think Kyoto is the most beautiful city in Japan. So if you haven't been to RubyKaigi before and you're willing to go, I think this year's one is a really good chance to enjoy both the conference and the trip. So please consider joining the conference. This year's venue looks like this: this is a picture of the main hall, and the second hall. The venue has a nice-looking Japanese garden. We're already selling tickets and the CFP is already open, so please check out the official website and submit your talk or buy your ticket.

So anyway, let's begin the actual talk. As I told you, this talk is about speeding up the Rails framework, not your Rails application. To speed up software, we first need to know its speed. And in order to measure the speed, we usually use benchmarking software, for example benchmark-ips or Ruby's built-in benchmark library. I prefer benchmark-ips. For example, if you actually want to measure the performance of your Rails application, you can do something like this: I made a monkey patch on Rails.application.call and ran benchmark-ips. It runs the request about 100 times. I know it's a horrible idea, but it kind of works, and it benchmarks purely the Rails part, right? I mean, it skips the browser side. This outputs a score. So how can we improve that score? That's the topic of today's talk.
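As a rough sketch of that idea, here is a self-contained version using Ruby's built-in benchmark library instead of the benchmark-ips gem, so it runs on its own. A plain lambda stands in for Rails.application, and the Rack env hash is hand-built (Rack::MockRequest.env_for would normally build it); both are illustrative stand-ins, not the talk's actual patch.

```ruby
require 'benchmark'
require 'stringio'

# `app` plays the role of Rails.application: any Rack app responds to #call
# with an env hash and returns [status, headers, body].
app = ->(env) { [200, { 'Content-Type' => 'text/html' }, ['<h1>Users</h1>']] }

# A minimal Rack env for GET /users.
env = {
  'REQUEST_METHOD' => 'GET',
  'PATH_INFO'      => '/users',
  'rack.input'     => StringIO.new(''),
}

n = 100
elapsed = Benchmark.realtime do
  n.times { app.call(env.dup) }  # call straight into the app: no browser, no web server
end
puts format('%d requests in %.4fs (about %.0f req/s)', n, elapsed, n / elapsed)
```

Swapping the lambda for a real Rails.application and the loop for Benchmark.ips gives the setup described in the talk.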
My first trial is, of course, Ruby's GC, because everyone knows that Ruby's GC is so slow, okay? I believed that just stopping GC would improve performance by something like 30%, so let's do this first. To observe the GC, we have GC.stat in the core library, and we have the gc_tracer gem, which is made by Koichi Sasada. So, for example, adding GC.stat calls to the previous module shows something like this: it iterates 45 times in five seconds, and it outputs a GC.stat result, which shows that GC is indeed happening, about 50 times. So let's stop it with GC.disable and run the benchmark again. Then I got this result: 50 iterations per five seconds. So GC adds only about 10% overhead in this benchmark. I think that's because Ruby's GC has been improving recently, like this; we have had so many improvements in the GC module. So GC is no longer a 30% overhead; it's just about 10%, which I think is not a big deal. It's acceptable, in my opinion. So I'd like to thank Koichi for doing this amazing work, and please keep on doing it, and also thank you, Heroku, for supporting his activity. Thank you very much, Koichi and Heroku.

By the way, let me now talk a little bit more about a Ruby 2.3 feature that is somewhat related to garbage collection: strings. Strings in Rails used to be a big concern in the community. There actually was a trend of sending pull requests sprinkling .freeze, .freeze, .freeze all over Rails, together with some microbenchmark, aiming to make Rails faster. But honestly, I didn't like that kind of pull request, because it pollutes the code base, right? It just looks ugly to me. So I proposed a magic comment to Ruby that freezes all string literals in the file, just in order to stop the .freeze pull requests. It looks like this: # frozen_string_literal: true. It's already introduced in Ruby 2.3, already available, so if you're interested, you may try it.
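To see what the magic comment buys, here is a small demonstration of frozen-string deduplication. Frozen string literals are interned in the VM's frozen-string table, so repeated literals share one object instead of allocating a fresh String each time; `# frozen_string_literal: true` at the top of a file makes every bare literal behave like the explicitly frozen form below.

```ruby
# Each bare literal allocates a new String object.
plain_a = 'hello'
plain_b = 'hello'

# Frozen literals are deduplicated: both expressions yield the same object.
frozen_a = 'hello'.freeze
frozen_b = 'hello'.freeze

puts plain_a.equal?(plain_b)    # false: two separate allocations
puts frozen_a.equal?(frozen_b)  # true: one shared frozen object
```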
Actually, I have not tried it myself yet, but maybe it will add a few percent of performance, three or five percent, I guess. Maybe. Anyway, let's stop caring about strings now; I think it's a solved problem. And another Ruby myth is that Ruby is slow because it's a scripting language: we have to parse and compile every time, so it's slower than a compiled language. Is that true? I think it is, but Ruby 2.3 has a new feature that lets you precompile Ruby code into a binary and load that binary. I'm not going to talk about this in detail because it's going to be described by Koichi, the implementer himself, so don't miss Koichi's talk about it tomorrow.

So, which part of our simple Rails application takes time? Let's profile. To measure the whole performance, I used benchmarking software; to find which part is actually slow, we use profiling software like stackprof or rblineprof. But again, I'm not going to describe them in detail in this presentation; they are so powerful and so popular that you may well have heard of them before. We also have TracePoint, a built-in library in Ruby, again Koichi's work. With it you can put a hook into every Ruby method call, so you can simply count the method calls, like this. This is an example Rack middleware that counts every method call happening inside the Rack middleware stack. With this middleware, I get this output from my scaffolded Rails application. The most frequent method calls are SafeBuffer's html_safe and html_safe?, escape_html, some attribute methods, things like this. However, these are just the textbook approaches, and I'm going to talk about something different today, drawn from my experience: I know some weird parts of Rails, weak parts of Rails, slow parts of Rails, and I'm going to talk about some of them in the rest of my time.
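Before moving on, the TracePoint counting idea mentioned above can be boiled down to a few lines of plain Ruby. The talk's version wraps this in a Rack middleware; here the traced method and all names are illustrative. Only Ruby-defined methods trigger the `:call` event, so C methods like `upcase` don't appear in the tally.

```ruby
counts = Hash.new(0)

# A Ruby-defined method to trace (illustrative stand-in for app code).
def shout(word)
  word.upcase  # String#upcase is a C method: it fires :c_call, not :call
end

# Tally "Class#method" for every Ruby method call while tracing is enabled.
tp = TracePoint.new(:call) do |t|
  counts["#{t.defined_class}##{t.method_id}"] += 1
end

tp.enable do
  3.times { shout('rails') }
end

# Print the hottest call sites first, like the talk's middleware output.
counts.sort_by { |_, n| -n }.each { |name, n| puts format('%6d  %s', n, name) }
```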
So Rails consists of MVC; which one do you think is the heaviest part? How about Action Pack, the C part? Action Pack sits on top of so many Rack middlewares, which makes the method call stack very deep. Maybe that would be a bottleneck, and actually Rails 5 introduces a new feature called Rails API, in order to reduce this Rack middleware depth, I think. So let's measure. This is, again, a very roughly written Rack middleware benchmarking tool; it outputs how long each Rack middleware took. And I got a result like this: less than 0.0-something for every middleware. So it turns out there's no slow middleware in the default stack. I don't actually see any other particularly slow part in Action Pack, besides route resolution and the URL helpers, which I'm not going to talk about today. So let's leave Action Pack.

Let's see this list again. There are some safe-buffer things and escape-HTML things, which obviously point to Action View. Action View actually has some performance problems; I know that. Action View consists of roughly these processes: it looks up the template, compiles the template, and returns the HTML string to the browser. So let's start with the template lookup. The current implementation of template lookup is like this: it calls Dir.glob for every single template lookup. So the resolver queries the file system per request; actually per render: each layout, each partial, every render call. Couldn't we speed this up? So I tried to make a more optimized resolver than the default optimized resolver. The concept is this: just read the whole file system once and cache it; cache all the template file names in memory. This is the trial implementation, which is already on GitHub. It basically just scans through the view path directory only when the application gets its first access, then caches all the file names, and then performs the view file name comparison in memory, as I told you.
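The caching concept can be sketched in miniature like this. The class name and API below are made up for illustration; the real resolver also has to handle formats, locales, variants, and handlers, which this sketch ignores.

```ruby
# Glob the view directory once, keep the file list in memory, and answer
# later lookups by string comparison instead of hitting the file system.
class CachedViewFiles
  def initialize(view_path)
    # One Dir.glob at first access instead of one per render call.
    @files = Dir.glob(File.join(view_path, '**', '*')).select { |f| File.file?(f) }
  end

  # e.g. find('index') matches ".../index.html.erb"
  def find(template_name)
    @files.select { |f| File.basename(f).start_with?("#{template_name}.") }
  end
end
```

The trade-off is the usual one for caches: templates added after boot aren't seen until the cache is rebuilt, which is fine in production but not in development.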
And here is the benchmark proving the speed. The result is like this: my version of the template resolver is 18 times faster than the default resolver, in a very carefully crafted microbenchmark. Another issue, I think, is render partial. Rendering a partial is basically slow because it creates another buffer per render call. But in some cases we don't need a new view context for each partial, like when simply rendering a footer, a header, et cetera. So we could probably do something like PHP's include and simply concatenate the partial into the parent template. The implementation is, I'm sorry, still a work in progress; it wasn't as easy as I expected. Another idea is to pass the full-path file name into the render partial call, so that the template resolver doesn't have to look through all the view paths. The API would look like this: render_path with a full-path file name, or render_relative, like require_relative in Ruby. The implementation is, again, not yet done.

Another idea about rendering is rendering in parallel: we could parallelize render collection. If you have a collection of 100, maybe we can make render collection 100 times faster using threads, right? I tried this, but I saw so many "too many connections" errors from Active Record. In hindsight it's obvious. So this turned out to be a failure, I think. Another render method is render remote, which performs the rendering via Ajax, particularly for a very heavy partial. Here's an implementation, which I wrote two or three years ago. I found the repository and looked at it yesterday, but I forgot what the name means. Anyway, the API is like this, very simple: add remote: true to your render call, and it will perform the render partial call via Ajax. It kind of works already, but I'm sorry, I'm not using it myself.

So another topic is encoding support in template rendering. The current implementation that compiles the template into a Ruby method is like this.
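As a paraphrased sketch (not Rails' actual source; the method name and structure here are made up to mirror the narration), the sequence of conversions looks roughly like this:

```ruby
# Roughly the dance described next: dup, force to binary, dup again to scan
# for an encoding magic comment, force-encode once more, then let ERB
# convert yet again while compiling.
def prepare_template(template_source, default = Encoding::UTF_8)
  source = template_source.dup.force_encoding(Encoding::ASCII_8BIT)  # dup no. 1, forced to binary
  magic  = template_source.dup[/\A#.*coding:\s*(\S+)/, 1]            # dup no. 2, just to read the magic comment
  source.force_encoding(magic || default)                            # force-encode again
  source                                                             # ERB encodes once more during compilation
end
```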
It first dups the given template source, the whole template string, and force-encodes the source text to binary; then it dups the given template source again to detect the encoding magic comment; then it force-encodes again for some reason; and finally, it encodes in ERB. So many encoding conversions. Who needs this feature? Who actually writes a non-UTF-8 view file in their application? If any one of you does, please raise your hand. Wow, you do. No? No? Okay, so nobody in this room actually uses this feature. That might be possible, I think, but the actual use case is probably for Japanese people, because I see test cases using Shift_JIS, which I think were written by Yehuda, but I'm sure nobody does this in Japan; that would just be ridiculous. So the current state is that nobody needs this feature, and we can just remove it. So here's my suggestion: let's do this. Here's a benchmark for this new version of the ERB handler, and this is the result. It shows some improvement, but only 1.5 times faster, because in this case the benchmark includes the whole compilation process on the ERB side, not just the encoding conversions. Moreover, this should reduce memory consumption, I suppose. So let's profile that with memory_profiler. The code looks like this: benchmarking the memory consumption, again with benchmark-ips, inside a block that repeats the whole template resolution. And the result is like this: it shows some reduction in String objects. And in my opinion, memory usage is very important; it's actually about speed, because if we can reduce memory, then we can put more containers, I mean web workers, into the web application container. So this really is about speed, right? So I'd like to propose removing the encoding support, maybe in Rails 6. By the way, this was about the ERB handler; if you're using Haml, we have some alternative implementations like this.
So please try using these instead of the official Haml. The next topic is ActiveSupport::SafeBuffer. As we saw in the method call counts, we call this very many times, and it's currently a very ad hoc implementation: it keeps a flag inside the string object and flips that flag on and off. I tried to use Ruby's built-in tainted flag instead, but I failed. But maybe we could make a faster version of SafeBuffer somehow, maybe as a C extension, I guess. The next topic is I18n. Sorry, I have only five more minutes, so I'll speed up my talk. Again, it's not done yet, but I have some work in progress on this machine, which I'll probably publish within a few days.

The next topic is Active Record, and I have four minutes for Active Record. My main concern about Active Record is the Arel objects created when building queries: it just builds so many Arel node objects. So what if we directly build SQL strings from the find or where parameters for very simple queries, like where(name: something) or find(id)? It's still not published, but it's almost working, and the product is called RNI. This is the implementation, and an example: if the find call receives complex parameters, it passes the query on to super, but for the simple ones, like find with an integer or string id, it compiles the SQL query directly. This is actually very cheap, even cheaper than compiling the cached Arel nodes for, what's the name, AdequateRecord. I'm going to skip this part.

My next topic is Model#present?. My advice about it is: never, ever call present? on a model, because it causes massive numbers of method calls inside. For example, if you call current_user.present?, how many method calls will occur? This is the answer: I see 85 method calls just for user.present?, which is ridiculous. I suggested a patch fixing this situation, but it turned out that the Rails core team expects you simply not to do this.
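To make that concrete, the cheap alternative for a "signed in?" style check is a nil test, which answers the same question with a single method call. In this illustration a plain Struct stands in for an Active Record model, and the helper name is made up:

```ruby
# Stand-in for an ActiveRecord model; on a real model, present? walks
# through blank?/empty? logic and dozens of further calls.
User = Struct.new(:name)

def signed_in?(current_user)
  !current_user.nil?  # one call, instead of the present? cascade
end

puts signed_in?(User.new('matz'))  # true
puts signed_in?(nil)               # false
```

A record loaded from the database is never blank, so for existence checks the nil test is equivalent and far cheaper.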
So please, please don't call present? on your Active Record models, or put something like this in your application. I think I have no time to run through all of these slides. This one is about speeding up the railties initializers. This one is about not requiring pry-doc, pry, byebug, pry-anything in your Gemfile. This one is about squashing all bundled gem files into one directory, which is currently not working yet. This one is about using require_relative instead of require, which didn't show any significant speed improvement. This one is about detecting autoload, which causes a speed regression in the production environment; I actually found two occurrences of autoload in production in Rails 5, which happen inside Rack 2, so please fix this, Aaron.

About speeding up tests: previously our application took one minute on CircleCI just for preparing the schema, inserting 600 rows into the schema_migrations table. So I changed this to one single query, which in our case is 600 times faster. This is already committed to Rails 5, so it's available there. There are some slow parts in Active Support, like multibyte support and time zones. The multibyte support consists of Multibyte::Chars and Multibyte::Unicode, and it loads the whole Unicode database, version 8, which sits inside the Active Support library. Do we actually need this? I'm not sure, and I suppose at least we Japanese don't use it. So we could just remove it, in our case, to make the framework smaller and the boot time faster. The next one is TimeWithZone. Here's a benchmark of Time versus TimeWithZone, and the result is that TimeWithZone is 25 times slower than the built-in Time. So if you're sure you don't need TimeWithZone, you can just replace it with Time; I mean, if you're 100% sure about what you're doing. We can also boost some slow parts of Rails with C extensions. Here are some examples, like CGI.escape, CGI.escapeHTML, fast_blank, hash building, and so on.
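As an illustration of why a tiny predicate like blank? is worth a C extension at all: it runs on almost every attribute, so small per-call costs add up. Here are two pure-Ruby blank checks side by side; neither lambda is fast_blank's or Active Support's actual code, but fast_blank reimplements this kind of predicate in C along these lines.

```ruby
require 'benchmark'

# Regex-based check, similar in spirit to Active Support's pure-Ruby blank?.
regex_blank = ->(s) { s !~ /[^[:space:]]/ }
# Naive alternative: allocates a stripped copy of the string on every call.
strip_blank = ->(s) { s.strip.empty? }

samples = ['', '   ', "\t\n", 'not blank'] * 10_000

Benchmark.bm(8) do |x|
  x.report('regex') { samples.each { |s| regex_blank.call(s) } }
  x.report('strip') { samples.each { |s| strip_blank.call(s) } }
end
```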
Some of these are already introduced into recent versions of Ruby, so please just use new versions of Ruby, which will bring you the speed. My time is over, so here's the conclusion. There really is no single performance bottleneck shared by every Rails application: some apps might have 1,000 models, some apps might have 3,000 lines in routes.rb, and the bottlenecks will differ. In my opinion, Rails is omakase, which is nice, but in some cases we want to customize certain points of the Rails framework. Maybe what we need is more flexibility, like Merb used to have. So there are so many slow parts in Rails, and there can be more alternatives to these parts of Rails. So I would suggest making Rails more flexible, to be a little bit like Merb, and I hope everyone here will reveal your hacks and bring more modularity and diversity into the Rails community. Thank you. Thank you very much.