It's great to be back here at Mountain West RubyConf. The first time I presented at a conference was back in 2009, here at Mountain West, and I'd like to apologize to everyone in the audience who had to endure that. I last had the privilege of speaking at Mountain West in 2010, and I'm thrilled to be back in 2015 to give this presentation on better routing through trees. My name is Jeremy Evans. I'm the lead developer of Sequel, the Ruby database library, the author of quite a few other Ruby libraries, and the maintainer of the Ruby ports for the OpenBSD operating system.

In this presentation, I'm going to discuss an approach to routing web requests that I call a routing tree, and explain the advantages that a routing tree offers compared to the routing approaches used by other Ruby web frameworks. Before I can speak to the benefits of a routing tree, I first need to discuss earlier approaches to routing, so let's delve into a brief history of routing web requests in Ruby.

I first started using Ruby in late 2004, and at the time there were only a few choices for web development in Ruby. One choice was the old-school approach of using CGI without a framework. With CGI, you generally had a separate file for each page, and the routing was handled by Apache, so routing really wasn't an issue. Another choice was Nitro, a web framework I'm guessing many of you have never heard of, since its last release was in 2006. At the time, Nitro used static routing, where the first segment in the request path specified a render class, and the second segment was a method to call on that render class. Another choice was Rails, which at the time was at version 0.8.5, and back then Rails did not even perform routing itself. You could use Rails to generate pretty URLs, but Rails could not actually route URLs itself. If you wanted to use Rails with Apache, you had to use rewrite rules in Apache.
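To make that static approach concrete, here's a minimal pure-Ruby sketch of class-and-method static routing; the class, constant, and path names are hypothetical illustrations, not Nitro's actual API:

```ruby
# Toy static router: the first path segment selects a class, the second
# selects a method on that class. No route definitions exist anywhere.
class GreetController
  def hello
    "Hello from GreetController"
  end
end

CONTROLLERS = {"greet" => GreetController}

def dispatch(path)
  controller, action = path.sub(%r{\A/}, "").split("/")
  klass = CONTROLLERS[controller] or return "404"
  klass.new.public_send(action)
end

dispatch("/greet/hello")  # => "Hello from GreetController"
```

Fast and trivial to implement, but completely inflexible: the URL structure is dictated by your class and method names.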
So here's an example from the .htaccess file that shipped with Rails 0.9.1, and it takes the controller/action/ID path and splits it into separate parameters for the controller, action, and ID. This example implemented the static routing that Rails supported by default, but by using more advanced rewrite rules in Apache, you could get it to do custom routing. Now in addition to supporting Apache, Rails at the time also worked with WEBrick. However, if you were using WEBrick, you were restricted to the controller/action/ID static routing, as that was hard-coded into the WEBrick integration. It was not possible to use custom routing with WEBrick.

Rails started supporting custom routing in 0.10.0. The default was still the controller/action/ID pattern supported by the previous static routing, but you could use different patterns to support custom routing. Internally, Rails stored each route in an array, and when a request came in, Rails would just iterate over the stored routes, checking each route to see if it matched the current request. As soon as it recognized the request path, it would return the appropriate controller class, which would then handle the request. Rails continued to use this basic approach of iterating over an array of routes for over three years. This code is from Rails 2.0, which still just iterates over every route looking for a match.

Around the same time, Sinatra was released, with a radical simplification of how web applications could be developed: you specify the routes directly, with each route yielding to a block to handle the action. And while externally Sinatra looks much simpler than Rails, internally they used a similar process for routing, storing the routes in an array and just iterating over the array of routes when a request comes in, stopping at the first matching route. Over the years, Sinatra's basic approach has not changed much.
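The iterate-over-an-array approach can be sketched in a few lines of plain Ruby. This is a hand-rolled illustration of the idea, not the actual Rails or Sinatra source:

```ruby
# Hand-rolled sketch of array-based routing with first-match-wins
# iteration, in the style of early Rails and Sinatra.
class TinyRouter
  def initialize
    @routes = Hash.new { |h, k| h[k] = [] }
  end

  def add(verb, pattern, &block)
    # Convert a pattern like "/albums/:id" into a regexp with named captures.
    regexp = Regexp.new("\\A" + pattern.gsub(/:(\w+)/) { "(?<#{$1}>[^/]+)" } + "\\z")
    @routes[verb] << [regexp, block]
  end

  def call(verb, path)
    # Linear scan: routing cost grows with the number of routes.
    @routes[verb].each do |regexp, block|
      if md = regexp.match(path)
        return block.call(md.named_captures)
      end
    end
    nil
  end
end

router = TinyRouter.new
router.add("GET", "/albums/:id") { |params| "album #{params["id"]}" }
router.call("GET", "/albums/1")  # => "album 1"
```

The linear scan is why routing performance in this design degrades as routes are added: a request matching the last route has to fail against every route before it.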
This is the current Sinatra code for routing, slightly simplified, and the main change from the original implementation is that Sinatra now has a separate array of routes per request method. However, after getting the array of routes for the request method, it still just iterates over every route looking for a match. And while Sinatra's approach has not changed much over the years, Rails' approach has changed significantly.

From Rails 2.1 to 2.3, Rails tried to optimize route matching by checking for initially matching segments in the path: if the current route's prefix and the current request's prefix do not match, it can skip subsequent routes with the same prefix. I wasn't able to find a current web framework that uses a similar approach, and modern versions of Rails use a different approach, so I will not be discussing it further.

In Rails 3.0 and 3.1, Rails used rack-mount to handle routing. rack-mount is a dynamic tree-based router: it organizes routes into a tree based on the parameters you provide, such that it can skip similar routes if the current route does not match. In case you want to use rack-mount for routing but you do not want the overhead of Rails, there's a web framework called Sinfeld that is a thin wrapper over rack-mount. In Rails 3.2, Rails switched to Journey for route handling. Journey implements a deterministic finite automaton engine for request path matching in pure Ruby, which can take a request path and return all possible routes that could match it. Journey then iterates over this array of possibly matching routes, checking each route to see if it actually matches the request, and sorts the routes by priority. In general, the first of the resulting routes will be used to handle the request. And if you want to use Journey for routing but you don't want the overhead of Rails, there's a web framework called NYNY that is a thin wrapper over Journey.
So back in January 2010, while Rails was at version 2.2, Christian Neukirchen, the author of Rack, was working on a proof-of-concept router named rum. The fundamental difference between rum and the other web frameworks I've discussed is that in rum, routing is not separate from request handling. Instead, for each request, rum yields to a block, and routing is performed by calling the on method with arguments. If all of the arguments are true, on yields to the block that is passed to it; otherwise, on returns nil without yielding. The get, path, and param methods here are predicates that check against the current request: the get method returns true if the current request uses the GET request method; the path method with the "greet" argument returns true if the first segment in the request path is "greet"; and the param method with the "person" argument returns true if the request has a parameter named person. So by nesting these calls to on, you build what is basically a tree using rum's DSL, and at any point in any of these blocks, you can handle the current request.

Now, one issue with rum is that it was never released as a gem. So in April 2010, Michel Martens took rum, added support for Haml templates, and released the Cuba gem, and over the next four years, he and others improved on rum's initial design. In July of last year, after many years of using Sinatra as my primary web framework, I was trying out Cuba and found that using a routing tree made certain aspects of web application development significantly simpler, but some aspects of Cuba's design and implementation had issues that made it more cumbersome to use than Sinatra. So I forked Cuba and released Roda, which keeps the same routing tree approach that was introduced by rum, but otherwise tries to be more friendly and Sinatra-like, as well as significantly faster. So let me go over the routing approaches that I've discussed so far.
The first is completely static routing, either using separate files with CGI or using very early versions of Nitro or Rails. And while static routing is fast, it is also inflexible; these days, I don't think anyone would consider using a framework that did not support custom routing. The next approach was used by early versions of Rails and is still used by Sinatra: storing routes in an array and just iterating over the array of routes when a request comes in, testing each route to see if it matches the current request. This process is fairly simple to implement and understand, but it makes routing performance decrease linearly as the number of routes increases. Next we have rack-mount, which is the underlying router used by Rails 3.0 and 3.1 and also by Sinfeld. rack-mount organizes routes into a tree based on the parameters you provide, so that matching prefixes are shared by multiple routes. This significantly increases routing performance, but at a large increase in complexity. Next we have Journey, which is used by modern versions of Rails and also by NYNY. And while it is certainly faster than the previous approaches used by Rails, it's probably the most difficult for the average Ruby programmer to understand. And finally, we have the routing tree approach that was introduced by rum and is currently implemented in Cuba and Roda.

So I think there are three basic ways that these routing implementations differ. The first way is a quantitative difference, and the quantitative difference is in the performance. These routing implementations all show different performance characteristics, especially as the number of routes increases. So in order to determine what the performance differences are, you need to benchmark the implementations with a varying number of routes. Now the issue here is that comparative benchmarks in general are biased, specifically to show the advantages of the benchmark creator's preferred choice.
And the benchmark I'm using is no different. It's called R10K, and it benchmarks each of these implementations using 10, 100, 1,000, and 10,000 routes. I wrote R10K because the only other comparative benchmark I could find only benchmarked hello world applications with a single route. While the structure of the sites benchmarked by R10K is certainly friendly to a routing tree approach, it should be friendly to most other routing approaches as well. R10K is open source on my GitHub, and I welcome external review to make sure I'm not doing anything stupid or unfair to the other web frameworks that I'm benchmarking.

So let's first look at the results for 10, 100, and 1,000 routes. Here are the runtime results; pay no attention to the absolute numbers, as it's only the relative performance differences that matter. One thing to note about these numbers is that R10K benchmarks using the Rack API directly, so this does not include any web server overhead. From this graph, you can see that at 10 and 100 routes, Rails is an outlier, taking about three times as long as the next slowest framework. However, when you get to 1,000 routes, Sinfeld is significantly slower than Rails, and Sinatra takes almost as much time. When you go to 10,000 routes, Sinfeld and Sinatra take much more time than all of the other web frameworks put together. I'm not sure why Sinfeld performs so poorly in this benchmark. It's supposed to be a very thin layer over rack-mount, so it's possible it's an issue with rack-mount, or it's possible it's how Sinfeld uses rack-mount. Anyway, because Sinfeld, Sinatra, and Rails throw off the scale of this graph, I'm going to take them out of the picture. With those frameworks gone, the performance picture is a little clearer. Near the bottom is the static route implementation, which is basically the fastest routing you can get. But again, I don't think anyone would really consider a static routing framework these days.
Next fastest is Roda, followed by NYNY and then Cuba. From this graph, you can see that a routing tree approach is not necessarily the fastest; performance is also highly dependent on the specific implementation. Part of performance is also the amount of memory used, and here are the memory results for 10, 100, and 1,000 routes. As you can see, up until about 100 routes, all of the web frameworks except Rails are clustered around 20 megabytes of memory. At 1,000 routes, there are three basic groups, with the static route implementation, Cuba, and Roda under 20 megabytes; Sinatra, Sinfeld, and NYNY between 30 and 40 megabytes; and Rails up around 70 megabytes. When you go to 10,000 routes, the picture is pretty much the same, except that Sinfeld jumps to the top of the memory list. So at 10,000 routes, Sinatra uses about twice the memory of the routing tree implementations, and Rails about four times the memory of the routing tree implementations. One of the reasons that routing tree implementations are very friendly on memory is that the routes themselves are never stored in a data structure. The tree in a routing tree is really Ruby's abstract syntax tree for the routing tree block.

So to review these benchmarks: other than the static routing approach, which I don't think anyone would consider, Roda has the fastest implementation and uses the least amount of memory. NYNY does fairly well, showing that Journey's approach to routing is also fast. Next comes Cuba, which is significantly faster than Sinatra or Rails, but significantly slower than Roda despite using a similar approach. Rails is a fairly heavyweight framework, but its performance does not change drastically even with large numbers of routes. And finally, we have Sinfeld and Sinatra, which both have significant performance issues with large numbers of routes. One thing to keep in mind is that these numbers are pure routing performance numbers.
In many, if not most, applications, routing performance will not be the bottleneck, as the application will spend much more time handling a request than routing it. However, I can say that in the applications I've converted to Roda, performance is noticeably faster compared to Sinatra or Rails, as shown by the amount of time it takes to run the tests. After converting applications from Sinatra to Roda, using the exact same Rack::Test-based integration tests, the tests ran about 50% faster, and after converting from Rails to Roda, the tests ran about twice as fast. But I think that's probably due more to Roda having lower per-request overhead than to its faster routing performance.

So I mentioned earlier that there were three ways the routing implementations differ, and the second way they differ is a qualitative difference. The qualitative difference is the internal complexity of each implementation. These routing approaches vary widely in their implementation complexity. Static routing is the simplest in terms of complexity: just parsing the request path using a single regular expression and using the captures from that regular expression to call a method on an object. Iterating over an array of routes and checking each route to see if it matches the current request is also simple and easy for the average Ruby programmer to understand. rack-mount's approach of analyzing the route set and building a tree is much more complex; I think the average Ruby programmer would have trouble understanding how it works without significant time to study it. Journey is even more complex than that, and if you want to try to understand it, you should probably have a good memory of the compiler courses you took in college, or get ready to do some research into how compilers are implemented. Now a routing tree approach is similar in complexity to the array of routes.
You start off at the top of the routing tree block, and each method call checks to see if the current route matches the request. If so, the process is repeated for the block that you passed to the method; otherwise, you continue to the next method. So a routing tree's processing is equivalent to iterating over a small number of routes at each branch of the tree, instead of over one large array of routes.

So the second type of difference between these routing approaches is the implementation complexity. Static routing, iterating over an array of routes, and using a routing tree all have fairly simple implementations that are easy to understand. Both rack-mount and Journey have more complex implementations that would take a lot of time for the average Ruby programmer to understand. How does the internal complexity of the routing implementation impact users of the framework? Well, the higher the implementation complexity, the more difficult it is to find other programmers who can understand the code, add features to it, and fix bugs in it. In general, more complex code is harder to debug than simpler code, and as a general rule, unless there is a substantial benefit from the complexity, simplicity should be preferred. Ultimately, though, I think most users of a framework treat the internal complexity as an externality: something that does not affect them directly and therefore does not affect their decision to use the framework.

So we come back to the three types of differences between the routing implementations. The first two were performance and implementation complexity. The third way the routing implementations differ is also a qualitative difference, and that difference is how routing integrates with request handling. Routing a request is not an end in itself; it is purely a means to make sure that the request is handled correctly. With a routing tree, routing is not separate from request handling. The two are integrated.
So as you are routing a request, you can also be handling the request. For all the other implementations I've discussed, routing is separate from request handling. This integration may not sound important, but I think it has by far the most impact. So I'm going to discuss the advantages of integrating routing with request handling, and then explain what web frameworks that lack this integration offer in terms of similar functionality.

Let me start with some example Sinatra code. This is fairly simple: we have two routes, one for GET and one for POST, both related to a specific album. When I was using Sinatra, this was pretty typical in many of my Sinatra applications. The main issue with this approach is that it leads to duplication. Here you see the path is duplicated in both of the routes, and the retrieval of the album from the database is also duplicated in both of the routes. Using a routing tree, you can simplify things. Instead of duplicating the path in both cases, it is specified once in the branch, and as soon as the branch is taken, the album is retrieved from the database. In both the GET and POST routes, the album instance variable is available for use. So the primary advantage of using a routing tree is that it allows you to easily eliminate redundant code by moving it to the highest branch where all routes underneath the branch can share it.

Now it's certainly possible to do something similar in Sinatra. You can use before blocks in Sinatra and provide a path to the before block. Sinatra will iterate over all the before blocks before routing the request, checking each to see if the request path prefix matches the block, and if so, it will yield to the before block. Using this, you can still retrieve the album from the database in only a single place. However, note that you now need to specify the path itself three times instead of just once.
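The shape of that simplification can be shown with a pure-Ruby toy. The album store and helper names here are hypothetical, and real Roda syntax differs; the point is that the branch retrieves the album once for every route nested beneath it:

```ruby
# Hypothetical in-memory album store.
ALBUMS = {1 => "OK Computer"}

# Branch on the "album" prefix; everything nested under it shares the album.
def on_album(req)
  segment, id = req[:path]
  yield ALBUMS[id.to_i] if segment == "album"
end

def route_album(req)
  catch(:done) do
    on_album(req) do |album|  # album is retrieved once, at the branch
      throw(:done, "Showing #{album}")  if req[:method] == "GET"
      throw(:done, "Updating #{album}") if req[:method] == "POST"
    end
    nil
  end
end

route_album(method: "GET",  path: ["album", "1"])  # => "Showing OK Computer"
route_album(method: "POST", path: ["album", "1"])  # => "Updating OK Computer"
```

The path is matched once and the database lookup happens once, rather than being duplicated in each of the GET and POST routes.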
And unlike the routing tree example, the shared behavior is in a separate lexical scope, which makes it more difficult to understand how it is connected to the two routes. The two routes themselves are also in separate lexical scopes, which makes it more difficult to understand how they are connected. Additionally, using before blocks like this in Sinatra has a negative effect on performance. Before blocks are processed pretty much the same way as route blocks, so adding a before block is equivalent to adding a route, and since routing performance degrades linearly as the number of routes increases, adding before blocks like this in Sinatra hurts performance for the entire application.

In Rails, you specify the routes in a config/routes.rb file, and the code to handle the routes goes into a controller class in a separate controller file, usually using a separate method per route. As in the initial Sinatra example, this duplicates the retrieval of the album from the database in both methods. Now Rails also offers a way to eliminate the redundant code: you can use a before filter to specify a method to call before the action, for a given set of actions. The main issue with this approach is that as you add more routes where you want to retrieve the album, you need to remember to manually update the :only option to the before filter. Also, as in the second Sinatra example, the shared behavior is in a separate lexical scope, which I think makes it more difficult to understand the connection.

So Sinatra, Rails, and most other Ruby web frameworks can use before filters to emulate the code that you would place at the top of a routing tree block. However, a routing tree block is really just Ruby code, and you can execute arbitrary Ruby code at any point during routing, not just at the top of blocks. One of the common places where this is useful is when doing access control.
If part of your site allows anonymous access and part of it does not, you can place the part that allows anonymous access first, then run the check for a login, and then have the rest of the routes, where anonymous access is not allowed. Note that this applies to most sites that support logins, since the login action is usually available to anonymous users.

Now this type of access control is more complex to handle in Sinatra. When I was using Sinatra, the general way I would handle this was to specifically whitelist each path or prefix that allowed anonymous access. This works okay if you only have a small number of paths that allow anonymous access, but it quickly becomes challenging if you have a large number of separate paths that allow anonymous access. Similarly, this type of access control is more complex in Rails. Usually in Rails you would handle this by using a before filter in the application controller that requires a login, and then in each controller where you want to allow anonymous access, you add a skip for that before filter. This spreads the access control handling out to multiple places in the application, and again requires you to specifically whitelist all of the allowed actions.

Now because a routing tree is directly executed, you can include arbitrary logic that affects routing. In this example, you have a routing tree that makes the list of albums available to everyone, but for admins it also routes other requests to an admin site Rack application. In Sinatra this is more difficult. You have to add routes for each request method that you want to support, with splats so that Sinatra will only do a prefix match on the path. Then you have the block pass if it is not an admin request, and if it is an admin request, you need to create a new environment to call the admin site Rack app with, and then you need to call the Rack app.
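The anonymous-first ordering described above can be sketched in pure Ruby; the handler strings and the logged_in flag are hypothetical stand-ins, and a real application would check the session instead:

```ruby
# Routes above the login check allow anonymous access; everything after
# the check requires a login, with no per-route whitelisting needed.
def route_request(req)
  catch(:done) do
    throw(:done, "login page") if req[:path] == "/login"       # anonymous
    throw(:done, "redirect to /login") unless req[:logged_in]  # the check
    throw(:done, "secret dashboard") if req[:path] == "/dashboard"
    nil
  end
end

route_request(path: "/login", logged_in: false)      # => "login page"
route_request(path: "/dashboard", logged_in: false)  # => "redirect to /login"
route_request(path: "/dashboard", logged_in: true)   # => "secret dashboard"
```

Because the check is just a line of code executed at a point in the routing, adding another protected route below it requires no changes to the access control at all.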
Now I'm sure this is possible in Rails as well, but in the interest of time I will not be providing an example. So these improvements may not seem large, but taken together they can result in much simpler applications. While it is possible to eliminate the redundant code using before filters, in most Sinatra applications I've looked at, that is not done, as it is not natural. The usual case is that the code is just copied into all of the routes that need it. And this does not surprise me, because when I was using Sinatra, that is what I would do. Using a separate before filter for every set of GET and POST routes in your application feels unnatural in Sinatra.

So I analyzed one of the applications that I've been working on for a couple of years, a process automation system that my office uses. This application was originally built using Sinatra, and it was switched over to Roda last year. When it was using Sinatra, it had duplicate code in most of the routes, and when I switched it to a routing tree, I was able to eliminate all the redundant code by moving it up to the highest enclosing branch, where it was shared by all routes underneath that branch. Currently the application has 79 total routes. To get to those 79 routes, there are a total of 36 branches in the routing tree where the branch contains multiple routes, and of those 36 branches containing multiple routes, 25 contain code that is shared by all routes underneath the branch. In most cases, the code that is shared is either retrieving objects from the database or enforcing access control. So this means that about 70% of the time that I'm branching in the routing tree, the integration of routing with request handling is resulting in the elimination of redundant code. It also means that if I wanted to eliminate the same redundant code in Sinatra or Rails, I would have to add 25 separate before filters. Using a routing tree makes sharing code for all routes under a branch natural.
So web applications that use a routing tree tend to avoid redundant code naturally. Using before filters to eliminate redundant code is not natural in most other Ruby web frameworks, so even though it is possible, it is often not done, and the natural approach leads to redundant code. Note that in order to extract maximum benefit from using a routing tree, you need to structure your paths in such a way that they are naturally routing-tree friendly. These days, this type of path structure is pretty natural. If you structure your paths like this, they are naturally routing-tree friendly, because as soon as the routing tree has routed the albums/1 prefix, it can retrieve the album from the database, so that all routes under the albums/1 branch can share it. However, if you use Rails 1-style controller/action/ID routes, you cannot derive as much benefit from a routing tree, because the segment containing the album's ID is at the end of the path, after the branching for segments like show and tracks has already taken place.

I have multiple applications that were originally developed pre-Rails 1.0 and upgraded all the way to Rails 4.1 without changing the path structure, and when I switched them to using a routing tree, I still ended up with redundant code in many of my routes. So you should keep these path structure considerations in mind if you are considering converting an existing application to use a routing tree web framework. Obviously, if you are creating a new application, or you are willing to change the path structure of an existing application, you can design the paths to be naturally routing-tree friendly.

Now, so far I've discussed what I think are some advantages of using routing trees. However, I would be remiss if I did not mention that there is one trade-off with the routing tree approach, and the trade-off is the loss of route introspection, because routes are not stored in a data structure when using a routing tree.
Since a routing tree is really just Ruby code, you cannot introspect your routes like you can in most other Ruby web frameworks. Now, in my applications this doesn't matter, but there are some applications that rely on introspection of the routes, and those would need to be handled differently when using a routing tree. Again, this is something you need to keep in mind if you are converting an existing application to use a routing tree: you need to check that it is not relying on introspection of the routes, or provide an alternative if it is.

So I'd like to finish up this presentation by reviewing the advantages that I've found from using a routing tree approach. First and most importantly, a routing tree approach makes it simple and natural to eliminate redundant code. This may not seem like such a big deal, but when converting applications with redundant code to a routing tree, I noticed that there were unintended differences in the redundant code, because changes had been made to only one of the routes when they really should have been made to all routes underneath a branch. By using a routing tree, you eliminate the redundant code, and you only need to make changes in one place to have them affect all routes underneath a branch. By eliminating redundancy, your code becomes easier to read and understand, which makes maintenance easier. I've been the only programmer where I work for over 12 years, and I've continually maintained multiple applications for over a decade, so ease of maintenance is something that is very important to me. In most cases, moving to a routing tree approach, especially using Roda, will improve the performance of your application, and this is not just because Roda is fast at routing, but also because it has lower per-request overhead. So if you're interested in using a routing tree, I recommend checking out Roda. It's the fastest and most featureful Ruby web framework that uses a routing tree.
It has a very small core, but it ships with a set of plugins for a wide variety of use cases. These plugins add support for simple things like rendering templates and JSON, up to more advanced features like template streaming, asset packaging, and even sending emails using a routing tree. The plugins are tested alongside the core code base, so that they continue to work as Roda evolves. If you're interested in learning more about Roda, check out the website at roda.jeremyevans.net, hop on IRC, post in the Google group, or just come talk to me. That concludes my presentation. I want to thank you all for listening to me talk about routing trees and other approaches to routing, and if you have any questions and I have time, I'll be happy to answer them now.