Or are you all still out there? Cool, good. Well, I've never actually done the 5K, but I've always wanted to, and then I find out it's at like seven or something, and that's when I wake up, not go running. Cool. Well, I wanna first say thank you to the organizers. This wouldn't be possible without them putting it all together and bringing us all here, so let's give a round of applause for the organizers and sponsors, and everyone who makes it happen.

So I'm Eileen Uchitelle. I live in Kingston, New York. It's about two hours north of New York City, or a seven-and-a-half-hour drive to Pittsburgh. My flight was canceled, so I did the math, and getting in a car was gonna be faster than waiting for Newark Airport to get their shit together. You can find me anywhere online at the handle eileencodes, and I very rarely but occasionally blog at eileencodes.com. I'm a senior systems engineer on GitHub's app systems team. We do a lot of stuff with open source and Rails and Ruby and Rack and how those interact with the GitHub application, keeping GitHub up to date, or at least trying to get it up to date, on Rails. We'll get there eventually. I'm a member of the Rails core team. For anyone who's new to the Rails community, the core team is responsible for defining what the future of Rails is going to be, and we are responsible for making releases, releasing releases, fixing releases, and all the good and bad things that come along with releasing new versions of Rails.

So if you've been in the Rails community for any amount of time, or just saw Mark's clickbait title yesterday, you've probably heard the phrase "Rails doesn't scale." We've heard it on Hacker News, we've heard it in VC meetings. Recently, we heard it in the misguided Vanity Fair article about Twitter, with the claim that they couldn't effectively combat harassment because Rails was just Fisher-Price software. It's funny they called it that.
I know they meant it as an insult, saying Rails was for children and not serious adults with serious web apps, but Fisher-Price and Rails are alike in that they are robust, well-designed, and unforgettable. The interesting thing about Rails getting a bad rap for not being scalable is that it's taken many companies to a scale they'd never dreamed of. If Twitter had tried to build their app to handle the scale it is today, they probably would have been bankrupt before they had a proof of concept. The reality is that Rails does scale. It just doesn't scale easily yet. Historically, Rails hasn't done a great job at making scaling easy. While we pride Rails on developer happiness, we were missing the mark on scaling and frankly falling short. In this talk, we're gonna look at ways in which Rails was not scaling easily and what we're going to do about it. I want scaling Rails to be as enjoyable as running the rails new command.

Scaling means different things depending on who you're talking to, or who is trolling you, or what your app is supposed to do. For the purpose of this talk, scalable means that over time, an application's growth in code, data, and traffic doesn't really change how well it performs. While Rails can't take on every part of scaling your web application, there are two important areas that I want to focus on.

The first is that when you're scaling your web application, your test suite shouldn't block development. As an application grows, new code shouldn't really cause a ton of friction. Developers should be able to easily prototype and build features quickly without being hindered by the time that it takes for your test suite to complete. While we've made many improvements to test performance in the past, it's really no replacement for concurrent testing. The other area where Rails was failing to scale easily is that as your application grows, it should be able to handle increased data and traffic over time.
The database shouldn't just crash because you got more users or because you have too much data. It should be easy to distribute the load on your database as your traffic increases. Rails has had support for multiple databases, but that support is undocumented, hard to understand, and difficult to implement. While Rails can't take on scaling every part of your application, I believe that we can improve the developer happiness around these two main issues. My goal is to tackle these issues to make Rails 6 scalable by default. Scalable by default means that it's easy for engineers to scale their Rails application without having to spend hours writing a custom setup or using a lot of outside tools. Scaling your application doesn't have to be hard, and Rails should be able to take on at least some of that work for you.

Some of you may be thinking, hold up, wait, Rails 5.2 just came out. Why is Eileen talking about Rails 6? Well, major versions of Rails allow us to reconsider what's possible. It's going to be a while before 6 is out, but I wanted to share my plans with you. In the past few months, I've done a lot of work to refactor Rails and add features to make scaling easier. We'll look at the work done to speed up your test suite and improve Rails' multiple database handling to ensure that scaling is simple and intuitive.

So the first feature that I'd like to introduce for Rails 6 is parallel testing. The parallel testing feature will allow you to run your tests concurrently using either forked processes or threads. Parallelizing test suites is an important part of scaling web applications because as you add more code, the more tests you will add, or should be adding, and your test suite will become slower and slower over time. Sure, you can mock a bunch of calls or ensure you're not making external requests or building a lot of data sets, but that still won't speed up the time it takes to run 10,000 tests linearly.
Parallelizing large test suites can significantly reduce the amount of time engineers need to wait for your builds to finish. A lot of companies like GitHub, Basecamp, and Shopify have been writing their own parallel testing implementations for years. The problem with all of us writing our own implementations is that we weren't sharing our ideas with the broader community. Rails is popular for taking the complexity out of your app and turning it into reusable, generic chunks of code. By adding parallel testing to Rails, we can reduce the complexity of Rails applications, set a standard for parallelization, and make it simple to scale your test suite.

Earlier this year, Aaron Patterson and I worked together to write the parallel testing feature in Rails, based off the parallel testing infrastructure in GitHub. While inspired by the GitHub code, the code in Rails has been rewritten to have a smaller footprint, be more generic, and be a lot easier to use. Rails parallel testing also supports multiple databases out of the box. This was a requirement for us, since we use multiple databases at GitHub in all of our environments.

The concurrent test runner in Rails 6 can parallelize your test suite by using either forked processes or threads. First let's take a look at the implementation for the forking parallelizer. Forked process parallelization is implemented in Rails 6 using Distributed Ruby, also known as DRb. DRb is a distributed object system for Ruby and can be used with zero dependencies. Distributed Ruby is really cool, and it's one of those features that I don't think gets a lot of attention, so I encourage you to check it out. Parallelization is invoked in your tests through the test helper with the parallelize method. This method takes a workers argument, which is the number of processes you wanna fork. Parallelization with forking lives in Active Support and is backed by Minitest's parallel testing feature.
When you start your test suite using the forking parallelizer, the testing Parallelization class will be initialized with the number of workers you requested. In the initialize method, we set a queue size, a queue, a server, and a URL instance variable. The URL starts the DRb server locally and represents the URI for the server to bind to. Parallelization in Rails is supported by Minitest, so once this class is initialized, all the other methods within the class are actually called by Minitest's parallel executor and parallelize_me!.

From here, Minitest will call the start method in the Parallelization class. Start sets the empty pool array to a block that forks the processes based on queue size. Start then calls the after_fork hook. This is provided so that Active Record can hook into the parallelization and create a database for each process. Start then creates a DRb object, which represents the queue of work to perform and will maintain a reference to the DRb server. The DRb object queue variable is used in this loop to split the work for each process. Inside the loop, the run_one_method method is called, which will run the tests assigned to that process. Minitest will return a result, and the queue variable then calls record. Record sends the reporter and the result so that at the end of the test suite run, each process can report which tests passed and failed. At the end of the start method, the run_cleanup hook is called, which allows Active Record to delete those temporary databases that were created in the after_fork hook. And finally, Minitest calls shutdown, which will empty the queue and wait for each process to terminate.

And that's basically the entire implementation of forked process parallel testing in Rails 6. As you can see, the underlying implementation is really simple, and it's only actually about a hundred lines of code. Don't worry if you didn't follow this, though. You don't actually have to know how any of this works to use parallel testing in Rails 6.
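The flow just described (a DRb-served queue in the parent, forked workers that pop jobs and record results, then a shutdown that sends one nil per worker and waits) can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the Rails source: WorkServer and run_in_forks are hypothetical names, the "jobs" are integers rather than Minitest test methods, and it assumes a Unix-like system where fork and Unix sockets are available.

```ruby
require "drb"
require "drb/unix"

# Hypothetical stand-in for the queue object that the parent serves over DRb.
class WorkServer
  include DRb::DRbUndumped

  def initialize
    @jobs = Queue.new
    @results = Queue.new
  end

  def <<(job)
    @jobs << job
  end

  # Blocks until a job (or the nil shutdown marker) is available.
  def pop
    @jobs.pop
  end

  def record(result)
    @results << result
  end

  def results
    Array.new(@results.size) { @results.pop }
  end
end

# Hypothetical driver: fork `workers` processes that square each job.
def run_in_forks(jobs, workers: 2)
  server = WorkServer.new
  url = DRb.start_service("drbunix:", server).uri

  pids = Array.new(workers) do
    fork do
      # Drop the inherited server so calls travel over the socket to the
      # parent instead of reaching this child's own copy of the queue.
      DRb.stop_service
      queue = DRbObject.new_with_uri(url)
      while (job = queue.pop)
        queue.record([job, job * job])
      end
      exit!(0) # leave without running at_exit handlers inherited from the parent
    end
  end

  jobs.each { |job| server << job }
  workers.times { server << nil }          # one shutdown marker per worker
  pids.each { |pid| Process.waitpid(pid) } # wait for every worker to terminate
  server.results
ensure
  DRb.stop_service
end
```

Calling run_in_forks([1, 2, 3, 4], workers: 2) returns the four [job, square] pairs in whatever order the workers finished. In the real feature, the per-process database setup and cleanup would happen in the after_fork and run_cleanup hooks around the worker loop.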
For newly generated Rails 6 applications, parallelization will be automatically included in your test helper, and for upgraded applications you only need to add a call to parallelize in your test helper. The parallelize method takes a workers argument that accepts an integer. The default number of workers is two, which means that Rails will automatically fork your tests into two processes and create a database for each process to run your tests against. Each database is suffixed with the worker number, starting with zero. The databases are generated in the after_fork hook that we looked at earlier. The databases are then automatically deleted in the run_cleanup hook right before the processes are terminated. Active Record and parallel testing do all of this work for you, so you don't need to do any of it.

If you're using multiple databases in your application, Rails provides two callback hooks to handle your second or third or fourth or hundredth database. The parallelize_setup hook is called via after_fork and can be used to set up your secondary database. The parallelize_teardown hook is called by the run_cleanup hook and can be used to clean up your forked databases. At GitHub we have a whole lot of complicated code to make this happen, so once we're on Rails 6 this is going to be a very welcome abstraction. If you're using a single database, as most applications are, then you don't need to use these hooks, because Rails will automatically create and delete the databases as you're running the tests. As you can see, the API for forked parallelization is relatively simple and easy to use.

So the other type of parallel testing that Rails 6 provides is threaded. You'll want to use threads over processes in a few cases. You'll need threads if you're using Windows, because the Unix sockets that DRb relies on aren't supported, and you'll also want to use threads if you're a JRuby user.
The last reason that you may want to use threads is if your tests are IO bound. Most of your tests are not IO bound, but if they are, they're going to be faster with the threaded parallelizer because they won't be waiting for the processes to finish before running the next thing. The caveat, though, is that your code has to be thread safe to use the threaded parallelizer. The parallelize method takes a second optional argument called with, which defaults to processes. To use the threaded parallelizer in your application, set with to threads. It's that simple. The parallelization setup and teardown hooks are not supported by the threaded parallelizer because it only uses a single database.

One thing to keep in mind is that tests running in the same suite cannot use different types of parallelization or different numbers of workers. You can't run 100 of your tests with five workers using threads and the rest with three workers using processes. If you need that kind of behavior, you should split your tests into two different suites.

Threaded parallelization in Rails 6 uses Minitest's threaded executor. One of the differences between the implementation of the forked process parallelizer and the threaded parallelizer is that we didn't have to write any of the code for the threaded parallelizer in the test suite. Minitest's parallel executor uses threads already, so we simply added a conditional that initializes that executor instead of Rails' parallelizer. Aaron and I originally didn't want to implement the threaded parallelizer because, frankly, threads are really hard to get right, but I felt it was important to support threads out of the box for JRuby and Windows users.

An example of why threads are hard came up while Aaron and I were working on the threaded parallelizer. We were running our tests and we were suddenly seeing that everything was sort of randomly failing. Deadlocks, sometimes data was left behind. It was really strange.
Some runs would pass, but almost all of them would fail. We first thought it was related to the fact that the demo app we were running the tests in was using multiple databases. So I switched to a brand new app that had a single test database, but we saw the same issues. Running a single test always passed, so we knew that it was something to do with concurrency. To look deeper at the problem, we hacked the test log to display the connection ID for each connection to the database. Each thread should be connecting to the test database with its own connection ID. But what we saw instead was quite the opposite and quite wrong. Each thread was connecting to the database with the same connection ID. And Aaron says to me, "This is so weird. All the databases are using the same ID. We seem to have an isolation problem." And as he said those words, I suddenly knew that it was all my fault. I realized in that moment that code I wrote in Rails a year ago was coming back to haunt me.

How many of you in this room remember system tests? Well, when I implemented system tests, I wanted to do it in such a way that we didn't need to use Database Cleaner anymore. The underlying problem was that when system tests are run, Rails opens a connection to the database on one thread, and then Capybara boots a Puma server, which opens a second thread and a second connection to the database. This meant that in system tests the Rails thread and the Puma thread couldn't see each other, and we would get inconsistent data between test runs because the transaction in the Rails thread didn't know about the data on the Puma thread to roll it back. The way we fixed this problem was to force each thread to use the same connection. Do you see where I'm going with this? The change that I implemented to fix system tests is the exact change that broke parallelization, because the change required Rails to tell all of the threads to use the same database connection.
With parallel testing we want the exact opposite of that behavior, because the threads aren't trying to share data and transactions like system tests are. For effective threaded concurrency, each individual thread should maintain its own connection to the database so that the threads aren't influencing each other's data. Luckily, since I implemented the offending code, it was easy to find and fix the problem. I simply added the ability to internally configure the setting, so if you're using the threaded parallelizer, the threads will each open a separate connection to the database. This does mean that system testing is probably not possible with the threaded parallelizer. I mean, I know it doesn't actually work. There might be a way to make it work, but I didn't do that. Considering that the change to fixtures for the system test threads is the exact opposite of the behavior we need for the threaded parallelizer, I don't know how well they would have worked even if I hadn't broken them in this way. If you want to use parallelization with system tests, I do know that processes work.

So running your parallelized tests doesn't require any extra work on your part. After you set them up in your test helper, simply run rails test like you normally would and Rails will handle the rest for you. Sometimes you may want to change the number of workers depending on the environment you're using. For example, you may want to use two workers locally but 14 workers on CI. So Rails provides an environment variable called PARALLEL_WORKERS, which will take precedence over the number of workers passed in the keyword argument. If the number of workers is one or fewer, parallelization will not happen, because you need more than one worker for it to actually function. If you're interested in the PR for parallel testing, you can check it out on GitHub. Parallel testing is going to be a great feature for improving the speed of your test suite.
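As a configuration fragment, the test helper setup described in this part of the talk might look roughly like the following in a Rails 6 application. It is a sketch that only makes sense inside a Rails app, so it is not runnable on its own, and the bodies of the two hooks are left as comments because what they do depends on how an application manages its extra databases.

```ruby
# test/test_helper.rb (sketch; assumes a generated Rails 6 application)
ENV["RAILS_ENV"] ||= "test"
require_relative "../config/environment"
require "rails/test_help"

class ActiveSupport::TestCase
  # Fork two worker processes. Passing with: :threads instead selects the
  # threaded parallelizer, which uses a single database and skips the hooks below.
  parallelize(workers: 2, with: :processes)

  # Runs in each forked worker; only needed for databases beyond the primary.
  parallelize_setup do |worker|
    # set up this worker's copy of the secondary database here
  end

  # Runs in each worker right before it terminates.
  parallelize_teardown do |worker|
    # clean up this worker's copy of the secondary database here
  end
end
```

With that in place, the worker count could be overridden per environment without touching the helper, for example by running the suite on CI as PARALLEL_WORKERS=14 bin/rails test.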
I'm really excited for this feature, and I hope that it helps you scale your application's test suite so that you can focus on deploying code instead of waiting for CI to finish.

Now that we've got your test suite squared away, let's take a look at the other way that Rails doesn't handle scaling well today, and that's whether your application can handle a spike in traffic. I'm not talking about DDoS traffic. For that kind of spike you want to use an outside vendor or hire experts that can mitigate that for you. What I'm talking about is trying to scale your web application because Justin Bieber came along and started using it. You need to scale fast and you need to scale yesterday. I've been in these situations before and it's not fun. You're just waking up and starting work, and suddenly Nagios is just yelling at you, and you realize that your single primary MySQL database just isn't cutting it. There are too many reads and too many writes, and you know your database is about to take your application down. It's at this point you know that you need to split tables off your main MySQL cluster. You need read-write and read-only databases. You need replication and you need more capacity. Your single primary database won't cut it anymore.

Rails has had support for multiple databases for a while, in that you can do it. We do it at GitHub, so it is possible. But it's hard, really hard. It's so hard that it took me three hours to build multiple databases into a demo app in Rails 5, and I have already done it before. I don't know how long it would take someone who's never worked on this kind of stuff. The reason it's hard is that Rails hasn't been setting a standard. While the underlying plumbing works great, we weren't exposing it through good documentation or the tools that you need to make it intuitive. So the second Rails 6 highlight that we're going to talk about is improvements to using multiple databases with Rails.
While these aren't punchy brand new features like parallel testing, they're an important part of scaling your web application. Multiple databases are useful if your application is getting to a point where a single database instance can't handle all of the traffic reading from and writing to it. If your application is going down because your database is crashing, you may need to split tables off into separate clusters or add read replicas to alleviate pressure on your primary database. At GitHub, we have multiple top-level primary databases that have write roles, as well as read replicas for each of them. Rails has for a long time supported multiple databases, and thanks to work from Arthur Neves and Matthew Draper, Rails 5 is a ton easier than it was in Rails 3 or 4. However, when I set out to build that demo app that uses multiple databases, I quickly realized there's a lot missing.

Let's take a look at what an application with multiple databases looks like, and along the way, I'll point out where Rails was failing to make it easy. Let's say we have an application with two databases. One is our primary database, which has a flowers and a people table, and the second is an animals database with a cats and a dogs table. By default, Active Record knows about the primary database and connection because that's attached to ActiveRecord::Base. We need to tell our application how to connect to and talk to the animals database. However, before we get started working on our application, we already have a problem. Best practices for multiple databases aren't documented in Rails. We have no documentation for how to set up your database YAML, your models and connections, your migrations, or your rake tasks for the second database.

Based on our setup at GitHub, I originally set up my database YAML like this. Here we have a two-tier configuration with the default production namespace and a second namespace called production_animals for the animals production database.
While this way may make sense, or may be the way that previous blog posts have said to do it, it's no longer the recommended way to set up your database YAML for multiple databases. If you're using Rails 5 or higher and need multiple databases, you should set up your database YAML using three tiers like this. The three-tier config allows us to group all of our databases by their respective environment. This leads to a cleaner database YAML and makes it easier to understand how everything works together. Primary is the default for your main database. This technically can be called anything, but I really just recommend keeping it primary so it makes sense. This would represent your original production database. Then we have the animals entry, which will hold all of the dogs and cats tables. With the three-tier config, we know exactly what databases are tied to the production environment, versus how the two-tier config used to work. Don't worry though, if you're using a two-tier config, or if you're using a single database per environment, you don't need to change your config. It's only if you're using multiple databases.

Once we have the databases created, we need to tell Active Record where the migrations for the dogs and cats tables live. So I've worked with migrations that deal with multiple databases before, and there are a few ways to do it. You can hack your migrations to set the connection at the beginning of the migration. This is the way we do it at GitHub, and I can tell you that I do not recommend it. So my preferred way is to put all of the migrations for the animals database in their own directory. It's cleaner and easier to group migrations by connection rather than putting everything in the db/migrate directory and having to open each migration file to figure out where the table lives. For this app, I moved the dogs and cats migrations into a directory called animals_migrate.
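To make the two layouts concrete, here is a small sketch that embeds both as YAML and loads them with Ruby's standard YAML library. The adapter and database names are illustrative assumptions, since the talk's slides are not part of the transcript; only the shape (two tiers versus three) and the primary, animals, and production_animals names come from the talk.

```ruby
require "yaml"

# Two-tier layout: every database gets its own top-level key.
TWO_TIER = YAML.safe_load(<<~CONFIG)
  production:
    adapter: mysql2
    database: app_production
  production_animals:
    adapter: mysql2
    database: animals_production
CONFIG

# Three-tier layout: databases are grouped under their environment.
THREE_TIER = YAML.safe_load(<<~CONFIG)
  production:
    primary:
      adapter: mysql2
      database: app_production
    animals:
      adapter: mysql2
      database: animals_production
CONFIG

# In the three-tier shape, one lookup answers "which databases belong to production?"
PRODUCTION_DATABASES = THREE_TIER.fetch("production").keys
```

In the two-tier shape, the link between production and production_animals exists only in the naming convention, which is the grouping problem the three-tier layout addresses.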
To use this way of handling migrations, however, we need to set migrations_paths any time we're going to call migrate. This works fine for custom rake tasks, but it's impossible to use with any of the existing code in Rails because migrations_paths is a class method, or was a class method, on the migrator instead of a method on the connection instance. This means that every time we want to call migrate for animals, we have to set the migrations paths. So we've discovered the second problem with multiple databases in Rails. Rails doesn't really support migrations for multiple connections. You can write a database task that will run the migrations for you, but if you need any of the Rails internals that depend on the migrations path, they have no awareness of anything outside of the db/migrate directory.

So the next step is setting up the animals connection for the dog and cat models. First, we create a new base class. You can name it whatever you want, but I named it AnimalsBase. For this to function correctly, we need to set abstract_class to true. This setting tells Active Record not to use the implied STI table from the parent class for the child classes. For example, we don't want the Dog model to look in an animals_base dogs table, we want it to look in the dogs table. Then we need to establish a connection to animals, and that's it. Now that our AnimalsBase is set up, we need to change our child classes to inherit from AnimalsBase instead of ApplicationRecord. This tells Rails which connection to use when you call Dog.new or Cat.find, to make sure that it looks in the animals database rather than primary.

So while we've already discovered some problems with setup, this all looks pretty simple, right? The caveat, though, is that none of the rake tasks will work for your secondary database. The tasks only know about ActiveRecord::Base and have no understanding of AnimalsBase.
Create, drop, and migrate will only operate on your primary database and not on the animals one, so we're going to have to write all of those rake tasks for each non-primary database. Writing all of those tasks is tedious and time consuming. This is something that you should not have to do. Rails should really be doing it for you.

After I created all of the rake tasks, which was probably two and a half of the three hours that it took me to set this up, I went to run the create task for the primary database and my application threw an error saying that the production database wasn't configured. I double-checked my rake tasks and even stashed all of my work except for the three-tier config to double-check that I hadn't done anything wrong. After some debugging, I found that the default connections weren't working with the three-tier config. When you call rake db:create, it first loads the Active Record initializer in the Railtie, which calls establish_connection with no arguments. This is where Rails will establish the default connection to the database. In a standard application with a two-tier config, Rails will assume that you want the database that corresponds with the environment. But when we have a three-tier config, Rails doesn't know which one is the default because there are two per environment, primary and animals.

So now we've identified four areas where Active Record is making scaling your database very hard. There's no documentation, migrations don't work, database tasks don't exist, and the default connection doesn't even work. At this point I realized that multiple databases in Rails applications are still really hard. I know what I'm doing, and I've worked on not even just multiple databases in an app but the plumbing of multiple databases before, and I still couldn't quickly set up multiple databases in a demo app. Rails wasn't meeting the basic criteria to make scaling your database easy.
The scope of my original project, to upstream some of GitHub's multiple database behavior to Rails, had grown enormously, and it started to look like I was going to have to do a massive yak shave to accomplish what I wanted, except instead of having one problem to solve I had three massive yaks to shave. Each of these problems was going to be time consuming, and there was no easy way to solve any of them. It's easy in situations like this to get overwhelmed and frustrated. And believe me, quite a few times I wondered why I'd even tried to upstream any of GitHub's multiple database behavior to Rails at all. But then I remembered that I proposed a talk to RailsConf, and then they asked me to keynote, and so there was no way I was gonna back out of that.

So the first thing I decided to fix was handling for migrations. It didn't make sense to start with fixing documentation because shaving all of the other yaks was going to change how multiple databases work anyway. I chose to work on migrations because parallel testing needed to support multiple databases, and I knew that the parallelize hooks were going to work better if I could fix the migration code. Previously, if you wanted to put your migrations in a separate folder, you had to set migrations_paths any time you were going to call migrate and write your own fancy custom task. This is fine if you're calling migrate directly, but if you want to rely on any of the Rails internals, it becomes impossible to set that path, since there's no way to tell Rails what migrations paths to use except on the class level. This means that your app is always responsible for telling Rails where the migrations paths live, because Rails can't ask where they are. It should be easier to work with migrations in different directories.
I decided the best way to fix migrations was to store the migrations paths on the connection instead of the class, so that Rails can ask the model "where do my migrations live?" instead of having to be told by the app every single time it needed to run migrations. To accomplish this, Aaron and I refactored migrations_paths, and you can now set migrations_paths in your database YAML. All you need to do in your app is change the database YAML to set migrations_paths to the directory that you created, so we can set ours to db/animals_migrate. If you don't set this field, Rails will default to db/migrate. This refactoring moves the information for migrations paths from the migrator class onto the connection. This change makes it possible for Active Record to ask the connection what migrations paths it should use. Before this change, this wasn't possible. This looks like a very simple change, but it required a medium-size refactoring to accomplish. We drastically reduced the private API for the migrator class, which allows for more flexibility in controlling where migrations are run from. It's also backwards compatible, so you don't actually have to change anything in your app until you want to use this feature. Looking at refactoring PRs isn't fun for conference talks, so if you're interested in the implementation, you can check out the pull request on GitHub.

All right, so now that one yak was done, it was time to take on the next one: changing Rails so that you no longer had to write all of the rake tasks for your new database. One of the hardest parts about multiple databases in Rails is that you have to write all of the rake tasks for your animals database yourself. This is extremely time consuming and something that Rails should really be doing for you. My goals with these improvements were to make the database tasks intuitive and predictable.
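The migrations_paths setting and its db/migrate default, described earlier in this passage, can be illustrated with a short sketch. The database names are made up, and migrations_paths_for is a hypothetical helper written for this example; it only mimics the fallback behavior the talk describes and is not how Rails reads the setting internally.

```ruby
require "yaml"

# Three-tier config where only the animals database overrides the migrations directory.
DATABASE_YAML = <<~CONFIG
  production:
    primary:
      database: app_production
    animals:
      database: animals_production
      migrations_paths: db/animals_migrate
CONFIG

# Hypothetical helper: use the configured paths, or fall back to db/migrate.
def migrations_paths_for(config)
  Array(config.fetch("migrations_paths", "db/migrate"))
end

PRODUCTION = YAML.safe_load(DATABASE_YAML).fetch("production")

PRODUCTION.each do |spec_name, config|
  puts "#{spec_name}: #{migrations_paths_for(config).join(', ')}"
end
```

Because the path lives next to the rest of the connection settings, anything holding a database's configuration can answer where that database's migrations live, which is the point of moving it off the migrator class.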
The problem with the existing tasks was that db:drop, db:migrate, and db:create would only operate on the primary database. That doesn't seem very intuitive. If I run create, I expect it to create all of them. And there were no namespaced tasks for primary and animals respectively, so even if I fixed the first problem, there was not going to be any way to just run migrate on your animals database or drop on your primary. I wanted the existing tasks like rake db:drop, migrate, create, schema dump, schema load, all of those, to automatically run for both the primary and animals databases. With a three-tier config, they were only running for the primary database.

One of the challenges with dealing with a three-tier config is that a lot of the code in Rails assumes that we have a two-tier config and that the config hash is tied to an environment. When you have a single database per environment, Rails will simply select the single configuration hash that corresponds to the requested environment. But with a three-tier config, requesting a configuration hash by environment will return all the configs for that environment, so we don't know which one is the default. With a three-tier config, we also don't know how many databases there are per environment or what their namespaces are. Configurations as hashes are an inflexible data model, because Rails can't easily select the animals database for an environment when Rails doesn't know what the namespace of that database is set to.

So to solve this problem, Aaron and I decided that we should internally convert the hashes to objects so that we can ask them questions and more easily manipulate the data. We created a new class called DatabaseConfig. With this class we can pass an environment name, which corresponds to the first level of the hash. A specification name for the second level, which corresponds to the namespace, like primary or animals.
And a config hash, which corresponds to the configuration hash for that environment and specification respectively. The returned object looks like this. Now that we have this object, we can ask it questions. What environment is this config for? What specification name is associated with this configuration? And what config hash is associated with the database? These objects are created by passing the entire database YAML hash to a method called db_configs. This method walks all of the configurations in the database YAML and returns the database config objects. Now that we have these objects, it's easy to manipulate the data model. We can collect the configs for an environment using a new method called configs_for. This method will collect all of the configs for a specified environment by iterating over the objects returned by db_configs. If a block is given, configs_for will return the spec name and the corresponding configuration hash. If a block is not given, configs_for will return an array of configs for that environment. So in the case of the application that we looked at earlier, if we pass production to configs_for we will get an array of database config objects for the primary and animals production databases. By converting the hashes to objects we have more flexibility and control over the returned data. We can ask the objects questions or enumerate over the returned database configs with group_by or find. This gives us the ability to more easily manipulate these objects. Let's take a look at how we use them to fix the database commands. In Rails 5 the migrate task was like this. It simply loaded the config and called migrate. Because it relied on the existing database connection, this method would only migrate the primary database in a three-tier config. To change this to migrate both the primary and animals databases, we use the configs_for method to iterate over the databases and migrate them.
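To make the hash-to-object conversion described above concrete, here is a minimal plain-Ruby sketch. It is a simplified stand-in and not the actual Rails implementation: the Struct, the method signatures, and the database names are assumptions for illustration, and it only handles a three-tier hash.

```ruby
# Simplified stand-in for the database config object: it can answer which
# environment, which specification name, and which config hash it holds.
DatabaseConfig = Struct.new(:env_name, :spec_name, :config)

# Hypothetical three-tier configuration (environment -> namespace -> config).
CONFIGURATIONS = {
  "development" => {
    "primary" => { "database" => "app_development" },
    "animals" => { "database" => "app_animals_development" }
  },
  "production" => {
    "primary" => { "database" => "app_production" },
    "animals" => { "database" => "app_animals_production" }
  }
}.freeze

# Walks every environment and namespace and builds one object per database.
def db_configs(configurations)
  configurations.flat_map do |env_name, specs|
    specs.map { |spec_name, config| DatabaseConfig.new(env_name, spec_name, config) }
  end
end

# Collects the configs for one environment. With a block, yields the spec name
# and config hash for each database; without one, returns the config objects.
def configs_for(configurations, env_name)
  configs = db_configs(configurations).select { |c| c.env_name == env_name }
  return configs unless block_given?

  configs.each { |c| yield c.spec_name, c.config }
end

# Without a block: an array of objects we can ask questions.
production = configs_for(CONFIGURATIONS, "production")
puts production.map(&:spec_name).join(", ")

# With a block: the spec name and config hash for each database.
configs_for(CONFIGURATIONS, "production") do |spec_name, config|
  puts "#{spec_name} -> #{config["database"]}"
end
```

Because the result is an ordinary array of objects, the usual Enumerable methods such as group_by and find work on it directly, which is the flexibility the plain nested hash did not offer.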
First we pass the Rails environment to configs_for, which will collect the configs tied to that environment. When using a block, configs_for returns the spec name and config. We don't need the spec name, so we just use an underscore. Then we establish a connection using that returned config, and then we can call migrate, which will migrate all of the databases. Because of the migrations paths refactoring, Rails knows exactly where your migrations live, so we can run the migrations for the primary and animals databases respectively once it gets to that database in the loop. The changes for the drop, create, and migrate commands are basically the same: collect all the configs for the requested environment, loop through them, establish a connection, and perform the requested action. Now all of these tasks will operate on all of your databases. So now that these tasks operated in a predictable way, it was time to build the dynamic tasks for each database in the configuration. I wanted the tasks for each database to be easy to discover and simple to use. To do this, I decided the best way was to suffix the database commands with the database namespace. This way they're intuitive, because they read like English ("hey, database, create animals"), and they're easy to discover because they show up in the rake tasks list. These tasks are useful if you want to operate on just one of your databases for a specific environment. This would allow you to drop the animals database but not the primary, or migrate the primary database and not animals. This gives you the same flexibility in a multiple database application that you have in a single database application. To accomplish this, we collect all of the config names inside the migrate namespace and loop through all of the specification names to create the namespaced tasks. Then, using the spec name and environment, we request the configuration with config_for_env_and_spec.
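The two kinds of tasks described above, one that loops over every database and one suffixed task per namespace, can be sketched in plain Ruby without Rake. Everything here is hypothetical: the task table, the migrate stand-in (which only records what would run instead of connecting to anything), and the database names. It is an illustration of the shape of the idea, not the Rails source.

```ruby
# Hypothetical configs for one environment, keyed by database namespace.
DEVELOPMENT_CONFIGS = {
  "primary" => { "database" => "app_development", "migrations_paths" => "db/migrate" },
  "animals" => { "database" => "app_animals_development", "migrations_paths" => "db/animals_migrate" }
}.freeze

# Stand-in for "establish a connection, then migrate": it only logs the
# database and the migrations path that would be used.
def migrate(config, log)
  log << "migrate #{config["database"]} from #{config["migrations_paths"]}"
end

# Builds a table of task name => callable for the migrate command.
def build_migrate_tasks(configs)
  tasks = {
    # The un-suffixed task loops over every database in the environment.
    # The namespace is not needed here, so it is ignored with an underscore.
    "db:migrate" => ->(log) { configs.each { |_, config| migrate(config, log) } }
  }

  # One suffixed task per namespace, operating on that database only.
  configs.each do |spec_name, config|
    tasks["db:migrate:#{spec_name}"] = ->(log) { migrate(config, log) }
  end

  tasks
end

tasks = build_migrate_tasks(DEVELOPMENT_CONFIGS)
puts tasks.keys

log = []
tasks["db:migrate:animals"].call(log) # only the animals database
tasks["db:migrate"].call(log)         # every database, in config order
puts log
```

Generating the suffixed names from the configuration is what keeps the tasks discoverable: adding a third database to the config would add its tasks without anyone writing them by hand.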
config_for_env_and_spec returns a single database config object based on the environment and specification name. From there we establish a connection using the config so that we're attached to the correct database, and finally we migrate that database. This doesn't look complicated, but trust me, figuring out all of the required pieces and the refactoring that we needed to do to make these tasks automatically create themselves took a really long time. I'm not going to show you all of the tasks and how I wrote them, because they're all generally the same. For each database we create a command that loops through the configs, creates the namespaced task, and then calls the methods for that task. If you're interested in the PR for the majority of this work, you can find it on GitHub. Fixing migrations and adding improvements to the rake tasks drastically improves the experience for multiple databases in Rails. These changes create an intuitive experience for working with multiple databases. There's still a ton of work to do to make multiple databases truly scale in Rails, but these changes make a big difference. As you know, we still have one yak left to shave: the fact that default connections for three-tier configs don't quite work. I've added a Band-Aid fix for this problem, but I know that it is not correct and I'm trying to fix it the right way. After trying to force Rails to work as is, we've come to the conclusion that we need to spend time refactoring Active Record's connection handling to be able to actually accomplish what we want. Aaron and I are working on changing Active Record's underlying connection management to use the database config objects that we looked at earlier. This will allow us to more easily select the config that we actually want to connect to. This is a huge undertaking, and connection management in Rails has morphed from a slightly larger yak into my personal white whale. One day I will defeat it, but that work is neither complete nor interesting to look at.
I hope that next year at RailsConf I can talk to you about all of the features that will be possible after we finish this refactoring. There's a ton more work to do to make multiple databases in Rails actually simple to use. I want to add nice APIs for read/write splitting, some performance improvements, and that documentation that we talked about earlier. My hope is that the work that's been done so far and the connection management refactoring inspire more changes to scaling the database side of our Rails applications. I'd like to thank my friend and co-worker Aaron Patterson for working with me on these features. He started the work on parallel testing and helped me navigate connection management in Rails. I would not have finished any of these features if it wasn't for his help. So a couple of weeks ago someone asked me, aren't you concerned that Rails will become bloated with all of this scaling code? This is a valid point. Most apps don't need GitHub scale. But when I look at the ways in which GitHub has forced Rails to scale over the years, I can't help but wonder, what would Rails look like if we'd upstreamed these features five years ago? Would more apps be using parallel testing and multiple databases? Would we still be saying Rails doesn't scale? Or would we instead be asking, how can we make Rails scale even more? This past year at GitHub, I worked on upgrading our application from Rails 3.2 to 4.2. Through that work, I got to see all the parts of GitHub's stack, from Git code to wikis to gists to pull requests to how Git actually works to our database infrastructure, our CI tooling, and our deployment code. I got to see all of the decisions made over the last 10 years, the good ones and the bad ones, that scaled GitHub to where we are now. I know that we're not the only ones who forced Rails to scale. I've worked on custom multiple database setups and parallel testing at other companies before GitHub.
I know that Shopify has their own multiple database and parallel testing setup as well. I started to wonder why we were all building the same tools but weren't sharing them with each other. And I think that comes from when we, the royal we, I'm not passing blame onto anybody specific here, think that our scaling problems are unique and special. This thinking means that when we're tackling problems in our application, we tend to look inward for a solution. You've got to fix that bug or that performance issue. We need multiple databases and Rails doesn't support it. Our engineers are complaining about CI time and we need a fix, and that fix needs to be done yesterday. These kinds of emergencies cause us to spend time writing band-aid fixes instead of building a scaling solution upstream. When we're in a stressful situation, it's easy to say, let's just fix this now and we'll come back and build it right later. But later never comes. These patches have a way of taking on a life of their own, and they grow and they grow and they grow until you don't know where your app ends and your framework begins. Our applications begin to resemble a game of Jenga: one wrong move and the whole thing comes down. And when that happens, we start to think that maybe they were right. Maybe Rails doesn't scale. But what's not scaling is the code that we wrote. Instead of building generic tools upstream, we build more and more on top of our fragile application. When we build tools while only looking inward, our code becomes specialized, rigid, and impossible to change. The tools we built stagnate because they don't benefit from community improvement. On top of that, we end up building the same tools over and over again. There's a better way though. We need to start looking at our problems from an upstream-first point of view. The first step is admitting that our applications aren't special.
If our applications are successful, we're all going to need to scale, so why are we hoarding the solutions to those problems? I've seen many companies reinvent parallel testing, hack together multiple database handling, and force their apps to scale inside their application rather than building a scaling solution upstream. I'm not claiming that upstreaming is easy, but when we see ourselves building the same patterns over and over again, it's time to ask, is this problem truly unique to my application? And when it's not, it should be upstreamed to Rails or built as open source. Upstreaming your code first has many benefits for your organization and your community. Upstreaming your tools forces you to write generic code. This prevents your application from becoming tightly coupled to your tools, which will keep your code more flexible and easier to change: less like Jenga and more like Legos. Upstreaming scaling solutions means that you're training your future workforce. Your engineers don't need to learn how to implement multiple databases for your setup because your setup is everyone's setup. Rails will set the standard for scaling databases, so you don't need to redefine that standard at each new company you work at or each new application you build. And lastly, an upstream-first mindset means that we're giving back to the community and that community gives back to you. When you push your code upstream, you stop hoarding solutions that aren't unique to your application, and others are able to use your tools and improve upon them. This is what makes open source magical. I know that by upstreaming multiple database handling, not only will other companies benefit from it, they're going to improve it for me too, so I can do less work later. Working by yourself in a vacuum isn't fun. When you share your experience and your code with others, you learn from their experiences.
For years at GitHub, we've been hacking our application to scale, and it's time that we started giving back and sharing our experience with the community. In order to improve Rails' scaling ability, I looked at all of the ways in which we forced Rails to scale at GitHub. I've worked on these tools before, but seeing them at GitHub scale really showed me how much Rails was missing. Over the last 10 years, GitHub has learned a lot from using Rails. We've learned where it stands up and we've learned where it falls down. And now we're giving back and upstreaming that scaling behavior because we know it's not special. By building these tools in Rails, we reduce the footprint of our application, train our future workforce, and give back to the community. I want everyone who uses Rails to experience that developer happiness, not just of the Rails new command, but five, 10, and 15 years later. But they can only continue to experience that happiness if Rails is providing the tools they need to run their application for 15, 20, and 25 years. So to answer the question, do I think Rails will become bloated with scaling code? My answer is no, because contrary to popular opinion, Rails isn't dying, it's maturing. If Rails wants to continue to be the choice for web applications, we, the Rails team, need to respond to the needs of those applications. Rails needs to be built not just for prototypes, but for the long term. The path to maturing is to start building scaling solutions upstream so that our users don't need to constantly reinvent the wheel or, worse, look elsewhere. Some may still think that Rails is just Fisher-Price software, but like I said earlier, Fisher-Price and Rails are both well-designed, robust, and unforgettable. We'll keep adding improvements to Rails to make it more scalable, more resilient, more unforgettable. We'll take your apps from Rails new to Rails scale. We'll keep improving Rails until everyone is saying, Rails does scale.
But to do that, I need your help. I want you to go home and look at your applications. What have you done to force Rails to scale for you? What can you give back to our community? Can you make parallel testing faster? Do you have other things in your multiple database handling that I don't know about? Do you have other tools that you built that can be made generic and pushed upstream? Rails is and will continue to be defined by its community. Help us define the future of Rails. Let's make Rails 6 scalable by default. Thank you.