 All right, I think we can start. I'd like to start by thanking all of you for attending this talk. It's my very first RailsConf and I'm very, very excited to be here; it's a great honor, so thank you so much. I'd also like to thank Shopify for sponsoring this event and letting me share the work we've been doing in our team. So today I'd like to talk about our process of upgrading Rails and how we were able to run a release candidate version in production. Our app is currently running on Rails 5.2, which was released last week, but for a month already we had been running on Rails 5.2 release candidate two. I'll try to explain why this was a big milestone for us and how the open source community will benefit from it. But first of all, let me introduce myself. My name is Edouard Chin. I'm a software developer at Shopify and I work on our internal Ruby on Rails team. You can find me on GitHub, as well as under this very weird Twitter handle. As a preamble, if you attended RailsConf last year, or if you have watched the recording on YouTube, you might have seen Rafael França's talk about how we upgraded Rails from 4.2 to 5.0. It was a massive project which lasted a year. So this talk is not meant to be a repetition of his, but rather the continuation of our work, the improvements we have made to our process, and the new tooling we have created to help us. Hopefully, at the end of this talk, you will be able to upgrade your application quickly and smoothly by reusing some of the ideas we'll be talking about. If you haven't seen Rafael's talk, don't worry too much, although I definitely recommend watching it. I will give enough context for you to understand. So speaking about context, what is Shopify? Shopify is a cloud-based commerce platform. It has been running in production since 2006, so over 12 years now. And we power more than 600,000 businesses all over the world. 
As a merchant, you're able to manage your store across multiple sales channels. This includes web, mobile, social media, and a lot of other ones. Our motto is pretty simple: make commerce better for everyone. Our core application is pretty large, and I think it might be the biggest Ruby on Rails application currently running in production. This is a screenshot I took from our GitHub repository a couple weeks ago. As you can see, we have around 300,000 commits and more than 1,000 contributors. So quite a bit. Our application has a long history with Rails, and our current CEO was already making use of the very first unreleased Rails version that DHH sent him over email. That was probably 13 years ago, and it was a zip file. Since then, we have never rewritten our app; we keep improving the code base over the years, as well as keeping all the dependencies we have up to date. And we have quite a few: around 250 direct dependencies, and more than 400 if you count indirect ones. As you might expect, upgrading Rails on a large application is not an easy task. We managed to do it successfully over the years, but it takes a lot of effort and resources. So this is a graph that compares when each Rails version was released with what version of Rails Shopify was using. As you can see, we're not too far from each other. And one of the reasons is that for at least a couple of years, we have had this idea of pointing our app at the head of Rails, so basically doing that instead of using a released gem version. There are multiple advantages to doing that. The first one is that we get new features, bug fixes, and improvements as soon as they get merged upstream. But there's also another advantage that Rails, and thus your application, will directly benefit from. Let me explain. Our test suite is at the same scale as the monolith itself and contains more than 80,000 tests. 
By running Rails on a large application that contains that many tests, we are able to detect a lot of potential issues or regressions that might have been introduced upstream. Of course, even if we have a lot of tests, that doesn't mean we are able to catch them all. But at least we will be able to detect edge cases that might not have been discovered yet. So it's been around three years since we had this idea of using the master branch on GitHub. So why did it take so much time? In probably a couple of months, we will be able to achieve this big milestone. So let me give you some context on how we were doing things previously, the lessons we learned, and what our new process is. When we start upgrading Rails, we prepare our application for dual booting. If you have an app and you manage your dependencies with Bundler, which I'm sure you do, you'll have a Gemfile looking like this, and of course an associated Gemfile.lock snapshot. At Shopify, it's almost exactly the same, but we have a Bundler monkey patch. I didn't bother showing the monkey patch because it's not clean, I admit it, but it works pretty well. There is another way to make it cleaner, but that's not the important part. You will probably want to focus your attention on these few lines of code right here. These few lines tell Bundler to get the dependencies from another snapshot whenever the SHOPIFY_NEXT environment variable is present. So in the case of a regular application where dependencies are managed with Bundler, when you run bundle install, the dependencies get resolved from the regular Gemfile.lock. Whereas in our case, if we add the SHOPIFY_NEXT environment variable, it's going to grab the dependencies from Gemfile_next.lock. To summarize: we have two snapshots, and we control which one gets picked up by adding or removing an environment variable. That way we can run our application and run tests very easily and quickly on two versions of Rails. 
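The lockfile selection can be boiled down to a tiny helper. This is a sketch, not Shopify's actual Bundler monkey patch; the SHOPIFY_NEXT variable comes from the talk, and the Gemfile_next.lock file name is an assumption about the setup described above.

```ruby
# A tiny sketch of the dual-boot lockfile selection, not Shopify's actual
# Bundler monkey patch. The Gemfile_next.lock name is an assumption.
def lockfile_for(gemfile, env = ENV)
  if env["SHOPIFY_NEXT"]
    "#{gemfile}_next.lock" # snapshot resolving the next Rails version
  else
    "#{gemfile}.lock"      # regular snapshot used in production today
  end
end

lockfile_for("Gemfile", {})                     # => "Gemfile.lock"
lockfile_for("Gemfile", "SHOPIFY_NEXT" => "1")  # => "Gemfile_next.lock"
```

The real patch hooks this decision into Bundler itself, so every `bundle` command and every app boot transparently resolves from the right snapshot.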
The second thing we take care of when we upgrade Rails is fixing any issues that might happen during the booting process: anything from a broken initializer to broken code that was eager loaded, anything like that. The main goal for us is to be able to run our test suite. And when I say running our test suite, I'm not saying that the tests are going to pass, but at least they need to start, because at this point we know that a lot of tests are going to fail. And when I say a lot, it's literally thousands. And the last step, which takes probably 95% of the total time of the upgrade, is to fix all the issues. And that's where the fun begins. Until very recently, a small team was taking care of fixing all the problems. And while we were fixing all the things, the train was still moving: developers working on the application were merging code that was working fine on the current version of Rails, but broken on the next one. As a result, this delayed the upgrade process even more. So that's one lesson we learned: we did not stop the bleeding. One solution to that problem would have been to add a new CI check. So basically, instead of testing code on one version of Rails, testing on two: one check on the current version of Rails running in production, and the other testing on the next version of Rails. That's actually what we ended up doing, but only after we fixed all the tests. So basically, after a few months, we enabled CI and we stopped the bleeding at the very end, when there was no more bleeding. And the reason why we could not enable CI at the beginning was because, again, we had too many failures. Having a CI that is constantly red is not very useful. If a developer pushes code that is not working on Rails Next, there is no way for us to notice, because we had so many existing failures. So this should give you context on how we were doing things previously. 
And as you can see, the main pain point was keeping track of the issues and stopping the bleeding. On our side, we thought about the problem and we realized that we would never be able to achieve our long-term goal, which again is to point our app at the master branch of Rails, because one upgrade cycle was taking way too much time and effort. It was taking almost as much time as the Rails team takes to release a new Rails version. So we thought about the problem and decided to create a new set of tools, which we ended up trying during the 5.0 to 5.1 upgrade. This was a few months ago. And it turned out that this upgrade was actually the fastest and smoothest we had ever done. So how does that work? It starts the same way: we prepare our application for dual booting, then we fix any issues that might happen during the booting process. And lastly, here comes the new part: we enable CI right away. We want to stop the bleeding at the very beginning. But I just said one minute ago that enabling CI at the beginning was not possible because of the amount of failures we already had. So our idea was to make CI fail only when new broken code is introduced. Any existing broken test will be marked and allowed to fail. This is a concrete example of what I'm talking about: there is a test, it fails for whatever reason, we mark it, and even though it's failing, it's going to pass. I'm going to explain how this works and deep dive a bit into our implementation, because that's one of the things that made this upgrade very fast and smooth. And as I explain how it works, the first question that might come to your mind is: what's the difference between marking a test and just skipping it? I will be happy to answer that question right after. So as you can see, we use the markers declaration to mark a test. This syntax might remind you of rake tasks, where you define a task and on top of it you can add a description. 
Of course, that description only applies to that specific task and doesn't leak to others. In our case, it's the same idea: we mark a test and we want the mark to be applied only to that specific test, not to any others. The markers feature comes from a little module that we created. It's only a few lines of code, so don't expect too much from it really. By itself, it's not what makes the tests pass. However, it helps us mark tests with various tags: here, marking tests that are failing on Rails Next, but we also use it for different things, such as marking tests that are too slow. I'll explain how we make use of these marks after. But first, let's have a look at our marking module. There is a bit of metaprogramming. I'm going to split it into small chunks, but it's nothing to be really afraid of. The first thing we do when this module gets included inside a test class is to create a class variable called metadata. And this metadata is simply just a hash where the keys are the tags that were passed to the markers method and the values are arrays containing the names of the tests that were marked with those tags. Then for each tag, we create macros. That's just a convenient way to help us check whether a test was marked with a tag or not. And lastly, we create a hook. The method_added hook is not a popular feature in Ruby; it's not commonly used, but it works exactly the same way as other well-known hooks in Ruby such as included, inherited, extended, et cetera. The method_added hook gets triggered as soon as a method gets defined. So in our case, when the markers method gets called, we define the hook. That hook then gets triggered by Ruby as soon as the next test gets defined. Inside the hook body, we fill our metadata hash. And lastly, we remove the hook. We remove it because we don't want any other test to trigger it: say you have a test that is not marked; we don't want it to trigger our hook. 
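A condensed sketch of what such a marking module might look like is below. This is not Shopify's actual code: it simplifies the define-then-remove hook dance into a pending-tags flag that a permanent method_added hook consumes, and it uses one generic `marked?` macro instead of one macro per tag.

```ruby
# A condensed marking-module sketch (assumed names, not Shopify's code).
# `markers` remembers the tags; Ruby triggers `method_added` when the
# next test method is defined, which fills the metadata hash.
module Marking
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # tag => [test names] mapping for this test class
    def metadata
      @metadata ||= Hash.new { |hash, tag| hash[tag] = [] }
    end

    # convenience macro: was this test marked with this tag?
    def marked?(tag, test_name)
      metadata[tag].include?(test_name)
    end

    # remember the tags until the next method definition
    def markers(*tags)
      @pending_tags = tags
    end

    # fires on every method definition, but only consumes pending tags
    def method_added(test_name)
      super if defined?(super)
      return unless defined?(@pending_tags) && @pending_tags
      @pending_tags.each { |tag| metadata[tag] << test_name }
      @pending_tags = nil
    end
  end
end

class SomeTest
  include Marking

  markers :failing_on_rails_next
  def test_broken_on_next; end

  def test_fine_everywhere; end
end

SomeTest.marked?(:failing_on_rails_next, :test_broken_on_next)  # => true
SomeTest.marked?(:failing_on_rails_next, :test_fine_everywhere) # => false
```

Because the pending tags are cleared as soon as they are consumed, only the test defined immediately after the `markers` call picks up the mark, which matches the rake-task-description behavior described above.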
And each time the markers method gets called, we repeat the entire process. So this is a concrete example. This is a test class: we include the marking module, then we have two tests that are marked with various tags. In the background, that's what the metadata hash would look like: the keys are the tags, and the values are arrays of the test names. And lastly, that's how we can check whether a test is marked or not. All right, so that's cool, but it might confuse you because it's not helping us much yet. And that's true. So here's the part where we make use of these marks. And for this, we use a minitest reporter. I sadly won't have enough time to explain in detail how minitest works, but if you have seen Ryan's talk yesterday about minitest, I'm sure you're an expert. And I'm sure that you have seen this output if you have ever run a test with minitest. If you use RSpec, sorry, our tooling is mostly built around minitest, but I'm sure you will be able to create your own implementation. So there are basically two reporters that ship with minitest by default: the progress reporter, which allows you to see the progress of your tests, and the summary reporter, which outputs a summary after all the tests have run. And you might wonder how the reporters know about your tests. Basically, when minitest runs, it stores the outcome of each test inside a result object. That object contains information about the run, such as the number of assertions, the time it took to run, but also, in our case, the failures. And as you might have guessed, the result object gets passed along to all the reporters, one by one. And when the reporters are done outputting all the information, minitest is going to exit. It exits with a regular shell exit code: zero for success, anything but zero for failure. And that's what your CI will look at to determine whether your script succeeded or failed. 
So now that we understand that, let's have a look at the reporter we created. It's called the Rails Next reporter. And the basic idea of this reporter is to mutate the result object, that same result object I was talking about that contains all the information about the run, and clear the failures. So first, the after_test method gets triggered by a hook, and it's passed an instance of your test as an argument. Inside the method, we check whether we know that the test is going to fail. First, we check if we're running on Rails Next, because if we're not, we don't want to alter the result of the run: marks for failing on Rails Next should only apply on Rails Next. And then we check if the test is marked, and that's where we make use of the marking feature. If both conditions are met, then we simply set a flag to "allowed"; it's simply just a string. Then minitest will call the record method, and that method gets passed the result object, that famous result object. Inside the method, we check if we set the flag to "allowed", and if that's the case, then we simply clear the failures. The failures are simply just an array, so we remove everything. What happens next is that the same result object we just mutated gets passed along to the other reporters, one by one. And those reporters won't see any failures, because we already cleared them. So they're going to think the test is green, and that's what we want. Oh, actually, one last thing: the after_test hook here is a workaround, because inside the record method we don't have an instance of the test anymore, so we cannot check whether the test was marked or not. Now I'll explain why we don't use skip instead of marks, because so far it's exactly the same: if a test is failing, just skip it; it's the same as making the test pass. So when someone works on fixing a test, they're going to modify something in some file. 
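The reporter's core idea can be sketched without minitest itself. Below, a Struct stands in for minitest's result object, and a hardcoded MARKED list stands in for the marking metadata; all names are assumptions, not Shopify's actual code.

```ruby
# A framework-free sketch of the Rails Next reporter idea.
# A Struct stands in for minitest's result object.
Result = Struct.new(:name, :failures)

class RailsNextReporter
  # hypothetical stand-in for tests marked as failing on Rails Next
  MARKED = [:test_known_broken_on_next].freeze

  def initialize(rails_next: ENV.key?("SHOPIFY_NEXT"))
    @rails_next = rails_next
  end

  # minitest passes every result through each reporter's `record`;
  # clearing the failures array makes downstream reporters see a pass
  def record(result)
    return unless @rails_next                  # never alter Rails Current runs
    return unless MARKED.include?(result.name) # only tests marked as failing
    result.failures.clear
  end
end

reporter = RailsNextReporter.new(rails_next: true)
broken = Result.new(:test_known_broken_on_next, ["some assertion failed"])
reporter.record(broken)
broken.failures # => [] (downstream reporters treat it as passing)
```

Because the mutation happens in the first reporter of the chain, the summary reporter counts the test as green and minitest exits with a zero status, which is all CI looks at.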
That fix might be inside views, inside controllers, inside models, anywhere. And almost all of the time, the change they make will fix not only the test they were intending to fix, but a lot of others. And if we had skipped all the tests, there would be no way for us to tell which tests are actually still failing. So if a test is marked as failing on Rails Next but does not contain any failures, that means it should not be marked. And the only way for us to make sure that tests get unmarked is to make CI fail and explicitly ask to unmark the tests that are not failing anymore. This might sound like an aggressive way of doing things, but that way we can keep a very accurate track of the tests that need to be fixed and of the progress we have made. I was just saying that a change will fix a bunch of tests, but sometimes it's way more than a bunch. This is a small change that literally fixed 200 tests. Of course, unmarking every test manually is not feasible; it's too tedious. So we created a simple script to automatically unmark all tests that are not failing anymore. I won't bother showing it because it's not very important, but just as a heads up. So enabling CI at the very beginning was very helpful to us. Not only did we stop the bleeding, but because developers had to have a green CI in order to merge their pull requests, they had to fix their code in case it was broken. And again, because a change usually propagates to other tests, they were helping us indirectly. Another thing that made the upgrade super fast is workforce: the more people involved, the less time it takes to fix all the issues. Around a year ago, Shopify started a project called componentization. That project is not directly related to the Rails upgrade, but it helped us a lot in identifying code owners, or as we prefer to call them, code stewards. Componentization is kind of a big topic. 
I won't have time to expand on it, but essentially the main goal was to make development on our app more productive. It's here to help with code organization as well as code design, but in our case we really care about the code organization part. So this is a concrete example of what componentization is. This is the structure of our app before and after componentization. As you can see, before, it's more or less a regular Rails application; you can probably recognize most of the folder names. Whereas now there is a new components folder, and inside it you can find every component. And the structure of each component is very much like a small Rails application itself: inside it you can find anything from models, views, and controllers to tests. To not make this screenshot too big, and for the purpose of this talk, I removed all components but one, but we have more than just one; we have around 30 components. So how does that help us? Well, we now have a natural way to identify which code belongs to which component. And thus, we decided to count the number of failures per component and created a good old spreadsheet out of it. Then we asked every component steward to help fix the issues that were in their component. And every week, we were updating the spreadsheet to reflect the progress we had made. And without us doing this deliberately, gamification took place, and stewards with their teams worked very hard on getting their components all green. Having so many developers involved in the upgrade process was phenomenal. It also helped them understand where their code was broken and what was new in the framework. So not only were they helping us, but by doing that, I think they learned a lot. So our test suite is now entirely green. And I really mean it: not red tests that our marking feature just makes look green. What's next? Well, we can deploy. But first, we had an idea that we wanted to try out. 
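As an aside, the failure counting behind that spreadsheet might be sketched like this, assuming failing test files live under a components/&lt;name&gt;/ layout; the component names below are made up for illustration.

```ruby
# A tiny sketch of counting failing tests per component from their paths,
# assuming a components/<name>/ layout (component names are made up).
def failures_per_component(failing_paths)
  failing_paths
    .group_by { |path| path[%r{\Acomponents/([^/]+)/}, 1] }
    .transform_values(&:count)
end

failures_per_component([
  "components/billing/test/invoice_test.rb",
  "components/billing/test/receipt_test.rb",
  "components/shipping/test/rate_test.rb",
]) # => {"billing" => 2, "shipping" => 1}
```

With the marking metadata already tracking every failing test, a report like this is enough to fill a per-steward spreadsheet.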
The idea we wanted to try turned out to work pretty well: we decided to enable deprecation logging in production, because by default, when you create a Rails application, deprecations don't get logged. Since we did not have any more failures, we should not have had any more deprecations, because if we did, that meant the code would probably break once we deployed. In our test environment this was the case: we did not have any deprecations. But we were worried about untested code paths in production. And that was the case. So by enabling deprecation logging in production, we identified which code was not tested. We fixed the issues and we added test coverage. As a result, when we deployed Rails 5.2.0 in production, there were zero new exceptions related to the upgrade. If you want to reproduce this in your app, it's pretty simple: you just have to set this configuration to :log. However, if you do that, be warned that you might clutter your logs with deprecation warnings. In our case, I knew that some requests could literally trigger hundreds of the same deprecation, probably because of a code path that was getting hit multiple times inside a loop. So if you're worried about that, here's a very quick tip: instead of setting the configuration to :log, set it to :notify, which is the default configuration. :notify will basically send a notification whenever a deprecation gets triggered. And of course, you can subscribe to that notification. In the background, it uses ActiveSupport::Notifications, so if you have never used that feature, it's quite useful. That's our subscriber. It's very simple: what we do is simply add the deprecation to an array, but only if that deprecation does not exist yet for that request, because we want to deduplicate deprecations. So we actually have two subscribers: this one, which takes care of adding every deprecation to an array, and the next one, which I will show on the next slide, which takes care of logging everything. 
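A framework-free sketch of that per-request deduplication follows. In a real app the two handlers would live in ActiveSupport::Subscriber classes attached to the rails and action_controller namespaces; here they are plain methods so the logic stands on its own, and all names are assumptions rather than Shopify's actual code.

```ruby
# A framework-free sketch of the two subscribers described above.
# The deprecation behavior itself is switched in the environment file,
# e.g. config.active_support.deprecation = :notify (the default), which
# emits a "deprecation.rails" notification for every deprecation.
class DeprecationTracker
  attr_reader :deprecations

  def initialize(logger)
    @deprecations = []
    @logger = logger
  end

  # handler for "deprecation.rails": collect each unique message
  def deprecation(message)
    @deprecations << message unless @deprecations.include?(message)
  end

  # handler for "process_action.action_controller": the action is done
  # processing, so no more deprecations can come in for this request;
  # log everything once, then reset for the next request
  def process_action
    @deprecations.each { |message| @logger.call(message) }
    @deprecations.clear
  end
end

tracker = DeprecationTracker.new(->(message) { puts message })
tracker.deprecation("`foo` is deprecated")
tracker.deprecation("`foo` is deprecated") # duplicate, ignored
tracker.process_action                     # logs the message once, then resets
```

The deduplication matters exactly because of the loop case mentioned above: a code path hit hundreds of times per request would otherwise emit hundreds of identical lines.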
As a heads up, if you have never used a subscriber before: you need to attach it to a namespace. In this case, it's the rails namespace, because the notification is called deprecation.rails; the rails part is the namespace and the deprecation part is the notification itself. And your subscriber needs to implement a method that has the same name as the notification. Then we have another subscriber, which is attached to process_action. That notification gets sent by Rails as soon as a request hits a controller action and that action is done processing. At this point, we know that there won't be any more deprecations triggered. We iterate over the array of deprecations, and lastly, we clear it to prepare for the next request. All right, so we are now fully ready to deploy Rails Next in production. However, we want to take as little risk as possible, just in case something breaks; we never know. And for that, we have a rollout strategy: deploy Rails Next on only a small percentage of our data centers. By doing that, we eliminate the risk of a complete outage on our platform. If something breaks, only a small subset of our customers will be impacted. So this is our bot that posts in Slack whenever Shopify is getting deployed. And as you can see, it also tells the percentage of servers that are going to serve requests on Rails Next. We usually start with a very low value, like one or five percent. That way we can see if there is any new exception coming in. And if everything goes well, then we incrementally increase it over a few days. After deploying, we also profile our application and get back a sampled call stack. We actually profile twice: once for servers that are running on Rails Current and another time for servers that are running on Rails Next. We then compare the call stacks to make sure that no new code takes too much CPU. The main goal is to check for performance regressions. 
After a few days, when all our data centers are running on Rails Next, we can get rid of our Bundler monkey patch. And finally, we can open the champagne, because we officially upgraded our application. There is one last thing that I did not mention, but it's important when you upgrade your application, and it's about deprecations. If you remember, I was saying that fixing all the problems is what takes the most time. But there is a way to make this step even faster, because a lot of these problems can actually be avoided upfront if you take care of addressing deprecations as soon as they get introduced in your code base. The fewer deprecations you have for a gem, the easier it's going to be to upgrade it. The decision we took on our side is quite simple: introducing new deprecations in our code is not allowed; CI will fail otherwise. Of course, if the reason for adding a deprecation is legit, let's say we are deprecating our own code, or we just upgraded a gem that adds new deprecations, then we want to allow them case by case while we work on fixing them. Most of the time, deprecations get output in the console, but that's usually not an efficient way to keep track of them, because here I'm sure you cannot tell how many deprecations there are, and neither can I. So if someone adds a deprecation without doing it purposely, there is no way to tell. To help with this, we created a deprecation toolkit gem. That gem works by recording all the current deprecations that you have in your system for a given test, and it records everything inside YAML files. The next time the tests run, the toolkit compares the deprecations that were recorded against the ones that were actually triggered. If there is a mismatch, that means a deprecation was introduced. So this gem is basically a way for us to stop the bleeding while we work on fixing existing deprecations. If that's something you are interested in, we are going to open source this gem in a couple of weeks. 
So by reusing all of the ideas I mentioned in this talk, we drastically reduced the time needed to upgrade Rails, but it still takes some time, around three to four months. And at this point, it's going to be very hard for us to reduce this time even further. At the beginning I was saying that there are multiple advantages to pointing our app at the head of Rails. And the main motivation for us is being able to upgrade way more frequently, where each upgrade takes a negligible amount of time. So the idea is, instead of doing a big upgrade that takes months to complete and requires a team, a project, and progress tracking, we want to be able to upgrade every couple of weeks and make it a routine. The Rails team is usually pretty good at making sure no bugs are introduced in the framework, but as with every software, there will be bugs. So there is a slight risk for us in pointing our app at the head of Rails, because our application will be one of the very first to try out every commit that gets introduced upstream. We evaluated the risk, and we are confident enough in our tooling and in our test suite that there won't be any big problems. The risk we take versus the advantages we get makes it definitely worth a shot. It's also a great way for us to contribute back to the open source community, not only to the Rails project itself, but to the Rails ecosystem, because if we find any issues in one of the dependencies we use, and again we have quite a few, then we will be able to report and fix the problem so that other applications won't encounter it. So this was our journey to upgrade a gem, and I'm purposely saying gem and not just Rails, because everything I talked about can actually be applied to any gem upgrade. 
So if you're planning to bump a dependency that requires a lot of work, and you need a way to do it progressively while stopping the bleeding, parallelizing the work among your teams, keeping track of the deprecations, and so on, then I hope you will keep this talk in mind. Thank you.