And I'll be talking about how we deploy Shopify. Just a little bit about me: I'm a developer at Shopify, and actually, today is my one-year anniversary there, which is kind of cool. I started one year ago today as an intern, and now I'm speaking to all of you at RailsConf, so that seems kind of crazy to me, but I'm really excited. So, on with the talk.

A little bit about Shopify, if you don't know. Our mission is to make commerce better for everyone, and that includes anywhere. So Shopify powers online stores, but it also has a point of sale system, lets you put buy buttons anywhere on the internet, and basically is just trying to make commerce better for everyone. Over 275,000 people have Shopify stores, and in total they account for over $17 billion in sales. So that's a lot of people and a lot of money going through the Shopify platform. As a platform, we're able to do 25,000 requests a second, which is super important for all these stores and all this money. We have some pretty famous people on Shopify who run a lot of flash sales that create a lot of traffic, people like Kanye West, Kylie Jenner, and the like. Shopify is one of the largest Rails apps in the world and has been around for over 10 years. So that's a lot of code, and a lot of code that we deploy daily.

And what we use to deploy Shopify is a little something called Shipit. Shipit is actually open source, so if you go to github.com/Shopify/shipit-engine, you can see it there. What it allows us to do is this: on GitHub, when you make your PR and you merge it to master, it shows up here in Shipit. It has some checks there, like that the container build is successful and so on. Once all those checks pass, you can then deploy your commit. And when you deploy it, you get a screen that looks a little something like this. On the side, you have a visual representation of the deploy.
And then in the middle, you have the whole record of the deploy and what's happening. On the side, you'll see there are little boxes labeled SB and a number. Those are our servers, and the SB simply stands for Shopify Borg. There are about 200 of them, and the five dots in each server each represent a container. A green dot means the container has deployed the new revision, a blue dot means it's currently switching revisions, and the gray ones we haven't got to yet. So you can track your whole deploy here visually. And at the bottom, you'll get a nice message when your deploy has succeeded, or you'll get another message if it hasn't. We'll talk about all those messages today.

So a couple of fun facts. We deploy Shopify on average 30 times each day. The highest we've ever done was 41 deploys in one day. So that's a lot of deploying. Also, because we use Shipit, basically anybody who makes a change and makes a PR can deploy Shopify, not just developers. So we have lots of people deploying Shopify every day. And it takes about four minutes to get through the whole deploy and have your code out in production, which is fast and exciting but also really terrifying.

So what I'm going to talk about today is what happens when you press that little blue button in Shipit and send your deploy out into the world, but also what happens when people make requests during a deploy, and anything like that that can happen.

So back to that little blue button. Deploying Shopify, all it really is is pressing that blue button and monitoring how things go. But what's actually happening is that the deploy button triggers a Capistrano deploy script, which then runs. And the first thing that script does is take the new revision SHA and put it onto the host server. I think I already explained it, but SB1 is the Shopify Borg, and that whole box drawn there is the host, the server.
Once the new revision SHA is in the revision file on the server, the supervisor daemons are started. The first thing they do is take the new SHA written in that revision file and start all the containers. Each server has five containers. I'll explain why in a bit. But the containers are started one at a time. So the first container is started first, and once it completes, we go to the second container, and so on. The little green outline and the solid arrow are just my way of noting that a deploy has finished on that container, whereas a dashed arrow means we're switching revisions on that container.

Another responsibility the supervisor daemons have is to check the exit status of the deploy. There are three different statuses on the deploy, and each of those statuses means something, so we'll go into them now. First, you can get a success exit code, and that's a successful status. If there are no more remaining containers, your entire deploy is successful and things are good to go. You can also get a successful exit code on one container while there are still more containers. In that case you just continue through the process: the supervisor daemons restart the containers that still require the new SHA, repeating until each container has successfully switched to the new revision and there are no more containers.

Or, unfortunately, what if your deploy fails? That'll be a different exit code. And there are two options for a deploy failing: either the revision is flapping, or the deploy simply failed.

So what does it really mean that a deploy is flapping? It's kind of a term where, I don't know, it's hard to figure out what it would mean. So here's an example of a server that started with revision A. All five containers had revision A, and then we deployed revision B. Container one successfully got the new revision and everything looks great.
But then on container two, we tried to switch to the new revision and that deploy failed. So now we have three containers that are running revision A, the old one; one container that's running revision B, the new one; and the second container, where we have no idea what it's doing because it simply failed. So that's what it means when a deploy is flapping: the server is running multiple revisions at the same time. And that is not something that's desirable at all. The solution for this, and what we do, is we just restart the application. So it's similar to a deploy, but different. We can do it just by pressing a button in Shipit, and what it does is restart all the containers. But I'll go into more detail in a bit.

So what happens if a deploy failed? What if we deploy and it just simply fails? It never switches revision. Now we have one mysterious container and the rest with the previous revision. That can happen if the container is simply down. If we can't reach the container for whatever reason, it's down, we can't restart it, it won't start up, then a deploy will fail. You can do the same thing and just restart your application. This will restart all the containers and try again.

So I mentioned a bit earlier that Shopify has a lot of very big and successful merchants. And why that's important to deploys is that these merchants drive a lot of traffic to the platform. Whether it's a flash sale with a new product or just all the shops having Black Friday and Cyber Monday sales, a lot of traffic gets to Shopify. And one of our biggest Shopify Plus merchants is Kylie Jenner. She's the one third, third in, and she sells lipsticks. And she sells them online with Shopify. When she first started, she only had three lip kits in three different colors. That's all she sold, but she drove so much traffic from everybody trying to get them. Now she sells multiple different types of lipstick and lip gloss in different colors.
And so you can imagine, every time she has a sale, she just drives so much traffic. And that can affect deploys. So we'll talk about that.

So when somebody wants their lipstick from Kylie Jenner, they make a request. And you guys all laugh, but a lot of people want these lipsticks. Head over to the internet, look up hashtag Kylie Lip Kit, and you can see how it goes down. So they all want these lipsticks. They make a request, and it goes to the internet. The request gets routed by the load balancers to a server, and once it gets there, it's sent to a container. If that container is switching revisions during a deploy, it will not accept the request. The request will get one retry and be sent to another container. If that container is free, so not switching revisions, then it doesn't matter whether it's running the previous or the new revision; that container will serve the request.

If a container is serving a request and then the deploy tries to switch its revision, that container will lock. It'll serve all the requests it has currently got, and then once it's finished, it will switch revisions. Any requests that are sent to that container in the meantime will be sent to another one. So each request only has one retry. If it gets denied, if that container says no, then it gets one retry to go to another container. And this is because if you send a malicious request, at worst you've blown up two containers, and that's all.

So sometimes we wonder, should we lock deploys during a sale? And the reason we wonder this is that when one of these five containers is switching revisions, we are down 20% in capacity. One out of five is 20%, and that container can't take requests. So sometimes we think, oh, if this sale is going to be huge, or if we don't know what the sale is going to be, maybe we can lock deploys, so we're at 100% capacity.
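The request handling described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Shopify's load balancer code: the `Container` class, its `switching` flag, and the `route` function are hypothetical names, and returning `None` when both attempts are refused is an assumption, since the talk does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    """Hypothetical stand-in for one of the five containers on a server."""
    name: str
    revision: str
    switching: bool = False  # True while the container is changing revisions

    def accepts_requests(self) -> bool:
        # A container that is switching revisions refuses new requests,
        # regardless of which revision it was running.
        return not self.switching


def route(first: Container, retry: Container) -> Optional[Container]:
    """Try the first container, then allow exactly one retry on another."""
    for candidate in (first, retry):
        if candidate.accepts_requests():
            return candidate
    # Assumption: after the single retry is refused, the request is dropped.
    return None


if __name__ == "__main__":
    busy = Container("c2", revision="A", switching=True)
    free = Container("c3", revision="A")
    served_by = route(busy, free)
    print(served_by.name if served_by else "request refused")
```

Capping the loop at two candidates mirrors the point made in the talk: a bad request can touch at most two containers.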
Most of the time we don't, and we continuously deploy throughout the day and let people have their sales, and everything is usually fine and dandy.

And then we think, what if we had more containers? If we had more containers, could we then deploy with even more certainty that nothing would go down from the capacity we lose while switching revisions? Well, no. Here we have one server that has five containers and another that has 10. If we were to have 10 containers, so double our amount, we would then have to switch revisions and deploy in parallel. We'd have to do two containers at a time. And doing two containers at a time, so two out of 10, we're still down one out of five, which is still 20%. So even though we added more containers, we still have the same loss in capacity, which is why we only have five containers. The reason we chose five containers and not fewer is that we're okay with being down 20% in capacity, Shopify can handle that during a deploy, but we're not okay with being down 25%, which is what we'd get with only four containers. And the reason we don't go up to 10 is that it would just add more complexity. Now we'd be running these containers and switching their revisions in parallel and trying to keep track of everything while still doing a deploy.

So then we wonder, what if you had to deploy during a sale and you knew you couldn't be down 20%? You couldn't lose 20% in capacity, but you still had to deploy. One thing you could do is only deploy on half of the servers. So this SB1 and SB2, we have 200 of them. Let's say you only deploy to 100 of them. You're only deploying on half, so you're down 20%, but only on half of your servers. That means you've only lost 10% capacity. And if you want even less capacity loss, you just deploy onto a quarter, and then you're only down 5%. So these are some frequently asked questions we have about deploys.
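The capacity arithmetic above can be checked with a small Python sketch. The function name and parameters are hypothetical, chosen only for illustration; the numbers are the ones from the talk.

```python
from fractions import Fraction


def capacity_lost(containers_per_server: int,
                  switching_at_once: int,
                  share_of_servers: Fraction = Fraction(1)) -> Fraction:
    """Fraction of total capacity unavailable while containers switch revisions.

    share_of_servers is the portion of the fleet being deployed to at once.
    """
    per_server = Fraction(switching_at_once, containers_per_server)
    return per_server * share_of_servers


if __name__ == "__main__":
    cases = {
        "5 containers, 1 at a time": capacity_lost(5, 1),
        "10 containers, 2 at a time": capacity_lost(10, 2),
        "4 containers, 1 at a time": capacity_lost(4, 1),
        "5 containers, half the servers": capacity_lost(5, 1, Fraction(1, 2)),
        "5 containers, a quarter of the servers": capacity_lost(5, 1, Fraction(1, 4)),
    }
    for label, lost in cases.items():
        print(f"{label}: {float(lost):.0%} of capacity lost")
```

Using exact fractions makes it easy to see that two out of 10 and one out of five are the same loss, which is the reason doubling the container count does not help here.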
So as I mentioned before, sometimes the solution is to restart the application. And what does that do? When you press deploy, the Capistrano script runs and puts the new revision SHA into the revision file on the server. The supervisor daemons then read it and restart all the containers with that SHA. When you do a restart, they already have the new SHA, so they know what to do. All they do is start at the step of restarting the containers. They restart all the containers that still require the new SHA and continue on with the deploy from there.

So some things can get interesting. Because we use Shipit, and we just have this UI with buttons, and we let anyone deploy Shopify (in fact, you do it in your first few days of onboarding), a few things can happen sometimes.

So what if you deploy the current revision again? This server has containers that all originally started with revision A, and now we've deployed revision B, and then halfway through the deploy, someone decided to restart. So they press restart. What's going to happen? What will happen is that the second container is already switching revisions from the deploy, which is all great and fine, but now another container, in this example the fifth container, is also switching revisions because of the restart. All that means is that instead of only losing 20% of capacity, we've now lost 40%. So if a request came in at this point, it could only go to three of the containers, as opposed to four during a normal deploy, or five if there was no deploy or restart going. Usually we don't stop that. We let that happen because our platform can handle it, but you are down 40%. So maybe if there was a crazy huge flash sale and all of a sudden we got so much traffic, maybe Shopify would go down because we're already down 40% in capacity. But what's even worse is what can happen if you deploy a new revision.
So you've started with revision A, you deploy revision B, and someone comes along and deploys revision C. So that's a third revision. Kind of like what happened in the previous example, another container will start switching revisions, but now it'll switch to a completely new third one. So right now you can see in this server we have three containers running revision A, one running revision B, and one running revision C. So not only are you down 40% in capacity because you have two containers switching at a time, but you also have three different revisions going, which is going to result in the deploy being tagged as flapping. And at worst you could have five different revisions. What if you just keep deploying before the deploy finishes? You deploy once, the deploy doesn't finish, you deploy a second time, a third time. You can get into a situation where your server is running A, B, C, D and E all at the same time.

So I talked a little bit about restarts and deploys, and I just want to highlight the difference between them because they are very, very similar. A deploy gets a new revision, whereas a restart uses the current revision. They both restart the supervisor daemons, and that's fine. The daemons will go on to restart the containers with the revision that they have. But another difference is that a deploy downloads a new image, whereas a restart already has the current image. So usually a deploy takes a little bit longer than a restart does, just because of that simple fact.

Now this slide requires a little bit more context. At Shopify, we have two data centers: one that we keep passive and one that we keep active. The active data center is the one where we send all our traffic to, and we always deploy to both of them at the same time. The reason we do this is that we want the passive one to always be as up to date as it possibly can with the active one.
So if we have to fail over, we know what kind of state that other data center is in and which revision of Shopify it's running. It would suck if we failed over and it was running Shopify from a week ago, or even a day ago, because so many changes go into Shopify. One day does make a big difference. So when you deploy, you're deploying to both data centers at the same time. And if it fails in the passive one, we rarely ever do anything. We just let it go, because we know there'll be a new deploy soon, and that deploy will restart all the containers and bring everything up to the same revision. So even if containers are running different revisions in the passive data center, because we're not sending traffic there, it's all right. In the active data center, though, if a deploy fails or it's flapping or anything like that, we do need to take action, because that's where we're running all our traffic. And normally what we do is just restart the application and monitor that.

Thanks, that's my talk on how we deploy Shopify.