I'm going to be talking about a bit of a journey: the journey of the Shopify infrastructure. This is the kind of talk I would have loved to see from companies like Twitter or Facebook, about how their infrastructure got to where it is today. Google has some of these talks, where they describe what Google looked like in 2001 and what it looks like today. It's so complicated now that I think they barely understand how it all fits together, but it was really interesting to see how their infrastructure evolved. So what I want to talk about, as mentioned in the introduction, is the past ten years of the very specific history of one platform, and how we got to the scale that we're at today.

I work at a company called Shopify, and what we do is help people sell things, whether you're selling them online, on your phone, or wherever. In the past ten years we've grown a fair amount, and I want you to keep these numbers in the back of your head while I give this talk, because they apply to everything that happens: we do tens of thousands of requests per second, we can do a massive number of checkouts, and we've served a lot of transactions and a lot of money. Downtime is very costly for us and our merchants. In the past couple of years we've also grown tremendously in the number of shops that we serve. And we have one particular problem that dictates everything we do on the infrastructure team at Shopify, which is what we call the flash sale problem: we have merchants who can drive a completely disproportionate amount of traffic to us, and this creates massive problems for us.
From one second to the next, there might be a 2x or 3x difference in the amount of traffic we get, because there's a launch of some guy's shoes, or we have Super Bowl ads that drive a massive amount of traffic. For the Europeans: the Super Bowl is a big North American event that doesn't matter that much to us. If you've ever tried to visit one of the websites advertising during the Super Bowl, they often have problems, and we have merchants that go into the Super Bowl every year at this point. Not tons, but one, two, three; I think three last time.

And like every good story, and this is the story of our infrastructure, there's a villain, and currently our villain is Kylie Jenner. She's an American celebrity sprung out of the Kardashian family, which is famous for being famous, and she sells lipstick. When people can't access this lipstick, they get fairly mad, and this is a big problem for us. It hurts our merchants' trust, because Shopify is a white-label experience: their customers have no idea that it's Shopify's fault, not Kylie's, that they're down due to their scale. So she gets all of that, she gets mad at us, and we try to fix it. I think there are even episodes of the Kardashian reality show where Shopify is down while they're filming. It's a mess; you can find it later, I'm sure.

So really we're dealing with these problems of scale. We've grown a lot in the past ten years, and I want to tell the story of the evolution of the Shopify infrastructure, what we tackled, and how we're getting to the point that in the next two months we're going to be running out of multiple data centers. But ten years ago,
we weren't even thinking about that stuff.

The way I think about Shopify is as a point on a spectrum from a single-tenant platform to a multi-tenant platform (the slide is cut off a little bit here). Both ends of the spectrum have benefits. If you're a single-tenant platform, you're running just one shop, with one database per shop. On a multi-tenant platform, in the most naive implementation, you have one database for all of your shops. So I have to be the guiding hand here a little bit on the slides: we have single-tenant on one side and multi-tenant on the other, and there are really a bunch of spectrums hidden in this.

If you have a multi-tenant platform, you have a large amount of capacity: you have one massive database, so if one shop has a big sale, it can steal a bit of capacity from the other ones, and later, when another shop has a big sale, it can take some capacity from the others in turn. If you have a completely single-tenant platform, that becomes quite expensive, because you need to provision a database for each shop. This brings a secondary benefit: good utilization. If you have a big multi-tenant setup with many stores running on one database, you can utilize that database and the hardware better by sharing resources between the shops, whereas with just one store per set of hardware, you don't have that. This gives us a platform that is good for flash sales on the multi-tenant side, while the single-tenant side is not quite as good for flash sales
because a single store taking a massive amount of traffic will be very costly per shop. (There are some companies out there that are really good at that kind of single-tenant setup.) Then there's cost: the multi-tenant solution is just cheaper. You leverage the network effect of having all these shops absorb cost for each other, whereas on the single-tenant side it's expensive: you have to care about hardware for every single shop. We have 300,000 shops; 300,000 databases is way too many to manage.

Then there are some benefits to single tenancy, like complete isolation: if Kanye's shop completely breaks, it doesn't break anyone else's shop, whereas in the naive multi-tenant solution you have that problem. Scalability is also great for single tenancy, because you can just make that one database bigger if that shop grows large enough.

So some of these rows don't even have a green end. These are the extremes of a spectrum, and in the middle of that spectrum there's some kind of middle ground that tries to harvest the best of both worlds. This is what I've been exploring for the past three years: how do we get the best of a single-tenant platform and the best of a multi-tenant platform in one platform? I refuse to accept that we just have to be multi-tenant, take all of those drawbacks, and give up all the things on the left. I want to get the best.
I want the entire thing to be green, so over this talk we'll talk about how to achieve that.

So if we go back to the spectrum: in 2004, our founders created this snowboard store, and that was the first Shopify store. They created a platform to sell snowboards, and it looked something like this: classic mid-2000s web design. So they launched that, but they realized that their business wasn't really in selling snowboards. In 2004, e-commerce was finally reviving after the dot-com bubble, and there was a severe lack of good, customizable e-commerce software out there. So they pivoted, and two years later they created a multi-tenant platform to support many snowboard stores, many stores in general. Shopify became a multi-tenant platform: instead of a Ruby on Rails app hosting just the one store, it could now host as many as it could, until the database broke.

So where do we get to that middle ground? Where do we get the best of the single-tenant world, of just running that one snowboard store
that can never impact any other store, while also having that big platform that supports all of them?

From 2006 to 2012, we moved a little bit along that spectrum. We started doing a lot of work here and there to optimize the application, to get a little more of the benefits of the single-tenant side and a few fewer of the drawbacks of the multi-tenant side. For the first six years of Shopify's history, we did what most companies do: we focused on our product, and every couple of months someone would go and optimize performance. There's a lot of low-hanging fruit: if you profiled the application, you might get five or ten percent of your CPU time back from some dumb N+1 query somewhere, or from spending a lot of time doing operations you don't need to do. You can do a lot of vertical scaling, meaning you just increase the size of your database cluster, increase the number of workers you have, all these kinds of simple things. And you can do a lot of caching; our application is very cache-heavy, because we have a lot of traffic to stores that doesn't do write operations.

So we did a lot of that sort of stuff for the first six years, but at some point you reach a limit, and we reached that ceiling in 2012. There's a legendary picture of our CTO passed out, planted face-down on the floor, after having worked for months to try to support one of the biggest stores at the time. We reached a point where we couldn't vertically scale anymore, because at the end of the day you can't cache a write; you have to scale your writes horizontally at some point. So in 2013 we started working on database sharding, and this is when a full-time infrastructure team was created at Shopify, because we had to start working on database sharding and scaling horizontally. Only the MySQL side was sharded. Before that, our infrastructure looked something like this: we had some load balancers and some workers
that talked to a database, some Redis instances, and some memcached, and it all worked together. Then, in 2013, we split the database into multiple shards so that the write traffic could be spread out. The reason we have this write traffic is that if you're trying to sell something, and you're trying to sell a lot of it, we need to create checkouts, we need to create orders, we need to take payments. There are a lot of writes that have to happen, and they have to be quite consistent; that's one of our big challenges. So we did this in 2013, and really all of our team's deadlines revolve around Black Friday and Cyber Monday, the big American shopping holiday where Americans shop more than usual.

We did that, and it worked really well for a long time. Until then, the write capacity, the scaling capacity of Shopify, had been the biggest threat to our existence. We were terrified of these customers taking us down, but on the other hand, we didn't want to say no to customers who were driving a lot of traffic. We could have just said we're not going to be a platform where you can do this kind of thing, but instead we decided we wanted to be the only platform in the world that can support these kinds of sales.

By 2014, the biggest threat to our existence was that we had this pretty big app: at this point we were in the hundreds of servers in our data centers, we had tens of data stores, and many of these had become single points of failure. It may sound somewhat embarrassing that just two years ago we had single points of failure in our infrastructure, but it just hadn't been a priority for us, because the threat of all these celebrities selling was much, much higher than that of a single database blowing up. So in 2014 we started working on resiliency, and the way I like to think about resiliency is kind of like chemistry: if you have a big surface area, you're going to have
more reactions, and it's the same with errors. If you have a lot of servers, there are just inevitably going to be more errors happening. If you have one server, the probability of you running into a kernel bug in a year is pretty low. If you have a thousand servers, the probability of running into a kernel bug in a year is pretty high. If you're Facebook or Google and you have servers in the millions, you're running into kernel bugs all the time, and you have kernel teams to solve this. So as you get larger, more errors will happen, and you have to deal with them. This is really when you have to transition from the pet mentality of your servers and infrastructure to the cattle mentality.

So errors are proportional to the surface area of your platform, and we started reaching that critical mass in 2013-2014, where outages of these small components would disproportionately propagate into the rest of the platform and cause these cascading failures. So we started operating with the mental model of a resiliency pyramid, where we had to think about the errors happening in our platform and stop them from propagating downwards. We were really at the very bottom: we didn't have any idea what happened if one of our database nodes really blew up, because it hadn't been a threat to us at that point.

So we started working on things like mapping out all the single points of failure, making sure that if the caches were down and we hit a store, it would be fine; that if we had warm caches and we hit a store and the database was down, it would be fine. We wrote test cases for all of this stuff and then started fixing things, and we wrote tools to do it. We wrote a little proxy that sits between you and all your data stores in development, and it allows you to easily emulate error conditions, so you can say: hey, for this test, take down this database, and see what happens. And so we
mapped out the entire application and all the single points of failure, and did this through 2014. So in 2013 we sharded the database, in 2014 we did a lot of resiliency work, and in 2015 we started working on multiple data centers.

At that point, we'd only ever run out of one data center. The risk of that one data center going completely out, and us not being able to fail over to another data center, was just not as high as all of these other things. Worst case, we would have to provision a new data center and be down for a couple of days; it wasn't the biggest risk at the time. But in 2015 that became the highest priority, and we started setting up a second data center. What this boils down to is that you have one data center that runs all of the different shards, all of Shopify; then you have another data center, you set up replication from the one data center to the other, and then you somehow move all of the traffic.

So we did a lot of work to go from one data center to two, and then we built a script that can run the entire failover process with just about a minute of downtime on Shopify. Sometimes, when you see a maintenance window on Shopify for a database failover, it's me running this script to fail over the entire data center in about a minute. It has a couple of steps. It updates a service discovery layer and says: okay, the data center is now going to move. Then it takes the checkout down, so if you're going through a checkout while we're moving data centers, it will show you an error: hey, please come back later. If you go to a store during that time, the store is read-only, and that's completely fine; most stores don't even appear to be down unless you try to perform a write. Then we stop everything in the one data center, make sure that the replication has caught up in the other data center, then start
everything back up. We tune our load balancers so all the traffic that went to the old data center is proxied over to the new data center while we update our internet routes to announce the IPs out of the new data center. Then everything is moved, and this takes very, very little downtime. You end up in a situation where data center 2 is now the primary and is replicating in the other direction, and we can always fail back.

We've done this exercise probably around ten times or so. The very first time we did it, we were all sitting in a room, about 20 different people looking at 20 different dashboards, and it was kind of like what I imagine the end of a space mission is like, where they all hug each other afterwards. That's what we were doing; people were on hangouts, and it was a big thing. Now it's two or three people running this script. It's still a little bit of a bigger deal than I want it to be, but it's a pretty small thing to fail over a company our size at this point. So we got really good at this multi-DC thing.

Then, in 2016, we've been working on this concept called pods. The idea of pods is really: let's take all this stuff that we learned from 2006 to 2012 about running a really, really good application, optimizing it and increasing performance. Let's take all of this stuff that we learned about database sharding in 2013, and all the stuff that we've learned since then. Let's take that as well.
All this stuff we learned about resiliency in 2014, and all this stuff we learned about multi-DC in 2015: let's see if we can marry all these concepts in a way that makes sense, and do something with that. What we came up with was this idea of a pod, and the idea of a pod is to take the sharding idea further. A shard, by itself, in Shopify terminology, is just a shard of the database; we didn't shard everything else, only the database, because that was the most critical thing at the time. A pod takes that concept to the extreme: we take the workers and isolate them for a group of shops; we take Redis and isolate it for a group of shops. Basically, it becomes like running multiple deployments of Shopify at a smaller size.

The goal is that if we have these completely isolated units of Shopify, we can run them in multiple data centers: one can run in Asia, one can run in Europe, one can run in India, one can run in North America. It doesn't matter, because one shop does not need to talk to another shop. There are some exceptions to that: things like talking to the API, where you don't want to have to care about shards or any of that, or if you're a partner and you have shops in Asia and shops in North America.
You don't want to care about that, so there are all kinds of global challenges to this.

So 2013 had left us with this architecture, after the database sharding, and what we wanted to move towards was something more like this, where we have these self-contained pods, self-contained deployments of Shopify, and instead of managing one big Shopify, we manage a lot of small Shopifys. If we have that, then it doesn't matter where these things run: some of them can run in data center one in North America, and some of them can run in data center two in Europe, and then all the European customers could be on the data center in Europe, and all the North American ones on the one in North America. We can have multiple data centers, replicate them, fail over the pods independently, and we just have this really nice infrastructure.

If we return to that spectrum chart from before, we've arrived at something really nice. We haven't quite reached the middle yet, but we got a lot better. On the capacity spectrum, we have much better capacity than before. The naive multi-tenant solution doesn't lend itself very well to running in multiple data centers, but now we can do that; if we wanted to spin up Google Cloud pods of Shopify, we could do that, and if we wanted to go into Amazon, if we wanted to start experimenting with the cloud, we could do all of that. Utilization is a lot better. Flash sales are also better because of the extra capacity. It's even cheaper, because now we can have vendors and data centers bid against each other. Isolation is better, because now you have groups of shops, so if one shop blows up, the blast radius is limited to that pod and not the entirety of Shopify. And scalability is also better, because we can now run out of multiple data centers and scale independently. So we're not quite there yet.
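As a tiny illustration of what pod isolation means for routing, here's a sketch in Ruby. All the names and data are invented for the example, not Shopify's actual code; the point is just the core property described above: a shop belongs to exactly one pod, and a pod is pinned to one home data center.

```ruby
# Illustrative sketch of pod routing state: each pod is a self-contained
# "small Shopify" (workers, database shard, Redis) pinned to a data center.
# All names and data here are invented for the example.
Pod = Struct.new(:id, :data_center)

# A shop permanently belongs to exactly one pod.
SHOP_TO_POD = {
  "snowdevil.myshopify.com" => 1,
  "kylie.myshopify.com"     => 2,
}.freeze

PODS = {
  1 => Pod.new(1, "dc-chicago"),
  2 => Pod.new(2, "dc-virginia"),
}.freeze

# A shop's requests must be served by its pod's home data center.
def data_center_for(shop_domain)
  pod_id = SHOP_TO_POD.fetch(shop_domain) # unknown shop => KeyError
  PODS.fetch(pod_id).data_center
end
```

Because this mapping is static per request, a load balancer can make the decision without ever touching the application.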
Everything is not green, but it looks a lot better.

Flash sales were one of the premises I set up before; they're what everything comes back to on the infrastructure team at Shopify. With flash sales, we're limited by the CPU of our workers; the databases are all over-provisioned and are not the bottleneck for us. It's the workers, the Ruby on Rails workers that serve the application. So, on top of some of our resiliency primitives, we've written it so that workers can be shared between pods. If one pod is experiencing a massive sale, it can take some workers from some of the less busy pods, serve the traffic, and then give them back to the other pods. This works really, really well, and we get the best of both worlds.

So let's see what that looks like. In this setup, you do a request where you go look at these beautiful shoes on myshop.com. myshop.com resolves to one of our IPs, our IPs are announced out of multiple data centers, and you go to the closest data center and enter our network. When you enter our network, you reach this software lovingly called Sorting Hat. What Sorting Hat does is look at the request and, based on the request, think a little bit and decide which pod this request is destined for, and then route you to the appropriate data center. So your request goes to one data center and enters our network; the load balancer's Sorting Hat module looks at the request and decides: this request is for pod 2. Okay, is pod 2 local to this data center and active in this data center? If yes, the request stays in this data center. If not, it's proxied somewhere else: if you're going to Amsterdam, you might be proxied to, say, Virginia, to serve your request. Then you enter another Sorting Hat in that data center, which knows that the request is local, and it's all fine. So this is
the multi-DC strategy in a nutshell, and I just want to give a shout-out here to OpenResty. The Sorting Hat module is written as an NGINX module, and we use Lua for it. This is some of the best software that we run, in my opinion. NGINX is extremely stable; we've found very few bugs in it despite our surface area on it, and with OpenResty you can extend it with Lua. You can do crazy things like customize the load balancer: we implemented an exponentially weighted moving average for our load balancing. You can do things like serve SSL certs dynamically: in our case, all of our customers point to the Shopify IP, but based on the host name, we can serve them the appropriate cert out of a database. You can do anything you can imagine at the load-balancer layer; if you're running Docker containers, you can get all the containers from a service discovery layer. What this stuff can do is incredible, and what we use it for with Sorting Hat is to go into the request, look at it, and route it appropriately based on what the request looks like. That's the Sorting Hat module.

So if we have this architecture, we need to set up some rules for what Shopify can do to run in this setup, and there are two rules. Rule number one is that any request that comes into Sorting Hat needs to be annotated with wherever it's going: Sorting Hat needs to be able to make a decision about where this request belongs, what shop it belongs to, what pod it belongs to. If you're going to a data center in Chicago, and the Sorting Hat in Chicago can't figure out which shop this request belongs to, it doesn't know whether it needs to send that request to Amsterdam or to Asia.
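That routing decision, serve locally if the pod is active here, proxy if it lives elsewhere, and error if the request can't be attributed to a pod at all, might be sketched like this. This is Ruby standing in for the real NGINX/Lua module, and every name in it is my invention:

```ruby
# Illustrative sketch of the Sorting Hat decision. The real module is
# Lua inside NGINX (OpenResty); this Ruby version only mirrors the logic.
def sorting_hat(pod, local_dc:, pod_locations:)
  # A request that can't be attributed to a pod is unroutable:
  # we can't guess whether it belongs in Amsterdam or Asia.
  raise ArgumentError, "unroutable request" if pod.nil?

  target_dc = pod_locations.fetch(pod)
  if target_dc == local_dc
    { action: :serve, pod: pod }      # pod is active here: serve locally
  else
    { action: :proxy, to: target_dc } # e.g. Amsterdam proxies to Virginia
  end
end
```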
It has no idea, so it's just going to error; every single request needs to honor that rule. Rule number two is that any request can only touch one pod. If it touches multiple pods, multiple Shopifys, it means you might do a request to Amsterdam that also needs to reach Asia and North America, all in the same request, and that's a mess with really bad consequences for resiliency, because those requests now rely on multiple deployments of Shopify.

These requests could look something like this. You get a webhook, say from Stripe, for a payment, and instead of having the shop domain, you have to figure out which shop the request is going to by looking at the URI, so you have to teach the load balancer about these different cases. Another problematic request might look something like this: in the past, we might have gone over every single shop and tried to count all of them, or we might have had some operation to upgrade all the themes across all the Shopify stores, on all the different deployments of Shopify, all the pods. We can't have these any longer. In the first case, we need to teach the load balancer how to route that URL, and in the second case, we need to teach Shopify not to reach into multiple shards in the same request.

So we came up with this thing called shitlist-driven development, because to honor these two rules, we had 500, if not a thousand, different code paths that violated them. Our infrastructure had, for 12 years, relied on the underlying assumption that you could do these two things; you didn't have to care, because Shopify could just figure it out at runtime when the request came in. So, with this overwhelming list, we came up with primitives to create a deprecation API. Usually with deprecation, what happens is that if you use a deprecated API, something will yell at you, but you can still use it if you want to. What we did with shitlist-driven development is that instead of just deprecating
and spitting out an error message, we whitelist all the existing usage of that API, and you have a big checklist of everything that needs to get fixed. Then, if you violate the rule from new code, an exception gets raised. So it looks something like this: if the shitlist includes your class or your request, just do it; otherwise, print out a friendly message saying come talk to us and we'll figure out how to do it together. The point is that we stopped the bleeding on violations of these multi-DC rules, and then went and fixed them. We're left with this massive checklist of everything we need to fix, but there's a feedback loop: we kill them one at a time. This refactor is coming up on taking a year for us to do, so it's really, really helpful to have that very tangible success metric.

For our routes, the shitlist might look something like this: the shitlist is your routes file. Here are all of your different routes. In Shopify, we probably have 500 of these, but this is just a subset, and the ones you can see are a bit problematic. Things like sign-up don't inherently belong to one pod of Shopify, so you have to go to every single pod and ask it: hey, do you want to sign up this store?
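Going back to the shitlist primitive itself for a second, the check I just described could be sketched roughly like this in Ruby. The class and method names are my own, not Shopify's internal API:

```ruby
# Illustrative sketch of shitlist-driven development: every *existing*
# offender is whitelisted so the list can only shrink over time, while
# any new violation raises immediately. Names are invented for the example.
class Shitlist
  def initialize(known_offenders)
    @known_offenders = known_offenders # the big checklist still to be fixed
  end

  def check!(caller_id)
    return :grandfathered if @known_offenders.include?(caller_id)
    raise "#{caller_id} violates the multi-DC rules. " \
          "Come talk to us and we'll figure out how to do it together."
  end
end
```

Old code paths keep working while they're burned down one at a time, and the shrinking whitelist doubles as the tangible success metric. Anyway, back to the sign-up route: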
It's European and fresh and whatever, and then all the pods respond back, I'll take it, I'll take it, and the load balancer picks a random one. For things like changing your password, or recovering your password, you actually have to ask every single pod, because we don't want to leak that abstraction to the developer or to the merchant of which pod they belong to. So all of these are shitlisted, because we don't know how to route them.

So we added syntax on top of the Rails default route syntax to declare a routing method. In the first case, we know to extract the shop from the params; in the second case, we know to use this special sign-up method; in the third case, we need to try all the different pods and aggregate the results somehow; and in this last case, if you just visit a store, we can extract the shop, and therefore know the data center, from the Host header of the HTTP request. And so, somehow, we needed to teach the load balancer about all these Rails routes. So what we did was parse the entire
AST that the Rails routes generate and turn it into JSON that our Sorting Hat Lua module could read. So it looks something like this: the Rails routes are serialized into JSON, and the JSON is then read and interpreted by Sorting Hat, Sorting Hat being the module that sees a request and routes it to the appropriate data center. This JSON basically just has five hundred or a thousand different routes; it matches the regexes, and it knows the routing methods, so when it sees a request whose URI matches a given regex, it figures out the shop, or pod, that the request is going to from the method declared there. The vast majority of them are shop-from-host, but there's a ton of others that don't go through that.

So if we reiterate this diagram: you get a request into Sorting Hat; Sorting Hat looks at this giant JSON file generated from the routes of the web app, and based on that, it knows which pod in the world the request belongs to. If, on top of that, you honor these two rules, Sorting Hat being able to know where every request goes, and every request only touching one pod, you have the isolation to be able to run in multiple data centers. I think it's really neat how all of the different efforts from over the years play together: we get resiliency through isolation, but we also get isolation through resiliency; we get scalability through all of these different things; and the multi-DC that we achieve really comes through isolating Shopify into many, many small Shopifys that can run independently.

And so, just a couple of thoughts on the future, because we haven't quite reached that middle from the spectrum before; not everything was green in the middle.
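To make that serialization concrete, here's a hedged sketch of what the routes JSON and the matching step could look like. The patterns and method names are invented examples, and it's written in Ruby rather than the real Lua, but the shape is the idea just described: a list of URI regexes, each tagged with a routing method, first match wins.

```ruby
require "json"

# Invented example of routes serialized from the Rails app for the
# load balancer: each entry pairs a URI regex with a routing method.
ROUTES_JSON = <<~'EOS'
  [
    {"pattern": "^/signup",          "method": "ask_all_pods"},
    {"pattern": "^/webhooks/(\\d+)", "method": "shop_from_uri"},
    {"pattern": ".*",                "method": "shop_from_host"}
  ]
EOS

ROUTES = JSON.parse(ROUTES_JSON).map do |route|
  { regex: Regexp.new(route["pattern"]), method: route["method"] }
end

# First matching regex wins; the vast majority of requests fall
# through to the catch-all shop_from_host entry.
def routing_method_for(uri)
  ROUTES.find { |route| route[:regex].match?(uri) }.fetch(:method)
end
```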
So we're pushing the boundary further and further towards the center, but we're not quite there yet. Maybe 2017 is when we'll reach it, and two of the things I would like to work on in 2017 are things like zero-downtime pod failovers. When I fail over a pod from one data center to the other, I don't want any customer impact whatsoever. So what we're thinking about doing is that when we fail over a pod from one data center to the other, we tell the load balancers in the origin data center to pause all of the requests; then we move everything, and then we resume and let the requests flow through. This means there isn't really any disruption for customers: if you're doing a checkout while we do a pod failover, it will just be slow, but it will still succeed, and nothing will be dropped.

Something else I would like to do is isolate the shops further. We still have the blast radius of one store blowing up propagating into its pod; I would like to put more constraints on a single shop, by looking at past disruptions, so that doesn't happen as much. But that's really what we've been looking at for 2017. In just about one or two months, my team will hopefully launch Shopify in multiple data centers with this pods approach, in time for Black Friday and Cyber Monday. If we don't, well, fuck. Thank you so much.

So the first question is: how do you handle transactions in the sharded SQL setup? Okay, so I think what's implied by that question is how we handle transactions between the multiple shards in this setup; that's how I'm going to interpret it. We have this nice property in Shopify of one shop not caring about another shop, so the transactions are really isolated. If you're doing a transaction on your shop, updating an inventory count or products or things like that, you're not going to reach across the other shards, because you can't do transactions that span multiple shards. It just doesn't work; there are multiple databases.
They have multiple sessions. There is one database that I didn't talk about, because it's a lengthy talk on its own, and it's what I've spent the past four months on: we have another database that is sort of the master database. It hosts things that don't belong directly to a shop: things like billing, things like partners, things like the API, all these things that don't inherently belong to one shop. You can't do transactions between a shard and that database either, so we just do best effort, put all of the transactional things in the shard, and then backfill them into the master. So basically the answer is: try to avoid doing cross-shard transactions everywhere you can, and if you can't, you need to try to refactor your code, or just live with the drawbacks of not having transactions across data stores. Yeah, transactions are hard.

Okay, the next one: did you extract microservices from your app, and did that help with the scaling? Yes, so this is something we debate a lot internally as well. So, without going on too lengthy a rant here:
I think microservices in many companies are a sort of overcompensation. I think good object-oriented programming can get you out of most of these things, and I don't think you need to enforce a TCP/IP boundary to have good software. That said, there are some boundaries where your application obviously belongs in different apps. The example at Shopify is this: we have these sharded tables where a shop has all of its data, but we also have the API, partners, billing, and things like that, which belong to a global database. We want to actually extract these things into services, because they don't belong to a single shop, and Shopify should only care about shops. Billing should live in its own thing, partners should live in their own thing, and the API should live in its own thing, because these three things don't belong to a single shop. And by API, I don't mean the actual API of retrieving your products or performing a checkout, because that still belongs to a store; I mean things like authentication or creating apps.

One more: how do you handle pod deployment? The way we handle deployments right now is that we just deploy to all the pods at once. In the future, I would like to do canary deploys or ring deploys, where we deploy to one pod at a time. Right now, deploying to all of them at once works fine, but something I would like to do over the next year or two is to deploy to one pod at a time, because it minimizes the risk when we have all of our developers able to deploy all day.

All right, and the last question is: are you using Kubernetes? I don't think that's public yet. Okay, all right. Well, one more round of applause for Simon. Thank you so much.