Can time travel keep you from blowing up the Enterprise? Hopefully you read the abstract for this talk and know that we're going to be talking about scaling Rails, but not that kind of scaling, right? We're not going to talk about making Rails do stuff faster or making Rails do more stuff at once. You can just throw hardware at that problem and it goes away. We're going to talk about scaling a code base or a team, the things around using Rails: how to use Rails as we scale, what problems we have, and all of that. So, a growing team, a growing business or project or product, and a changing one, right? How do we scale ourselves through stuff like this? A lot of people think that the way to answer that question is to decide between a monolithic architecture and microservices. So we'll talk about those things, but it's kind of a false dichotomy and that's not really what we want to do. To explore this, we'll use a framing device based on the Star Trek: The Next Generation episode "Cause and Effect." Anyone seen this episode? All right. It's okay if not, and if you hate Star Trek, it's okay. I'll tell you all you need to know. In this episode, the Enterprise is flying through space, as it does, looking for trouble, and they find a spatial anomaly, as they do, and of course they rush to investigate to find out what's going on with it. As they approach and scan it and do their thing, a ship flies out on a collision course, the Enterprise can't get out of the way in time, and they blow up. This is before the commercial break, and it's the middle of a season, so you know the Enterprise is not dead, they're not all dead. What's going on here? So they come back from commercial, and everyone's back where they started. They live out the same day over and over again, and they eventually blow themselves up again.
So they're stuck in this time loop. Each time, they get reset after the Enterprise blows up. Eventually they learn to send information to themselves to try to make different decisions, so that when they get to this critical point they don't blow up and they survive, and of course they're able to do so by the end. So we're going to use this to talk about scaling Rails. We're going to have a team that experiences all the normal growth and change that a team experiences. They're going to make decisions that blow up their enterprise, and then they're going to get to do it all over again, and then a couple more times, and we'll see what might be a way to avoid that. So, a few assumptions. We're assuming that on this project, product, whatever, more people will work on the code than initially worked on it, right? So we're saying this is not just five senior developers banging something out for five years. This is going to be a team that's growing because more things need to be done. More importantly, different people will work on the code. That means people with different levels of experience, people with different opinions about how to write code, different opinions about how to write Rails. The person who handcrafted that first controller is not going to be around to make radical changes to that controller later, right? So that's another assumption. And there will be increasing demands on this code. The team will make decisions based on assumptions that they are told are absolutely true and will never change, and then those things will change and the team will have to deal with that. I think these are all pretty reasonable things that happen to all of us. And failure, then, is the product not meeting the needs of the users, right? The software must be built to serve some purpose, and if it's not serving that purpose, then it is failing. And this is what happens in that case, right? Simple changes become difficult.
Difficult changes, which are also often important changes, become impossible. And there's low morale. No one likes working on a failing or failed project, and then people start to leave. So I would argue that all of this, these assumptions that I made and these failure modes, is common but not universal. I work at a startup, and this is exactly the trajectory that we have, right? We're growing, changing, responding to change. But a new project at a big company or some longer-term consulting gig has this sort of shape to it too, right? I spent three years working at the U.S. Marshals Service, in government, right? And the trajectory of the project that I worked on in those three years was very similar to the trajectory of what I'm experiencing where I am now, which is a startup called Stitch Fix, right? Two totally different businesses, totally different incentives, but still a very similar shape. So growth and change, that's what we're talking about. So let's get into it. Our team has been assembled to begin this project, product, business, what have you. And they're gonna use Rails because, as we know, Rails is awesome. That's why we're all here. Rails makes lots of things easy, and in terms of how you continue to use Rails, what does Rails tell you to do? It tells you to put everything in the Rails app, right? That's how Rails gets its advantages: everything is there. And monolith kind of has a negative connotation, but it doesn't need to. It's just a word to describe how you organize your code. And this method of coding is great early on because you can ship stuff quickly, you can respond to changes, you can handle anything that comes at you, and you feel super productive. It's really Rails' sweet spot, right? It's awesome. So as time progresses, the team grows. Initially the core developers have no problem banging out features. We hire a few more developers and things are going pretty well.
There's a little bit of difference in opinion on certain things, but it's not a big deal. And then we hire more and more developers. When you get to a certain size, you're gonna want developers to not have to all sync up together. They're gonna be working on different projects at the same time, in parallel. And they're not gonna be making the same decisions. And while that's not inherently bad, they're gonna do all this in the same code base. What happens is your monolithic application grows and grows, and it does more and more things. And since it's all in the same code base, since it's all in Rails, there are hidden dependencies between things. So you do something like, I'm gonna tweak the CSS for our customer service portal. Well, because all the CSS gets glommed together at the very end, that ends up blowing up the purchasing page, right? That happens. One of the teams decides that the referrals section of the site is too slow and they add a cache, but there's some hidden entanglement between referrals and search, and so this causes search to blow up. These entanglements become easier and easier to create as more and more people work on the code, and harder and harder to identify, because everything is all in there. You also end up getting your business logic entangled with Rails-isms, and you end up writing really crazy stuff, like one long chain of calls that reads: user, signed up, chronologically, approved, active, orders, in zip, East Coast, unshipped, first, try, first name. I'm like, how could you ever unwind this? What if you had to test this? What if you had to change this? Like, this is rough. And then there are features that seem really useful and turn out to be a really bad idea. Suppose our team wants to send an email after a user is created. If only there were a method to run code after the creation of an Active Record object. There is, it's called the after_create callback, so we do it in there.
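As an aside, here is a minimal plain-Ruby sketch of what that kind of callback amounts to. It does not use Rails; the User.after_create method below only imitates the idea of Active Record's after_create, and Mailer, User, and the email address are hypothetical names chosen for illustration.

```ruby
# Hypothetical mailer that just records what it would have sent.
class Mailer
  def self.deliveries
    @deliveries ||= []
  end

  def self.welcome(user)
    deliveries << "Welcome, #{user.email}"
  end
end

# Plain-Ruby stand-in for a model with an after_create hook (not real Rails).
class User
  attr_reader :email

  # Hooks registered at class level, run after every create.
  def self.after_create_hooks
    @after_create_hooks ||= []
  end

  def self.after_create(&block)
    after_create_hooks << block
  end

  def self.create(email)
    user = new(email)
    after_create_hooks.each { |hook| hook.call(user) }
    user
  end

  def initialize(email)
    @email = email
  end

  # The convenient part: one line, and every new user gets an email.
  after_create { |user| Mailer.welcome(user) }
end

User.create("bob@example.com")
puts Mailer.deliveries.inspect
```

The thing to notice in the sketch is that the side effect is attached to creation itself, so anything that creates a user, including a test that only wants a user object, triggers the hook too.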
And then another team figures, well, we need to integrate with our payment processor, so let's insert the user's record into the payment processor's system while we're doing that. And then we hire a marketing team. And the marketing team is like, we want to track user signups, but we don't want to use a pixel on our website, we want to do it on the back end. So we throw that in there too, why not? And then we hire a second marketing person who thinks this marketing site sucks and we should use a different marketing site. They and the first marketing person keep arguing with each other, and they can't reach a conclusion. So engineering is like, fine, we'll put the other one in there too, and you guys can battle it out later. This is problematic. What if someone now wants to write a test that uses a user? They gotta deal with all this crap that's in here. This is really hard to unwind. What if we fire one of those marketing people and we get rid of one of these tracking sites? That's gonna break all the tests that had to mock this out just so they could run, right? So it's not good. The test suite becomes glacial, which means shipping is hard. And we all know what happens in app/assets/javascripts. It's a complete shit show, people just throw things in there, it's terrible. And because we have all of these different things in the same application, we're connecting things that really shouldn't be connected, right? If we add a feature to our admin customer service portal that is very inefficient and runs a terrible database query, that's gonna take down our public website, because they're all the same thing. So that's not good, right? Simple changes are difficult. Difficult changes become impossible because of all those entanglements. And no one likes working on this. Has anyone ever worked on an app that's like that? Like, yeah, it sucks, right? And you leave. I mean, we're not all 100% motivated by the technology that we're using, but it's not fun to work like this.
So we leave, and the enterprise blows up. So our team miraculously finds themselves back at the beginning, when they all started, and they remember the pain from this monolith that they created: how great it was at the beginning, and how terrible it was at the end. And they think, we want things to be decoupled. That would be the solution. If things had been decoupled, with different services and so on, that would have solved this problem. So let's do that. Let's make microservices right now. That's what we wanted, so we're going to do that. Before they even start with microservices, the team's like, okay, we've got to do a little bit of background work first. We have to figure out how we're going to deploy lots of apps, because we're going to have lots of different services to deploy. We've got to figure out how to do authentication. How are these things going to talk to each other? We have to figure out versioning. Are we going to put it in the URL, or are we going to put it in a header? We have to talk about our JSON format. Do we just do naked objects? Do we have a top-level object? Do we have some envelope format? We've got to have that discussion and make that decision. And which HTTP library, right? There's like 70 of them in Ruby. So we've got to figure out which one we want to use. And we're definitely going to have to argue about what is and isn't RESTful, and how to interpret HTTP status codes. After we do that, we also need to figure out what services we actually need. And this is, maybe unsurprisingly, hard, because at this point, when you're getting things going, you have some requirements, hopefully. And you're told that they won't change, but you know in your heart they will. So it's hard to know what to actually do. But the team perseveres. And they make the best decisions that they can, because they're experienced and they're pretty smart.
So three months later, they've created this amazing microservices architecture, right? There's a React front-end, of course. It talks to a back-end for front-ends, because we don't want the React front-end to know about the particular services they've created. That back-end for front-ends will then talk to an authentication service, which will then talk to a role service, because we don't want authentication and roles to be shared and conflicting with each other. That will talk to a third service to get consumer data, because again, that has to be separate from authentication and separate from roles. And then we can render a page that says, hello, Bob. Now, I know what you're thinking. This is awesome, but it gets even better, because we have used Docker, Ansible, AWS, Terraform, Vault, Bash, and of course, we've written some Go, because you can't do all this DevOps stuff without writing Go. So it's amazing. Totally amazing. We spent some money, right? This stuff's not cheap, not just the hosting, but the engineering time. I mean, we all know how much we get paid. It's not cheap. We got some blog posts on Hacker News, because they love this shit, right? And of course, we've been to some meetups as well to show how awesome we are. But we haven't delivered any actual value. We said, hello, Bob, and that's it, right? Nothing really to show. Is there any money left? If this were a startup, that's a very real question. We know how much money we have, and we can do some math and figure out how many days it is till we're all out of a job. But on a non-startup project, someone might say, hey, why did I just pay all this money to have these developers do nothing for three months? Maybe I shouldn't fund this project anymore. It's not good. Now, of course, remember those services that we thought we needed? Well, because we made all these decisions up front, simple changes are now difficult. Let's say we want to add Bob's email to our hello page.
Well, if the email is in an email service, then it's a huge pain to make that change. Difficult changes are now really hard, because if we made the wrong decisions about services, we now have to unwind all of that and make the ones that we should have made from the beginning. And it's not fun to work on this. You're at the beginning, you wanna deliver value and be productive, and you can't, because you've pre-abstracted all this stuff. And then people leave, even if those people are the reason that this has happened. And so, again, the enterprise blows up. So this team miraculously finds themselves back at the beginning again, and the pain of that microservices thing is in them, and they remember how great it was to do the monolith. Like, initially it was so fast and easy and they just cranked out this stuff, and so they're thinking, okay, let's revisit that and maybe take a different approach. The problem was, we let things get messy and we never cleaned it up. We just had this big mess that we kept piling and piling on. So maybe if we pay attention to when things get messy, and then we clean up the mess before it's too late, maybe that's the way through. We get all the benefits of the monolith, and eventually we'll get to our microservices by cleaning up the mess and decoupling and creating services that way. Okay, so things play out largely as they did before. The team is keeping an eye on things. All this stuff happens, and now they're like, all right, it is now too messy to continue. We are going to fix it. We are not gonna make it worse. We're gonna now start extracting services and doing all the stuff that we wanted to do with that microservices thing. Now the whole team, which at this point is somewhat large, can't just stop and do this, for a couple of reasons. One, managing a large team to do a single project is very tricky. Two, the business or the software or whoever needs things to keep going. They need changes to be made.
They need features to be delivered. We can't just stop doing that. So we form a Tiger team. The Tiger team is gonna solve this. We take our best engineers and we get them on this task of extracting and decoupling and all that stuff. Now, this is a bit of a problem. Think about noticing the mess, right? Has anybody ever cleaned their bathroom? You all should have raised your hands on that. But that's okay. Maybe you've solved the problem in a way I don't understand yet. For those of you who have cleaned a bathroom, the day after you clean it, it still looks pretty clean. And the day after that, it still looks pretty clean. It doesn't seem to change day to day. But then your roommate or a significant other or a friend comes over and sees your disgusting bathroom, and they're like, what are you doing with this disgusting bathroom? And that's just a bathroom. Imagine your code that you've lovingly created, that you have all this emotional investment in. It's really hard to see that the thing you've made is messy. So we're starting way after the mess began. But the Tiger team tries their best. They extract one thing, but then the rest of the team puts something else in its place. You ever been to an airport bathroom, right? There's always some poor person in there cleaning it. But then there are all these people using it. So they're never actually getting it clean. They're just keeping the mess in the bathroom at some sort of, I don't know, acceptable level of messiness. It's never actually getting clean. So that's what happens here. You've got this team of great developers doing all this stuff. But then the other developers are like, I don't know what to do. I'm just gonna keep shoveling shit into this monolith because no one told me what else to do. It's not good. So doing this requires the best engineers, right?
And that means not only the engineers who have the technical capability to pull this off, but also those who understand the business well enough to make the right decisions. It's grueling work. It's not fun. It's hard. And it's not where you would ideally want your best engineers working. And afterwards they're gonna be very unhappy about the whole thing, because it's not fun. Small changes, right? They become difficult, because we're sort of just keeping things going and trying to make it better, but it's not really working. Bigger changes get delayed or possibly never happen. And maybe the rest of the team is okay, but your best and brightest, the ones you put on this Tiger team, are not gonna be too happy when this is done. And they may leave, and that's not good. And then the enterprise blows up again. So our team somehow miraculously has gotten a fourth chance at solving this problem. And they're like, I don't know what to do. We tried the monolith. We tried the microservices. We tried the monolith and then cleaning it up and not making a mess and all that. And it didn't work. We must be fundamentally thinking about this problem in the wrong way. And they think it's all about predicting the future. You do this monolithic, agile approach because you don't wanna predict the future, so you just solve the problems in front of you. And then the microservices approach was completely mispredicting the future. We were making all these decisions based on information we didn't have. So maybe we need to think about it in a more fluid manner. We know we can't predict the future, but we don't have to be blind to what it holds. We know kind of where we're going. We know that there's gonna be more developers, more code, more features, more changes. That's gonna happen. We don't know what they are. We don't know who those developers are, but we know they're gonna be coming. We know we want things to be decoupled.
We don't know what those parts are, but we know that's the way this works. The way this succeeds is if things can be decoupled. So they decide, okay, let's not make a mess in the first place, as our sort of number one rule. We're not gonna be in the business of cleaning the airport bathroom. We're just never gonna make a mess. They also decide on a few other things that they think might help. And they're making these decisions to enable the changes that they think they're going to have to make, even though they don't know the specifics of those changes. So they decide on one Rails app per business function. We're not making one app that has everything in it. When a second business function needs us to do something, we'll put that in a second app, and that will naturally create some decoupling without opting us into this crazy microservices architecture. They also decide that when they do need to decouple things, the seams are gonna be around the business logic. So they decide that we're not gonna put the business logic in the Active Record models, because if the business logic is in separate classes that are basically just Ruby, it'll be easier to extract when we need to, and it's actually not that much more work to put it in some other class. There's really not a huge gain in throwing all that stuff into your Active Record model, so they figure that's a really good compromise. And then they decide to be very diligent about identifying patterns that emerge as they work. So they use this thing called the rule of three to decide when to create abstractions, when to stop and fix problems, when to create tooling, and generally to identify patterns that are specific to them. The rule of three is a way to detect patterns. You do something once, and there's no sense in automating it or making it easier, because you might never do it again. In fact, you probably won't do it again.
You do it a second time, and you really think there's a pattern, because you can draw a line between those two things. But that's not necessarily a pattern. Lots of things happen only twice. So again, you don't do anything different. In fact, you might just copy and paste things, or do things that are less than ideal, because two times isn't a pattern. But the third time is a pattern. The third time is when you say, all right, this is us finding out, by observing what we've experienced and what we've done, that there is a pattern, and we're now gonna take the necessary steps to make the fourth, fifth, and sixth times easier. And you can see in this graph here that the idea is you spend a little extra effort on time number three, and that pays off on times four, five, and six. And the reason you're okay doing that is because you've seen it happen three times, and that's a pattern. Now, this is an engineering culture thing, and everyone has to be on board with it, and you need to publicize that you're doing this, because you are going to have to say, hey, I need to take more time on something, and that's so that we can scale our team, and please, business people, trust that I know what I'm doing. But it's not a big deal. It's not a huge request to make. Sometimes you can level up to the rule of two if you know the team really well. So the team decides, yes, this is how we're gonna do things. Here are a few examples. The third time you make a Rails app, create a generator and generate that third Rails app with the generator. Then the fourth app is almost free. The third time you make a Ruby gem, create a command line tool that generates the Ruby gem scaffolding, so that the fourth and fifth Ruby gems can be created easily and in the canonical way. But it also applies to problems. Say, the third time you are faced with making a mess with jQuery. The first couple of times you make a jQuery mess, no big deal. You can handle that.
But the third time you're like, maybe our front-end needs are not being met by making a mess with jQuery. Maybe we need something else, a front-end framework or something. It also applies to outages, bugs, support. The third time someone complains about something, maybe it's time to stop what we're doing and fix the root problem, right? So this lets the team identify patterns based on their actual experience and not on what someone writing a blog post or giving a conference talk is telling them to do. So the team decides to let this play out. They've made no decisions about the specifics of their architecture. They've just created some rules and guidelines that help them know how to change it. So they make their first Rails app, the public website, and it uses a database. Not a problem. Now the customer service team needs some functionality, and because they're not putting that in the public website, they make a separate Rails app. They create that Rails app by copying the first one. The two apps share a database. And the team knows this is not ideal. This will be a problem later. It's not a problem now. And they just copy the Active Record models over. Because those Active Record models don't have business logic in them, there's not really much there. So keeping the two copies in sync is not a big deal, because they don't really change that much. So again, the team's making a reasonable trade-off. They aren't 100% sure they've seen a pattern, so they're just doing the simplest thing. Now the marketing team needs some functionality. Different business function. We're gonna make a separate Rails app, but it's the third one. So we make an app generator and use that to create the app for the marketing team. But this reveals another pattern, another three, which is that we'd be copying the Active Record models into the marketing app. And then we'd have three copies of these Active Record models. That's probably not good. Let's extract those into a shared gem, right?
It's not perfect, but whenever changes are made to the database, we'll change the gem. That'll bump the gem's version. And each team is doing a regular bundle update, so they will get the change even if they completely forget to keep things in sync. So, not too bad. The feature they needed to make for marketing was around promo codes. And it required some logic to be in both the customer service app and the marketing app. They're just gonna copy it for now. And again, these aren't Active Record models. These are separate classes. And the team is like, man, this could be a pattern, but they're not ready for the rule of two yet. So they leave it. Later, it turns out that the customer service application is very slow at searching for customers. So they add a cache so that that search is fast. But they need a way to keep the cache up to date, because customers are created in the public website, not in the customer service app. So when the public website creates a customer, how do we update our cache? These are separate apps now. So how do we solve this problem? We could create some sort of service, or we could query the database, but that turns out to be too slow for them. So they decide to go to cloudamqp.com and set up a RabbitMQ message bus. Then the public website will send a message whenever a customer is created or modified, and the customer service app will listen for that message and update its internal cache. So now these two things can talk to each other, but they're still very decoupled. As long as the messages are flowing, everything is fine, and no one team can easily break that feature. And as a side benefit, when we hire some business intelligence people, they can start consuming these messages and get real data out of them without having to do anything in any of the engineering systems. Again, more decoupling because of some of these decisions that we've made. Now, the finance team needs something.
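Before moving on to finance, here is a minimal sketch of the cache-update arrangement just described. It is an in-memory stand-in, not RabbitMQ: the Bus class, the topic names customer.created and customer.updated, and the message fields are all assumptions made for illustration. A real setup would publish and subscribe through a RabbitMQ client library instead.

```ruby
require "json"

# Hypothetical in-memory stand-in for a message bus.
class Bus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, payload)
    message = JSON.generate(payload) # serialize, as a real bus would
    @subscribers[topic].each { |handler| handler.call(JSON.parse(message)) }
  end
end

# Customer service app side: keeps its own search cache current.
class CustomerSearchCache
  def initialize(bus)
    @by_id = {}
    bus.subscribe("customer.created") { |msg| store(msg) }
    bus.subscribe("customer.updated") { |msg| store(msg) }
  end

  def search(term)
    @by_id.values.select { |c| c["name"].downcase.include?(term.downcase) }
  end

  private

  # Later messages for the same id overwrite earlier ones.
  def store(msg)
    @by_id[msg["id"]] = msg
  end
end

bus = Bus.new
cache = CustomerSearchCache.new(bus)

# Public website side: announces changes without knowing who is listening.
bus.publish("customer.created", { id: 1, name: "Bob" })
bus.publish("customer.updated", { id: 1, name: "Bob Smith" })

puts cache.search("smith").map { |c| c["name"] }.inspect
```

The publisher never references the cache, which is the decoupling the team is after: another consumer, such as a business intelligence tool, could subscribe to the same topics without any change on the publishing side.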
Fortunately, our Rails app generator creates this finance app super fast, so we realize the gains from spending that time before. That's great. The finance app needs access to promo code information. That's something that goes on our accounting and our books and all that. The team doesn't want to copy the logic over again, and they could make a gem like they did with the shared Active Record models, but they decide this is simple enough that now is a good time to see what it's like to make a service. We're not making a zillion microservices. We're gonna make one service and see how that goes. Fortunately, because the promo code logic is not entangled with Active Record, it's just regular code, they have a perfectly defined interface for what their service will look like. So they set it up. All of the other apps can consume that service instead of using the copied logic and accessing the shared database. And as it turns out, the marketing application no longer has any need of the shared database, because all it was looking at was the promo code stuff, and now that is hidden behind this service. So we've actually kind of improved things. We've decoupled it from our shared database just by happenstance. And of course the finance app can store all the data it wants in its private database. So who would ever design this architecture? Who would sit down and be like, all right, our ideal architecture is gonna be some copy-pasted code, some Ruby gems, a couple of private databases, a shared database, a RabbitMQ message bus, and then one HTTP service? No one would ever design that, right? But I think we can see the advantages here. A team that's working on the marketing app can make massive changes without breaking everything else. It's very difficult for them to break anything else. And because things are aligned on business areas, that's where change tends to happen.
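To illustrate why keeping the promo code logic in plain Ruby made the extraction cheap, here is a minimal sketch. Every name in it (PromoCode, PromoCodeCalculator, PromoCodeServiceClient, the /promo_codes/apply path, the SPRING10 code) is hypothetical, and the HTTP call is replaced by a fake transport so the example runs on its own. The point is only that callers see the same apply method whether the logic is local or behind a service.

```ruby
require "date"

# Plain Ruby business logic: no Active Record, no Rails.
PromoCode = Struct.new(:code, :percent_off, :expires_on)

class PromoCodeCalculator
  def initialize(codes)
    @codes = codes.to_h { |promo| [promo.code, promo] }
  end

  # Returns the discounted price in cents, or the original price
  # when the code is unknown or expired.
  def apply(code, price_cents, today: Date.today)
    promo = @codes[code]
    return price_cents if promo.nil? || promo.expires_on < today

    price_cents - (price_cents * promo.percent_off / 100)
  end
end

# Remote version with the same interface. The transport is any object
# that responds to call(path, params) and returns a Hash.
class PromoCodeServiceClient
  def initialize(transport)
    @transport = transport
  end

  def apply(code, price_cents, today: Date.today)
    params = { code: code, price_cents: price_cents, today: today.iso8601 }
    @transport.call("/promo_codes/apply", params).fetch("price_cents")
  end
end

calculator = PromoCodeCalculator.new(
  [PromoCode.new("SPRING10", 10, Date.new(2030, 1, 1))]
)

# Fake transport standing in for HTTP; it answers from the local calculator.
fake_http = lambda do |_path, params|
  price = calculator.apply(params[:code], params[:price_cents],
                           today: Date.iso8601(params[:today]))
  { "price_cents" => price }
end
client = PromoCodeServiceClient.new(fake_http)

day = Date.new(2025, 6, 1)
puts calculator.apply("SPRING10", 5000, today: day)
puts client.apply("SPRING10", 5000, today: day)
```

Both calls print the same discounted price, so an app that started with the copied class could switch to the client without its callers changing.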
What the team realized was that their error, the first three times through this time loop, was thinking that the architecture was a thing they would set up and create, and then it would be done. And that's not actually true. What they realized is that the plans around their architecture are really a process for changing it and allowing it to evolve in whatever way it needs to. So they set up a few guidelines, a few ground rules, a way to detect patterns, and a way to respond when they did detect a pattern, and then let the architecture evolve however it needed to in order to serve their purposes, right? So the enterprise doesn't blow up, they escape the time loop, everything is great. So what did we learn? Don't be blind to the future, right? That's not the same as predicting it. You can acknowledge that you can't predict the future, but you can kind of see the winds of change, right? Write code that enables change, right? That's where it gets confusing. You don't write code that is the change you want to make in the future. You write code that can change into what it needs to be. And that's part of the engineering culture, right? The rule of three is a great way to detect these patterns. Maybe there's a better way for you, but that's a great way to do it. And acknowledge that the architecture is something that evolves and is never completed. But just don't make a mess. If all you do is not make a mess, lots of good things happen. So that's all I've got. If the thing I described sounds like a great way to work and you're like, I want to work in that crazy architecture, then come talk to me, because I can make that happen. I also have a couple of books you can buy for cheap. I have a few free copies too, so if you ask me a question, I will give you a free copy up here. And follow me on Twitter. So that's it. Thank you.