from Chicago, not from Canada. I apparently have an accent that confuses people. I've never been there. I'm a tech lead at Groupon. I've been there for roughly 15 months, which has given me the benefit of seeing a relatively average-sized Rails code base grow to be ginormous. So we've learned a couple of things along the way, specifically about ActiveRecord and scaling it. The first part is: ActiveRecord is a great tool to get you 80% of the way there. You can get up and running in your application really, really quickly. This starts to totally break down once you have real users. As soon as you start to scale, you immediately start feeling the pain of some things in ActiveRecord. This is actually semi-real code that I found in our code base, and my face just melted when I saw it. Hopefully everyone knows, but that's pretty terrible. I'll go over it just to make sure everyone knows. Seeing stuff like this was the impetus for this talk. So this is a talk about common mistakes that we make when dealing with ActiveRecord. It could also be called ActiveRecord Safari. Some of it's going to be review; if you've run a large Rails site, you've probably already heard some of this. Hopefully I can teach you some new stuff anyway. So let's start with that first mess of code that I showed you. Every abstraction, by definition, is leaky, right? At some point you'll get to a place in your code where you'll need to understand the layer below the one you're working on, the thing being abstracted away. With ActiveRecord, unfortunately, that happens basically instantly, but you don't realize that you need to know what's being abstracted, so it becomes really dangerous. If we look at this code, it all looks harmless, but it's doing very, very different things, right? The Order.all isn't actually doing any of this in a query.
It's pulling every order in that table into memory, constructing ActiveRecord objects, then we're throwing a bunch of them out in Ruby just for fun, and then we're selecting the ones that are open after that, which is a bug in itself, because you're not actually guaranteed to have an open order at that point, which I saw. The sad part about all of this is that it's super easy to do correctly: create a simple scope on your Order model and use the limit call that's given to you. This puts all of the work on the database, and databases happen to be pretty good at dealing with data, which is crazy. But I take it almost everyone knows to do stuff like this over stuff like that, right? So another pain point that we had to deal with was running typical ActiveRecord migrations. We use MySQL, and the pain we're feeling is with how MySQL handles ALTER TABLE statements. In our example here we're adding an email and a status column to our users table. Our users table could be anywhere from 5 to 80 million rows, depending on where you are in your scaling, but this is how MySQL handles that migration: first it locks the table, so no more writes. Then it copies the table, makes the changes that we want, and then swaps that copy in as the primary table. And since we have two statements, we're going to do all of that twice. So if it takes an hour to move all of your data around, you're going to be doing that twice: two hours. Or three hours, depending on what you're doing. This is bad. Especially since this is our users table, so no users can sign up for the next two hours. And it would be really bad if that was your purchase table. I don't know if I could tell my boss that we won't be selling anything for the next three hours. So there are two different ways around this problem. One is kind of nasty, and that's just to write raw SQL. So we're executing SQL directly, altering the table.
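As a sketch (not the actual code from the talk; the column types are assumptions for illustration), the raw-SQL workaround is a migration that issues one combined ALTER TABLE, so MySQL only copies the table once:

```ruby
# One ALTER TABLE that adds both columns in a single pass, instead of
# two separate statements that each copy the 80-million-row table.
# Hypothetical migration; column types are guesses.
class AddEmailAndStatusToUsers < ActiveRecord::Migration
  def up
    execute <<-SQL
      ALTER TABLE users
        ADD COLUMN email VARCHAR(255),
        ADD COLUMN status VARCHAR(255)
    SQL
  end

  def down
    execute "ALTER TABLE users DROP COLUMN email, DROP COLUMN status"
  end
end
```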
If you're lucky enough to be working on a... So, a question from the audience: is that actually going to be transactional between the two columns? I don't know. I don't think you can count on that. Does anyone know if this is transactional? I believe it is. Yeah. So for MySQL, this is all right. If you're using Rails 3.1, I think, there's a bulk flag on change_table that was introduced, which does the same thing as what we did before. I have no idea why this isn't the default, except that it's only in the MySQL driver, so it's not there for Postgres. But very cool, definitely use it. It's just important to keep these different leaks in the abstraction in mind. It's definitely a benefit to be aware of them, though. Now, the next thing I want to talk about is not optimizing your queries. So we're back to our familiar statement of grabbing every order in the system, which is not a really great example, because you probably never want to do this unless you're running some global every-sale-ever report. Normally you limit it down somehow, but regardless, if this is a really big collection you're iterating over, that's going to be really painful. What actually ends up happening when we do this is we iterate over the database cursor, build up an array of rows, and then instantiate ActiveRecord objects for each row that we get back. Depending on the size you're working with, this is going to consume a huge amount of memory. So we have two different ways to work around this, which I think are pretty cool. The first one is find_each. What find_each does for you is iterate over the collection in batches. By default, it'll grab a thousand records from the database, and it hides the iteration from you, so you still just get an order, but you're only pinging the database for a thousand at a time. And you can override that batch size to be whatever you need it to be.
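What find_each is doing under the hood can be sketched in plain Ruby (no Rails here: `rows` is just an array standing in for the table, and the method name only mirrors ActiveRecord's):

```ruby
# Plain-Ruby sketch of find_each semantics: fetch rows in primary-key
# order, a batch at a time, yielding one record at a time so only a
# single batch lives in memory.
def find_each(rows, batch_size: 1000)
  last_id = 0
  loop do
    # Stands in for: SELECT * FROM orders WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = rows.select { |r| r[:id] > last_id }.sort_by { |r| r[:id] }.take(batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last[:id]
  end
end

orders = (1..2500).map { |i| { id: i } }
seen = 0
find_each(orders, batch_size: 1000) { |_order| seen += 1 }
puts seen  # => 2500, fetched in batches of 1000, 1000, and 500
```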
And there's also find_in_batches, which lets you do the same exact thing, except it exposes the iterator to you. So if you need to do further limiting on it, or whatever, you have the array of ActiveRecord objects. That's super low-hanging fruit in most projects: simply remembering to use find_each or find_in_batches can save you a lot of memory. Next: N plus one queries. We're using our same example. We're iterating over all of our orders, and then for every order, we're going into the user association and grabbing the first thing off of it. This means we're issuing an extra database call every time we get the user. This is normally pretty bad, and it can cascade really fast. This actually comes from another example that I found: one of our reports would do something similar to this, except it grabbed probably 15 or 16 objects in the graph. And for larger reports, it ended up issuing about 2 million queries. It didn't work. Surprise. We can fix this pretty easily by just using includes for the user association. This will do a join and grab all the data right away. And we can chain this to grab as many associations as we need. You still have to be careful with this; we're still bringing a bunch of data into memory, so combining this with find_in_batches or find_each would be the way to go. One tool that we use quite a bit, which helps us find crap queries like this, is query_reviewer. What it does is place a little clickable tag in the upper left of your window, and you can click on it and it'll show you all the warning and critical queries it found that rendered on the page. Pretty decent. Another thing that I recommend is always having ActiveRecord log out to the console. When you see a flurry of queries fly by, you know something's wrong.
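The arithmetic behind that N+1 can be made concrete with a toy query counter (plain Ruby, not ActiveRecord; `FakeDB` is made up for illustration):

```ruby
# A fake "database" that just counts every query issued, to show why
# N+1 access patterns explode compared to eager loading.
class FakeDB
  attr_reader :queries

  def initialize
    @queries = 0
  end

  def query!
    @queries += 1
  end
end

# N+1 style: one query for the orders, then one per order for its user.
naive = FakeDB.new
orders = Array.new(100) { |i| { id: i, user_id: i } }
naive.query!                        # SELECT * FROM orders
orders.each { |_o| naive.query! }   # SELECT * FROM users WHERE id = ? (x100)
n_plus_one = naive.queries

# Eager loading (what includes(:user) buys you): two queries total,
# regardless of how many orders there are.
eager = FakeDB.new
eager.query!  # SELECT * FROM orders
eager.query!  # SELECT * FROM users WHERE id IN (...)
eager_total = eager.queries

puts [n_plus_one, eager_total].inspect  # => [101, 2]
```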
Someone talked briefly about some race conditions that you can hit pretty easily in ActiveRecord. The one I specifically want to talk about is validates_uniqueness_of. validates_uniqueness_of is interesting in that it doesn't actually work all that well. It will work, right up to the point you hit production. This is assuming your production system has two Mongrels, not just one; if you're running on one Mongrel, you'll be fine. So let's see how this happens. It's pretty simple. We have two users. User one wants to create a record. We check for uniqueness; the database returns true. User two comes along and wants to do the same exact thing. No insert has happened yet, so it's going to return true as well. And then both of them fire the actual insert. Success! No errors, and we have integrity issues now. And this doesn't have to be two separate users, either. It can be one user who's really impatient and double-clicking; if you don't have JavaScript protection on double-clickable buttons, all this can happen there too. Sorry, guys. The easiest way to fix this is by specifying a unique index. This will prevent it from happening. It kind of goes against the Rails train of thought that business logic belongs in your application. I think that's kind of crap. This is an integrity issue, so I want my database to maintain integrity. Kind of up to you, I guess. The only other way I know to fix this is to change the isolation level whenever you're actually doing an insert, which may work if only one application is talking to your database. In our case, our database is shared across a couple of different applications, so we can have one unique index instead of every place that accesses the table having to remember to change its isolation level. There's a gem to help you find these, called consistency_fail, by Colin Jones.
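That interleaving can be replayed deterministically in plain Ruby; no threads needed, since the bug is just a check-then-insert gap (the lambdas stand in for the validation's SELECT and the eventual INSERT):

```ruby
# Two "requests" both pass the uniqueness check before either insert
# lands. `emails` stands in for a users table with no unique index.
emails = []

check  = ->(email) { !emails.include?(email) }  # the validation's SELECT
insert = ->(email) { emails << email }          # the INSERT

a_ok = check.call("bob@example.com")   # request A validates: true
b_ok = check.call("bob@example.com")   # request B validates: true too,
                                       # because A hasn't inserted yet
insert.call("bob@example.com") if a_ok
insert.call("bob@example.com") if b_ok

puts emails.count("bob@example.com")   # => 2, a duplicate row
```

With a unique index in place, the database would reject that second INSERT outright, which is exactly the integrity guarantee the validation alone can't give you.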
consistency_fail is a really quick executable that you just run against your project, and it'll find every validates_uniqueness_of that is in fact missing a unique index. It'll find them on has_one relationships as well. Another thing that we've had a lot of pain with is callbacks, specifically heavily used callbacks. The first pain point that we really started to feel at scale was loading additional associations inside callbacks. In this example, we want to validate that the credit card on the order belongs to the user, which is maybe not the best example, but it illustrates the point. You now have another query to load up the user object. And that cascades, right? This one callback has one thing that it's loading; if you have ten callbacks that each load one piece, you have ten additional queries that are going to happen. One of the best ways to take care of this is watching dirty attributes, so you only actually do anything if the credit card has changed. If the credit card ID never changed, you don't need to do anything, right? You just totally skip it. Pretty simple. It gets slightly worse in cases like this, where we're calling methods on other associations. This is bad for a couple of different reasons: it's slow, and it's extremely hard to debug when something goes wrong in production. So let's see how something goes wrong. In our case, we have an order with a callback that touches an item; we're updating something on the item. Now our item has a callback that changes something on the vendor, which maybe goes down to some sales object. The object graph can get pretty deep if you're dealing with callbacks like this. But inevitably, somewhere, someone's going to return false from a callback, and that's going to halt the chain, and nothing will save. So all the way back up, your totally unrelated order item can't save.
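That cascade can be sketched as a toy (pure Ruby, not ActiveRecord; the class names just echo the example): a save on the order walks through the item to the vendor, and one false return two hops away halts everything.

```ruby
# Toy callback chain: Order#save fires a callback that touches Item,
# whose callback touches Vendor. One false anywhere halts the save.
class Vendor
  def callback
    false  # someone, somewhere, returns false...
  end
end

class Item
  def initialize(vendor)
    @vendor = vendor
  end

  def callback
    @vendor.callback
  end
end

class Order
  attr_reader :saved

  def initialize(item)
    @item = item
    @saved = false
  end

  def save
    return false unless @item.callback  # the chain halts here
    @saved = true
  end
end

order = Order.new(Item.new(Vendor.new))
puts order.save   # => false
puts order.saved  # => false, blocked by a class two hops away
```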
We've seen this happen in cases where our vendor was in an unavailable state. Because of something way over here, we can't save this object over there. That's really hard to figure out in production. So keep callbacks simple is the moral there. The next thing I want to talk about is breaking the law: the Law of Demeter. The Law of Demeter is a relatively simple concept. Each unit should only talk to its close friends; don't talk to strangers, basically. That's the typical explanation. I like this one a lot better: play with yourself, play with your own toys, toys that were given to you, and toys you made yourself. Anything more than that, don't do it. So this would be a Demeter violation: from our order object, we're reaching through user and grabbing the address association. This is bad. But why do you care? It's super easy to do; you see it all over in Rails applications. You should care about it because of coupling. Through this graph that I made up, we can see that the more highly your objects are coupled together, the higher the cost of change. We don't want that, right? We want things to be easy to change, so we want them to be less coupled. So we've got a pretty easy way to deal with this: we can delegate things to other associations. In this case, we're delegating address to our user class, and we're passing a prefix, which gives us this type of method. So now we can call order.user_address to get back that address object. This kind of solves the issue, and it makes things easier to change: if we needed to change the implementation, we could just override user_address on the order object. I'm not totally sure how I feel about this. It still feels equally dirty. Other people use presenters to try to get around this; I'd be curious if anyone has thoughts on that. Just a quick recap. Pay attention to the abstraction. It leaks like crazy, so know about it.
Optimize those queries. Avoid complicated callbacks. The Law of Demeter: it's the law, follow it. And finally, the database is your friend. We tend to get into this habit of abstracting it away and pretending it doesn't exist, which is, in my opinion, pretty bad. Our database holds our data; our applications are rather useless without it in most cases. So we should care about our database. And, like everything, your mileage will vary on all of this. Like I said, we ran into these issues at scale. For a smaller site that you're trying to grow, it may be acceptable to make some of these mistakes in exchange for faster velocity getting features out the door. Just be aware that it's technical debt, right? At some point you're going to have to pay for it. Thanks. So, why did I feel dirty about using delegate to get around a Demeter violation? It just didn't feel that much better in terms of coupling. I still don't know, I guess. An audience member points out that it in a way abstracts away the fact that you're hitting another table, so you may not realize that this is actually causing another query. That's true, and if you don't do an include in certain cases, that happens. And with that example specifically, now our order class has methods about users on it, which feels kind of like a violation of single responsibility, especially the part where we changed it. That's where I almost feel a presenter might come in better. I can't say for sure. Any other questions? Comments? Are there any other tools that you would recommend? You mentioned a couple of gems; is there anything else that you use as you're working that helps with the scaling issues? So, when I come across a bad ActiveRecord query, I run it through MySQL's EXPLAIN. The thing to be careful of there, with MySQL at least, is that it's completely stats-based query planning.
So if you don't have a relatively accurate table size, you'll get different query plans locally than you would in production, which has bitten us before. Having a UAT environment with production-like data that you can run an EXPLAIN against has helped us a bunch. Are there things that you guys run outside of ActiveRecord, and what types of cases would those be? Do you run all of your queries through ActiveRecord? Yes and no. Everything inside of our Rails app is ActiveRecord. We use Vertica for large reporting, but that's outside of the actual main app. We do actually write a bunch of raw SQL, too. Thanks, everyone.