So, performance tips for a real stack. It's a very clichéd title. I actually want to clarify: when I first came up with this title, I thought, okay, let me write up the stuff we've done at Grab. And then I realized these might not really apply to everyone. So let me rephrase it as: performance tips we use at Grab, for a real stack.

Hi, I'm Altaf Hamiz, a lead engineer here at Grab. My colleague told me to put a picture of myself; I couldn't find a flattering one, so I put a cap. And Youngshun convinced me to give a presentation here.

When we think about Rails, the first thing we always hear is: does Rails scale? So I went to Google, as we always do, and searched "can Rails scale". Yeah, very disappointing results for me. And here at Grab we use Rails extensively, so that wasn't very encouraging. So we decided: no, we want to make it scale.

I want to preface this with what this talk is not about. It's not your usual scaling talk that says don't do N+1 queries, use pluck instead of instantiating ActiveRecord models, cache frequently, use Redis and Memcached. That's the usual stuff you see in pretty much any article that tells you how to scale Rails. Those are things you should be doing anyway; no one should need to tell you. Those are beginner traps. The main thing everyone should know is that most of the time it's not Rails, it's how you designed your app. And that is something very hard for people to grasp.

So what are our numbers here at Grab? This is what people are interested in. Our Rails stack does around 360,000 requests per minute at peak. That's around 6,000 requests per second, and we run it on just 10 instances. They're pretty big machines, c4.4xlarge on AWS, for what it matters. Our average response time is about 33 milliseconds, and our 99th percentile is around 128 milliseconds. The 99th percentile is skewed by a few APIs which we unfortunately haven't fixed yet, but those are pretty decent numbers at any scale. 6,000 requests per second is nothing to be scoffed at, we're running it on just 10 machines, and we're in no way throttled by machines: we could pull out more and easily handle at least double the load.

So the first thing we do heavily here at Grab is feature flagging through Redis. Feature toggles, sitevars, whatever you like to call them. We instantiate a Redis client for configuration, and the code reads: if the flag we get from the configuration Redis is true, run the new feature, else run the old code. We do this heavily because we can't rely on deployments all the time. A deployment takes at least 10 minutes for us to roll out, and we need to be able to quickly turn a feature on, and quickly turn it off if things aren't working. But when these checks sit on heavy endpoints, you get a lot of hits to Redis. Redis is pretty good, but you start noticing that you're hitting it with a lot of GETs which are pretty meaningless, because these flags barely ever change. You probably set them once, flip them at some point, and never touch them again until you remove them from the code.
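As a rough sketch of that pattern (the constant, the flag key, and the two code paths are made up for illustration):

```ruby
require "redis"

# A dedicated Redis connection used only for configuration flags.
CONFIGURATION_REDIS = Redis.new(url: ENV["CONFIGURATION_REDIS_URL"])

def new_flow_enabled?
  # Redis stores strings, so compare against "true" rather than a boolean.
  CONFIGURATION_REDIS.get("enable_new_booking_flow") == "true"
end

if new_flow_enabled?
  puts "new booking flow" # stand-in for the new code path
else
  puts "old booking flow" # stand-in for the old code path
end
```

The catch is that this check runs on every single request that touches the endpoint, which is where the Redis load comes from.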
So how do you cache something that's already in memory? I mean, something that's already in a memory store. ActiveSupport::Cache::MemoryStore. This is local, per-process memory, and this is an exact extract from the Rails guides: it isn't appropriate for large application deployments, but can work well for small, low-traffic sites. Yeah, we chose to ignore that.

This is something one of my colleagues did; I think he's here. What we actually did was delegate all the flag reads through a wrapper. Once we get a value from Redis, we store it in memory on that instance, because most of these values don't change. Once fetched, we keep them for about 30 seconds. That's a pretty low TTL, and I'll explain why we didn't go higher. So you hit Redis once and store the value in memory for 30 seconds on a single instance. At most you need 10 hits, one for each of our 10 instances to populate its local memory store. And that works surprisingly well. If you look at our AWS graph (we're on ElastiCache) of GET and HGET commands right after we deployed, that's the significant drop we got: around 20K fewer per minute, which is pretty impressive.

We could actually get more. The reason the drop isn't bigger is the 30-second TTL. Why not an hour-long TTL? Because when we want to turn off a feature fast, we can't wait an hour for it to take effect. We'd need a way to quickly invalidate the in-memory value across all instances, basically a script that purges the local memory store on each instance. We haven't done that yet. If we had a way to say "we're flipping this flag, purge all the local memory stores", we could cache for as long as we want: the moment you flip the flag, you bust all the caches, so you could raise the TTL much higher and get much more significant savings on Redis hits.

So this was pretty cool, and everyone was like, okay, that's pretty awesome. But Redis is already very performant; you don't gain that much from this. The big pain point is always the DB, right? Can we do this with the DB? Surprisingly, yesterday we found out that yes, we can, and it's something we actually deployed to production yesterday. We have two ActiveRecord models, content details and cities, which hold very static data: basic configuration information that could live in config files, but lives in ActiveRecord so that when we expand we have admin pages to add entries. The majority of this data barely ever changes. So we decided to cache it in memory as well, with a much longer expiry. And these are the most significant drops in queries: around 25K and 30K per minute dropping to around 10K. That's around 40,000 queries per minute taken off the DB, just by caching in memory.

This is something I have not seen done a lot in Rails apps. You don't see local memory caching being promoted anywhere in Rails performance guides. I think it's something we should be exploring more, because there are definite use cases.
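A minimal sketch of a wrapper like that (class, constant, and key names are illustrative, not our actual code):

```ruby
require "active_support"
require "active_support/cache"
require "active_support/core_ext/numeric/time"
require "redis"

# Memoizes Redis reads in a per-process MemoryStore so hot flags
# don't hit Redis on every request.
class LocallyCachedConfig
  LOCAL = ActiveSupport::Cache::MemoryStore.new

  def initialize(redis, ttl: 30.seconds)
    @redis = redis
    @ttl = ttl
  end

  def get(key)
    # The first read on an instance goes to Redis; for the next 30
    # seconds, reads are served from that instance's local memory.
    LOCAL.fetch(key, expires_in: @ttl) { @redis.get(key) }
  end
end

CONFIG = LocallyCachedConfig.new(Redis.new(url: ENV["CONFIGURATION_REDIS_URL"]))
CONFIG.get("enable_new_booking_flow") # hits Redis at most once per TTL per instance
```

The same fetch-through pattern works for the ActiveRecord case: wrap the query in LOCAL.fetch with a much longer expires_in, since that data changes even less often.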
You need to be aware of all the edge cases that come with storing things in local memory, and the fact that it's local memory, not a shared cache. But I think there are much bigger opportunities here, and we've barely scratched the surface of what we can do with local memory, because it's way faster, it reduces the hits on your data store, and it's way easier to manage. The problems, however, come with debugging. There's no easy way to debug this, so don't cache data you'd want to be able to debug: it gets really messy because you don't know whether a value came from local memory, from Redis, or from your DB. That's something to keep in mind, but we found it pretty useful for this kind of data that you pull frequently but that rarely ever changes.

The next thing we did was partition our data. Rails apps are very much tailored to use a single DB, and that single DB quickly becomes the bottleneck, as everyone will have experienced. Usually when we say "partition our data", we partition the DB, maybe do some sharding and so on. We're on AWS RDS, and we do partition certain tables, but RDS doesn't have really good out-of-the-box sharding abilities. So we decided to use a gem called Octopus. Octopus is mainly used for reading off read replicas, which we don't use it for, but if you read the section called "mixing Octopus with the Rails multiple database model", you'll find that you can actually connect to multiple databases.

This is super useful when you start segregating your data. Say, in the case of Grab, a certain system in your Rails app is quite encapsulated: it doesn't require a lot of other data, and as long as it has its own subset of data it's fine. You put that in a separate DB. The core data goes in the main DB. Data you use frequently goes in a separate DB. Data with high reads goes in a separate DB. You can then optimize each DB for the kind of workload it has. Need high write throughput? You can do that. We have a database that's mostly analytical info coming in from the apps, and no one ever reads from it in production; it basically goes straight to our analytics pipeline, so it doesn't need to be optimized for reads. MySQL by default takes a balanced approach; tune that DB for high writes, set up partitioning, and you can just forget about it.

It definitely has some gotchas. You can't do joins so easily, so you need your data quite well segregated, or you need multiple queries to assemble the data you need. You need to be careful with transactions and rollbacks, especially across multiple databases: if you start a transaction on one database and then run a query on a second one, you might end up rolling back only one of them and not the other. I won't go through all of it; this link summarizes most of the things to watch out for when you use the multiple database model.
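Roughly, the Octopus setup looks like this. The "analytics" shard name and connection details are made up for illustration, and the exact config keys are worth double-checking against the Octopus README:

```yaml
# config/shards.yml
octopus:
  environments:
    - production
  production:
    analytics:
      adapter: mysql2
      host: analytics-db.internal
      database: analytics_production
```

```ruby
# Gemfile: gem "ar-octopus"

# Route a block of queries to the analytics database:
Octopus.using(:analytics) do
  AppEvent.create!(name: "booking_opened") # AppEvent is a hypothetical model
end

# Or scope a single query chain:
AppEvent.using(:analytics).where(name: "booking_opened").count
```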
But how does it work out for us? We were on the largest RDS instance available, an 8xlarge, and on average we used to run around 30% CPU. These are four databases, all different sizes; names have been hidden, unfortunately, because this is being recorded. Our main database, the one with the most connections, sits around 10%; that's actually where the bulk of the data is. The second database is the analytics one I mentioned, where all the writes go, and it's around 12%. The third database, interestingly, is used by a very cron-specific system: it mostly runs past midnight, so you'd see it spike to around 30% or 35% then, and it's usually doing nothing during the day. And the fourth DB is something we introduced recently for some filtering, which is why you see so little CPU and so few connections there; it's not aggressively used yet.

And we found it works pretty well. As long as you can encapsulate your data decently, Rails actually provides all the tools to work well with multiple databases. We also patched the migration tooling to fit the way you do normal migrations: we can run the migration generator against a database name and it generates the migration for that database, and we can run rake db:migrate scoped to that database to migrate it, and so on. So you keep the exact same workflow and the same deployment pipeline; you just add these commands and it pretty much works out of the box. There are some problems, as I mentioned, with joins, but most of them can be worked around.

This applies to Redis as well. We actually use multiple Redis instances, which is common, but we split them along workflow lines too. Our Rails cache is on a completely separate Redis. The configuration Redis I showed you is on a completely separate Redis. Sidekiq runs on a separate Redis. We have a general Redis for whatever low-level caching we do, or for when we use some of the data types Redis provides. And then we have a shared Redis where we share data or invalidate caches across stacks, that is, across other services. We have some services written in Go, so when we want to invalidate their caches, we push something through that Redis and they read the latest data off it.
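In initializer terms, the split can be as simple as separate connections; the constant names and environment variables here are illustrative:

```ruby
# config/initializers/redis.rb
require "redis"
require "sidekiq"

# One connection per workload, so a hot workload can't starve the others,
# and each Redis can be sized and tuned independently.
CONFIGURATION_REDIS = Redis.new(url: ENV["CONFIG_REDIS_URL"])  # feature flags
GENERAL_REDIS       = Redis.new(url: ENV["GENERAL_REDIS_URL"]) # low-level caching, data types
SHARED_REDIS        = Redis.new(url: ENV["SHARED_REDIS_URL"])  # cross-service invalidation

# Sidekiq gets its own Redis entirely.
Sidekiq.configure_server { |config| config.redis = { url: ENV["SIDEKIQ_REDIS_URL"] } }
Sidekiq.configure_client { |config| config.redis = { url: ENV["SIDEKIQ_REDIS_URL"] } }
```

(The Rails cache store would point at yet another Redis via config.cache_store in the environment config.)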
So we've talked about Redis, we've talked about the DB; now we need to go deeper. Partition Rails itself. Our Rails app started off as your general monolith here at Grab, and it did a lot of things, but if you distill it, it was two things in one. It was the front end for our management page, which all our ops teams use to see the current status, look at bookings and details, make updates, the usual CRUD flow. But it was also a backend service that processed bookings and served as the API for our passenger app. Two very different workflows, as you can imagine: one is basically a backend JSON API server, and the other is a full-stack Rails app.

And we found they have very different performance requirements. So we used Rails engines and split it up. We defined a core engine, which holds the shared models and methods, and two separate Rails apps on top of it. One we call Web API, which is basically the API for the front-end management system, and one we call Control Center, which is all the internal backend systems, including the API for the passenger app.

Why do this? First, we found that Control Center handles the bulk of the throughput, obviously, because its traffic comes from the app, while Web API traffic comes from users, our staff, accessing the page, which is significantly less. So we provision a few very small machines for Web API, c4.large I think, and c4.4xlarge machines for Control Center.

The second thing, which is more useful, is timeouts when we communicate with external APIs. Bookings, for example, are now handled by a Go service, so we have to query that API to get some details. The APIs Control Center calls relate to the passenger app, so you want very fast response times; a passenger app isn't going to wait one or two seconds for a response, you want to be on the order of milliseconds. So we set very low timeouts: if it takes three seconds, forget it, I don't care whether I get the response, I'd rather time out on the client side. But on the management portal, you're probably running heavier queries, searches, filters, so you don't mind waiting a bit longer for the query to come through. By splitting the apps, you can set different timeouts in your application.yml, or however you load your configs, and treat the two differently. And because they share the same models through core, you don't have to replicate two different code bases. That proved very advantageous.

The third thing is different gems. We don't want another mathn problem, right? (Requiring mathn globally changes how integer division behaves.) If Control Center wants to use mathn, that's fine; we're just not including it in Web API. You don't want to get surprised by gems. Core only carries the gems that are truly core: New Relic RPM, Redis, Puma, our web server, and so on; very few. Any other gem gets included first in the specific app that wants it, and if you want to promote it to core, there has to be a real reason why.

The fourth thing is optimizing Puma. If you've used Puma before, you know it's multi-process and multi-threaded. On Control Center, because responses are very fast and each request lasts a very short time, we spin up a lot of threads and a lot of workers. We can't do that with Web API, because some requests take longer and are more memory intensive. What used to happen to us before the split was that we'd hit memory constraints: someone would run an export and suddenly that instance would be running out of memory, because it also had a pile of threads handling all the other requests.
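Here's roughly what the two Puma configs might look like; the worker and thread counts are illustrative, not our actual numbers:

```ruby
# Control Center: config/puma.rb
# Fast, short-lived requests: lots of workers and threads per instance.
workers 8
threads 16, 16
preload_app!
```

```ruby
# Web API: config/puma.rb
# Slower, memory-hungry requests (exports, searches): far fewer threads,
# so one heavy request can't starve the rest of the instance of memory.
workers 2
threads 2, 4
preload_app!
```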
By splitting that up, we could have fewer threads on Web API and more threads on Control Center, because Control Center requests process fast and are done quickly, while Web API requests can be slow, and people are willing to wait for a thread to be free to serve them.

Finally, Web API doesn't require background workers, so why would we need Sidekiq running on those instances? Just throw it out. Replicas are similar: the passenger app wants real-time data, not outdated data, and if there's replica lag, something's probably going to go wrong when we start returning data from lagging replicas. But on our management platform that's fine; we can live with replica lag. It's pretty obvious when a replica is lagging, and it's not a deal breaker: staff can get their data a few minutes late, they don't have to process it then and there. So we could reduce load on our main DB by pointing the management platform at replicas.

In short, it's optimizing every nook and cranny for each app. By doing that, you can reduce the overall load on your system, because you've separated the different workflows your app has.

But again, you get problems. Say something happens to a booking, and your management system can also cancel a booking. If the passenger tries to do something while someone does something in the management system, you need to make sure there's some locking in place. We use optimistic locking. It works well across Rails apps because ActiveRecord checks the lock version and raises a stale object error, but you need to add the error handling for those stale object errors when they occur (there's a minimal sketch of this at the end of this section). So that's one thing to be careful about.

Debugging is another: issues now span both apps, so we've got two sets of logs to scan, and they're not in order. Those kinds of issues come up as well.

There's also operational overhead. If something changes in core and, say, Control Center deploys, then it's running a different version of core than the one running on Web API. We haven't had too many issues with that, but if a change touches fundamental behavior, we need to be really careful with how we deploy it to production. We might have to do the whole feature-flag dance: deploy both versions, have them both read the same flag, then flip the flag so they both switch behavior at the same time.

Managing stable releases can also be a bit tricky, because it's still one repo and we don't do submodules. But other than that, we haven't had many issues, and for me the advantages generally outweigh the problems.
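As promised, a minimal sketch of the optimistic locking pattern. The model and the conflict policy are illustrative; lock_version is the column name ActiveRecord uses to enable optimistic locking:

```ruby
# Adding a lock_version column turns on optimistic locking for the model.
class AddLockVersionToBookings < ActiveRecord::Migration
  def change
    add_column :bookings, :lock_version, :integer, default: 0, null: false
  end
end

# Either app (Web API or Control Center) updating the same booking:
begin
  booking = Booking.find(booking_id)
  booking.update!(state: "cancelled")
rescue ActiveRecord::StaleObjectError
  # The other app (or another request) changed the row after we loaded it,
  # so our lock_version is stale. Reload and retry once with fresh state.
  booking.reload
  booking.update!(state: "cancelled")
end
```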
So these are some of the things we use at Grab to scale. We've done a lot more, but unfortunately I don't have much time to talk about it, and I don't have much Go to share here either, because it's pretty lengthy and I don't think anyone's interested in it. You can contact me if you want to see how we did some of this, or just let me know. So yeah, thank you. Any questions?

[Audience] You said you had engines. Is your core engine in a gem, separate from the main repo?

No, it's not separate from the main repo. It's still in the repo; it's written like a gem, but we require the Rails engine directly in the code base.

[Audience] So the Web API, the admin section, and core are all in the same repo?

Yes. We could have packaged core as a gem, and that would actually solve the versioning problem, but we don't make changes to core that often, and most of the changes aren't breaking. If we're introducing a new model, we'd first introduce it to core and deploy that, so core is already out there, and then we'd add the specific code to the specific app.

[Audience] I'm curious: if you put the engine in the repo, does that mean you have multiple Rails apps using the same engine?

So we don't run multiple apps on the same instance; we run only one of those apps. The engine is core, and both apps rely on that one engine; there's only one engine. It's all in one repo, and on any given instance it's one Rails app that spins up.

[Audience] Doesn't everything you store in memory get hit by GC anyway?

So we don't store 100,000 records. For the ActiveRecord models, it's about 200 records we're storing in memory, which doesn't seem to affect GC at all, according to our stats. We only did this on Monday, so we don't have size comparisons yet for how much of the memory store is being used; I know it starts off with a 32 MB default or something, so you may need to expand it. If it runs out, it will actually start throwing things out: the memory store has a specified maximum size, and if it's about to exceed it, it just starts evicting the least recently used entries.

[Audience] Regarding the partitioning, how do you do the versioning?

For the DB or for the app? For the app: when we deploy, we create tags on GitHub, and each team creates their own tag if they want to. It doesn't matter which direction a change on core goes; what we make sure is that if you're changing core, it doesn't affect the other app. We always keep core backwards compatible: we don't introduce breaking changes on core. If you want to introduce a method that changes existing behavior, you introduce it as a new method, then ask the two apps to switch to that method, which they can do independently of the core change. That's how we keep versioning in check. It requires a bit of manual review, which is where that step comes in, but otherwise we create releases independently of each other.

[Audience] With such heavy usage, have you ever had issues with memory leaks?

Yes, quite a lot actually. But most of them have come from gems; we've never had a memory leak that was caused by us. We've had cases where we allocate too much memory because we load too many ActiveRecord models, so we need to switch to raw SQL instead.
So we had a gem that used to do push notifications, an Apple push notification gem; I don't remember the name right now, I think it was pusher, but I can't remember. It would start leaking memory and then cause Sidekiq to segfault. We couldn't figure it out other than doing a git bisect: it started somewhere around this release, so it's probably this, and then we just switched out the gem. That was one.

The second memory leak was when we tried to use gRPC in Rails. Unfortunately, the gRPC Ruby gem is absolutely horrible; don't ever use it. We tried to patch it ourselves and submitted pull requests, but it's very slow and it has a ton of memory leaks. Sidekiq also has some memory leaks, which is unfortunate; there's a recent one they patched up in the latest version. And our Web API app has a small memory leak, so small that we've chosen to ignore it for the moment because it doesn't affect us, but we still haven't figured out where the leak is coming from. The majority of the time, though, it's been gems.

[Host] If you have more questions, you can talk with him about it after this. So next up we have Youngshun. Oh, sorry, thank you all for your questions!