Hi, I'm Yuki, and today I'll be talking about how Tinder implemented Envoy global rate limiting at scale. I'm a software engineer at Tinder on the cloud infrastructure team, and in my day-to-day I work on Kubernetes and Envoy. In particular, I implemented the xDS control plane for our service mesh, which all our Envoys talk to. Today I'm going to go into detail about the rate limiting platform I built there, which is also based on Envoy.

In summary, I'll be talking about these things: our service mesh journey and how we rolled out Envoy across all our services, the problems with our previous rate limiting implementations, why Envoy rate limiting is good, how Envoy rate limiting works in detail, how Tinder's rate limiting works and how we've extended it, and some tips if you want to try it out yourself.

So, Tinder's service mesh journey. We were already fully on Kubernetes before starting the service mesh migration, which we began in 2019. It took about a full year for all our services to be fully on Envoy. It was definitely a very big effort, and it took many people across the organization to achieve it. Today, about 1.6 million requests per second are meshed in our service mesh at peak, and there are about 200,000 containers in it.

This is a chart of how our infrastructure works. On the left is a user using the Tinder app. The request gets routed to one of our three clusters: we run a cluster per availability zone in AWS, and DNS round robin routes each request to one of them. Each cluster is isolated, so they do not talk to each other. This gives us a lot of benefits in terms of performance as well as traffic bandwidth cost savings. We have an ingress layer, which used to be nginx but has now migrated to Envoy, and once a request makes it into a Kubernetes cluster, it is routed to the correct service.

We originally tried Envoy because the default Kubernetes ELB routing was not that great: it was very uneven and resulted in hot-pod issues, where some pods got a lot more requests than others. But eventually we started trying out the other great features Envoy has around request routing: retries, circuit breakers, and timeouts that you essentially get for free. You no longer have to build all of this networking logic inside the application; it all lives in the Envoy network layer. Now we're exploring the Redis filter and the DynamoDB filter, where you proxy database requests through Envoy and get metrics for free. And of course we're doing observability, where all our Envoy metrics are scraped by Prometheus and charted in Grafana.
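As a concrete example of those "free" features, here's a minimal sketch of what a route-level timeout and retry policy look like in Envoy route configuration. The cluster name and the specific values are illustrative, not our production settings:

```yaml
# Sketch of an Envoy route with a timeout and a retry policy.
route_config:
  virtual_hosts:
  - name: service-b
    domains: ["service-b"]
    routes:
    - match:
        prefix: "/"
      route:
        cluster: service-b        # illustrative upstream cluster name
        timeout: 1s               # overall per-request timeout
        retry_policy:
          retry_on: "5xx,connect-failure"  # retry on upstream 5xx or connect errors
          num_retries: 2
          per_try_timeout: 0.25s
```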
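And the Redis filter I mentioned is a network filter, so proxying database traffic through the sidecar is mostly configuration. A sketch, with an assumed upstream cluster named redis:

```yaml
# Sketch of the Envoy Redis proxy filter: the app talks to its local
# Envoy, which forwards commands to Redis and emits per-command metrics.
filter_chains:
- filters:
  - name: envoy.filters.network.redis_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
      stat_prefix: redis
      settings:
        op_timeout: 0.2s          # per-command timeout
      prefix_routes:
        catch_all_route:
          cluster: redis          # assumed upstream Redis cluster
```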
Previous rate limiting implementations at Tinder used nginx at the ingress layer. The problem with this was that there was really no visibility: you would need to tail logs to figure out whether the rate limiting was even working or what was being rate limited, which is not ideal. It was also all done locally, meaning it was based on the number of requests an individual nginx host saw, not on the global request count. And it was very difficult to update the rate limits, because you needed to roll all the nginx hosts. Secondly, there were implementations inside the applications. A lot of teams rolled their own rate limiting code, so there were many different implementations across the org, with little visibility into how they worked or how they were configured. There was often redundant infrastructure too, because many teams stood up their own Redis caches for rate limiting, and there were duplicates basically serving the same purpose.

So why is Envoy rate limiting good? We were able to move all the rate limiting logic to the Envoy layer, so we have a uniform implementation across the company. We have global rate limiting, where the rate limit is based on a global request count, which is really important for our cluster-per-AZ model. It has granular configuration: you can rate limit on multiple headers, one header, or even no headers. You get monitoring and visibility through the Prometheus metrics it offers. And overall we're able to prevent system failure, patch concurrency issues, and save a lot of money by stopping bot traffic. Right now it polices up to 200,000 requests per second at Tinder.

This is a chart of Envoy rate limiting: the request flow of Service A making a request to Service B. Let's start at step one. Service A makes a request to B, and Service A has an Envoy sidecar. The Envoy sidecar asks the rate limit service: should we rate limit this request or not? If the answer is yes, a 429 status code is returned to the Service A container; otherwise the request is let through and A successfully reaches B. You can notice here on the very right that the rate limit service stores all the request count information in a Redis cache.

Okay, so let's talk about the rate limit service. It is a Go project deployed in Kubernetes pods with an Envoy sidecar. This way we can proxy requests to Redis through Envoy, which gives us metrics for free as well as the ability to tweak some Redis connection settings. We run two separate rate limiting clusters: one for internal routes and another for external routes. You might be wondering what happens if the rate limit service is down or slowing down. There is a 20 millisecond default timeout on requests to the rate limit service; if it is exceeded, we fail open, meaning the requests are allowed through by default. And what's nice is that just by updating the rate limit service ConfigMap, the config is hot reloaded. So if you're changing a rate limit from, say, five requests per second to ten requests per second, you update the ConfigMap and it reloads automatically.

So let's look at the configuration. It's important to note that all the Envoy configuration is done on the caller side. Here, with this route, you're attaching a descriptor with the key foo if the request has a header named foo. That descriptor is sent to the rate limit service, and here's the corresponding rate limit service config. You can see it has a nested structure, which allows for pretty complex rate limiting logic. For the foo key, there is a corresponding limit of two requests per minute. So the rate limit service config is where you define the thresholds. I'll show a sketch of both sides after the next example.

Let's talk about one use case at Tinder: SMS request rate limiting. We had a lot of bots requesting a lot of SMS codes, which gets very expensive, because Twilio, our third-party SMS provider, charges for every message. Initially we did have this rate limiting built into our applications, but it was pretty hard to update: every time you wanted to add a new rate limit, you had to write additional code. It was not ideal. So we migrated all of it to Envoy, which produced millions of dollars in savings. And we got very adaptive rate limits: each IP has a per-day and a per-second quota, and we can rate limit on a combination of headers. So if one user had an IP of 1.1.1.1, their country was the US, and their device was iOS, they would have, say, a five requests per second rate limit, while if they had a different IP address, their country was Japan, and their device was web, they would have a lower rate limit.
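As promised, here's roughly what the Envoy side of the foo example could look like: the HTTP rate limit filter itself, wired to the rate limit service with the 20 millisecond timeout and fail-open behavior I mentioned, plus the route-level action that attaches the descriptor. The domain and cluster names here are assumptions for illustration, not our actual config:

```yaml
# Sketch: the HTTP rate limit filter, with a 20 ms budget and fail-open.
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: mesh                   # must match the domain in the rate limit service config
    timeout: 0.02s                 # 20 ms budget for the rate limit decision
    failure_mode_deny: false       # fail open: allow requests if the service is slow or down
    rate_limit_service:
      grpc_service:
        envoy_grpc:
          cluster_name: rate-limit-service   # assumed cluster pointing at the rate limit service
      transport_api_version: V3

# Sketch: the caller's route attaches a ("foo", <header value>) descriptor,
# but only when the request actually carries a foo header.
route:
  cluster: service-b
  rate_limits:
  - actions:
    - request_headers:
        header_name: foo
        descriptor_key: foo
```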
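And on the other side, the corresponding rate limit service config could look like this; any descriptor with the foo key gets the two requests per minute threshold described above (the domain is the same assumed name as in the Envoy sketch):

```yaml
# Sketch: rate limit service config giving the foo descriptor 2 requests/minute.
domain: mesh
descriptors:
- key: foo
  rate_limit:
    unit: minute
    requests_per_unit: 2
```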
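For the SMS use case, it's that nested descriptor structure that lets us combine IP, country, and device into one limit. Here's a sketch of what such a config could look like; the keys, values, and thresholds are illustrative rather than our production settings, and the per-IP daily quota would be a separate descriptor configured the same way:

```yaml
# Sketch: nested descriptors keyed on IP, country, and device.
# Omitting "value" on the ip entry means every distinct IP gets its own counter.
domain: sms
descriptors:
- key: ip
  descriptors:
  - key: country
    value: US
    descriptors:
    - key: device
      value: ios
      rate_limit:
        unit: second
        requests_per_unit: 5     # e.g. US + iOS gets the higher limit
  - key: country
    value: JP
    descriptors:
    - key: device
      value: web
      rate_limit:
        unit: second
        requests_per_unit: 1     # e.g. JP + web gets a lower limit
```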
On top of this, we also built an analytics module into the rate limit service. Every time there is a rate limit event, it is sent to S3 for processing and long-term storage. With that, we can do automated blocklisting: if a user was rate limited three days in a row, we might ban them for 30 days. We can also analyze long-term behavior, asking things like how the rate limiting performed 90 days ago, or who got rate limited on which routes. And now we're starting to apply machine learning to this data to enrich our existing bot detection. We're finding a lot of value in this rate limiting data.

This is just a cool little chart I made: I took all the requests that got rate limited, got their geolocation data, and put it on this world map. You can see that bots really come from everywhere, particularly from around where AWS data centers are located. As you can see, there are a couple of big dots in Virginia, where AWS us-east-1 is.

And lastly, some tips. I would scale your infrastructure so that your p999 latency to the rate limit service is around 20 milliseconds and not much more than that; otherwise you're going to have a lot of fail-opens. About 5,000 requests per second per rate limit service pod is a pretty good benchmark. I would also use the Envoy Redis filter to make your life a lot easier, particularly because it gives you Redis performance metrics, and Redis performance directly affects your p999 latency. And if you want to just try it out on your laptop, clone this repo, run this Docker Compose command, and you can just curl your localhost; you should get a 429 response like you see here on the bottom right.

Thank you, that was all. Shoot me some questions on Twitter if you have any, but thanks for listening.