Hi everyone, I am Varsha. I work at Capillary Technologies, Bangalore, on the platforms team, and we built a rate limiting module at Capillary, so I will be talking about it. The agenda for the day: why we needed a rate limiting module at Capillary, what the requirements were, the possible solutions for implementing rate limiting and why Redis was chosen over all the others, implementation details like the algorithm, the design of the data model and the iterations it went through, a few production numbers on throughput and memory, and the further enhancements being planned.

Capillary Technologies provides cloud-based customer engagement solutions like loyalty, campaigns, referrals and lifecycle marketing, and we usually send out communications to customers through SMS and email. It is a multi-tenant platform, and we have also opened the platform up for integrations with a lot of clients. So there are a few common concerns that everyone faces. Your API can get abused, and when that happens there is a cascading effect on all your downstream modules as well. You can have one very big client that hogs all your resources, and the other clients might starve. Also, since a lot of communication happens with consumers: long back there was an incident where a bug in the system caused us to send 15 messages to the same customer in less than a minute, so there should definitely be a mechanism to control such things. And we have our e-commerce module, where we want to limit the consumer apps per app ID or per the access token that is generated.
The idea was to develop a generic rate limiting module that each of our services could use to record and limit the number of calls being made to an entity. The most important requirement was latency: for example, the SLA we have set for sending a message is less than half a second, so there is no way we can spend a lot of time in rate limiting. And looking at the traffic we are getting right now, it really needs to scale.

The possible solutions we could think of: first, traditional MySQL, but it is disk-based, queries can be really slow during high load, and with the traffic we are experiencing the DB servers cannot handle such a high QPS at all, so there is no way we can scale with SQL. Another option was Memcached. Obviously Memcached is very fast, but it provides only very low-level data structures like key-value pairs, which pushes a lot of the logic and complexity into your application rather than into the datastore you are using. It is volatile as well: if your Memcached crashes, your data is lost and there is no way to recover it. Another option was MongoDB, but it does not fit the use case: it does not have an in-memory mode, it is slow especially when your writes and reads are at the same level, and it is document-based, so it does not suit the use case much.
The thing that best suited our use case was Redis. As everyone knows, it is extremely fast, and it provides a lot of rich data structures like sorted sets, lists and hash maps, which we have been talking about since morning, along with a lot of cool commands that operate on them. The most important thing was persistence, and the durability guarantees it provides through various options: point-in-time snapshotting (RDB) and append-only file logging (AOF). All these parameters are tunable, so you can tune them based on your requirements, keeping in mind the performance you expect. Another good thing is batch processing: Redis provides pipelining, so you can send a set of commands and get all the results at once. Your processing becomes really fast because it eliminates the network latency involved in issuing commands one by one. And when your database is growing you can have a cluster of Redis servers, and you can have a master-slave configuration, which we spoke about: when a master goes down you can promote your slave, and such things.
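As a rough illustration of why pipelining matters, here is a back-of-envelope model; all the numbers in it are illustrative assumptions, not measurements from this system:

```python
# Back-of-envelope model of pipelining: without it, every command pays a full
# network round trip; with it, a whole batch shares a couple of round trips.
# All numbers below are illustrative assumptions, not measurements.
RTT_MS = 0.75          # assumed client-to-Redis round trip per command
CMD_MS = 0.005         # assumed server-side cost per command
MESSAGES = 2000        # one batch of outgoing messages
CMDS_PER_MSG = 6       # a few Redis commands per message
total_cmds = MESSAGES * CMDS_PER_MSG

one_by_one_ms = total_cmds * (RTT_MS + CMD_MS)    # a round trip for every command
pipelined_ms = 2 * RTT_MS + total_cmds * CMD_MS   # increments batched, then expires
speedup = one_by_one_ms / pipelined_ms
```

With assumptions in this ballpark, a batch falls from roughly seconds to tens of milliseconds, the same order of magnitude as the improvement quoted in the talk.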
In the library we have implemented, the terminology is as follows. The key is the entity being tracked by the counter: for example, if you want to limit the number of messages being sent out, your key would be the customer's mobile number or email, and in the case of an API it could be the user ID or the org ID. Next is the scope: the scope is the category under which your key belongs. For messaging it could be "I am sending a coupon kind of message", or an OTP message, or a campaign message going to a customer; if it is an API, your scope could be a resource or a method. Then the namespace: as I said, this is a generic library used across all the modules, so the namespace identifies the client that is using the library. For example, communications would be the namespace for messaging, and API could be another namespace. The window is the time range over which you are rate limiting; currently we support only minute, hourly and daily windows. I think I am talking a lot, any questions till now? Next are the limits: the soft limits are configurable by each service, and each organization can have its own configuration set for the limits; these limits are at the scope level. There is also a provision to whitelist some of your keys: you provide a regex, and all keys matching that regex are skipped, so no limit is applied to them.

This was the first implementation we thought of, where we used a list. Each time a call is made for an entity, we push the current timestamp onto the list, provided the length of the list has not exceeded the limit yet. If the limit has already been reached, you pick up the earliest timestamp and compare it with the current timestamp to see whether it is within the time window you specified. If it is within the range, it means the rate has been exceeded; else you push the new timestamp and remove the oldest one. With this approach there is only one window; here you can see 60 seconds, but we wanted multiple windows and more flexible limiting. For example, if you want to support a daily window as well as an hourly window, you cannot always pick up the earliest element: with a daily window whose limit is set to 10 and an hourly window whose limit is set to 5, to check the hourly limit you have to pick up the fifth element and then compare the time range. This added a lot of complexity to the application code, as did removing the counters from the list. Moreover, you are storing raw timestamps, so your memory footprint increases: the bigger the window, the more the memory. Also, searching in a list is O(n), and there can be race conditions at the application level; though Redis is single-threaded and its operations are atomic, there might be race conditions in your application itself.

So in our next iteration we thought of keeping it very simple: we just use a plain key. Each time a new call comes in, increment that key; if it is the first time the key is entering Redis, set an expiry with a TTL equal to the window duration. If the current value exceeds either of the limits, it means the rate has been exceeded. With this, the memory usage dropped a lot, and it is thread safe, because all these operations are atomic and there is no way a race condition can occur. But the earlier approach had a moving window: at any point there can be conditions where your API calls hit in the last second of a window and
again in the first second of the next window. That case was handled in the previous design but not in this one; there is no moving window here. But performance-wise this was very good.

This was the model. The counter is the combination of the namespace, the scope, the key and the window, and the count is simply incremented each time a new call is made. For the limits there are hard and soft limits: soft limits are configurable, so the user or the organization can set whichever limit they want, and hard limits are just to stop someone from setting a really high value by mistake. For example, in the limits you can say: I want to allow only five coupons to a customer in a day, only one reminder message in an hour, and only two campaign messages in four or five hours. That is the kind of flexibility we get by specifying such a data structure. So here, I want to see if I can send a message to the mobile number given here; basically it is a message which sends a voucher to the customer. My key becomes this: the communications namespace, since it is a message going out, the voucher scope, and this customer's number. In the hourly window my current count is 2 and the allowed limits are 4 and 5, so obviously the rate has not been exceeded and the message is allowed to go out of the system.

Even after that iteration we did further enhancements, like caching the limits instead of hitting Redis each time. Pipelining also helped a lot: for a batch of messages, we pipeline all the increment commands together and execute them, then pipeline all the expire commands and execute those. With this, the time to process a batch of 2000 messages dropped from 9 seconds to 11 milliseconds, just by pipelining. And as I said, there is a provision to whitelist keys; that was added later on. We also have a centralized health dashboard where all such metrics go: whenever a rate limit is exceeded, it is fed to the health dashboard, so whoever is concerned in that department can check how many exceeds happened in a day or an hour and what happened.

These are the production numbers. We are right now handling around 11,000 requests per second, where each request is approximately equivalent to one to six commands, so basically we are hitting around 50K commands per second on Redis. The memory footprint is also very low: considering that we are storing entire email IDs, mobile numbers and other long keys, the memory used is around 1.35 GB for 10 million counters.

And it is not just about limiting. Use the same data model, remove the limits, and it becomes a rate monitor: you keep separate counters for the different keys you have defined. Also, using sorted sets you can watch the trends of your latencies, how your API latencies are doing, or if you integrate with third parties, what their response latency looks like. The timestamp becomes the score in the sorted set and you keep feeding in your latencies; when you want to see how your API performed between two points in time, you just do a range query between those timestamps and pull the data out. This is how we do it in our dashboard. This one is for messages: each time a message goes out, whether it is delivered, bounced or rejected, we increment that particular key, and there is a time series as well, so we extract between those timestamps and get a whole graph out of it. This is a similar one, a real-time queue view which shows how many messages in a particular queue are yet to go out. For such things we obviously cannot use a relational database, and this is really fast.

So these are the next steps, the immediate ones we have. The first thing coming in is Redis Sentinel: when the master goes down, a slave is promoted to take the master's place, and Sentinel handles that failover automatically across the cluster. The other one is Supervisor, which is a process control system. It is just configuration: say there are two processes, Supervisor will monitor those processes, and if one of them goes down, Supervisor sees that, runs the restart script and brings the process back up. You can also spawn new processes from it, and it goes very well with Redis, so I think it is a pretty good option.

On the question about what happens when Redis goes down: it is not so much the time to restart it, but the window during which writes do not get persisted. No matter what you configure, there is a small window, so there is a bit of risk that you might lose some data. That is a trade-off, and that is one of the reasons why we use both RDB and AOF: we want both durability and performance. AOF gives the best durability and RDB makes restarts fast, so we maintain both, and our AOF fsync is set to every one second.

On the question about the slide describing the solution that worked for us: okay, this one. It is really simple, and it was a trade-off against the requirements; in our case those were performance, a very small memory footprint and low latency, and that is why we came up with this approach. Especially during New Year or Diwali we send around 10 million messages within a short time frame, and before we had this in place, when something exceeded its rate it hogged the global resources and there was a very bad cascading effect.
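The RDB-plus-AOF durability setup described here might look roughly like this in redis.conf; the thresholds below are illustrative, not our exact production values:

```
save 60 1000                       # RDB: snapshot if >= 1000 keys changed in 60s
appendonly yes                     # AOF: log every write for durability
appendfsync everysec               # fsync once per second: at most ~1s of writes at risk
auto-aof-rewrite-percentage 100    # rewrite the AOF when it doubles in size
auto-aof-rewrite-min-size 64mb     # ...but only once it is at least 64 MB
```

The `appendfsync everysec` line is the one-second fsync window mentioned above: it bounds the data you can lose to roughly one second of writes while keeping write latency low.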
The other question was whether anyone has tried the time-series database approach, something like Graphite, where you keep a count per window and keep a number of windows. There it is really tough to just get the last value back; those tools are really good for things like rates, where they normalize for the amount of time that has gone by, which works really nicely in RRDs, or you just put a value in and read that value back. People are building a lot of databases on top of such libraries, but for us that would have meant using an extra tool and managing yet another system, so we just reused the one we were already using.

On the windows: you can have different windows, for example minute, hourly or daily. If it is a minute, the window will be 60 seconds; if it is an hour, however many seconds there are in an hour. Basically it just keeps incrementing the count, and the value returned by the increment is checked against the limit when a call comes in. Another question was why we did not use hash maps: Redis does not let you set an expiry on individual hash fields, though hash maps would have used much less memory and might have been faster. But anyway we are not doing a GET anywhere; it is just an increment, which is also O(1), so the only two commands fired are the increment and the expire, and the expire happens only the first time. We also did not want that complexity in our application. And there is an awesome command, I forget the name, for the case where every write gets appended to the file and your AOF file grows really huge: when you run it, it rewrites your AOF file down to a minimal one, and that way your restart is much faster. So I think we are at the break now. Thanks, guys.
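To recap the core scheme from the talk, here is a minimal standalone sketch of the fixed-window counter. A plain dict stands in for Redis so the sketch runs by itself; the key layout follows the namespace/scope/key/window idea from the talk, while the class name and limits are illustrative:

```python
# Sketch of the fixed-window limiter: each dict entry plays the role of a Redis
# key manipulated with INCR (and EXPIRE on the first increment in a window).
import time

class FixedWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._counters = {}  # counter key -> (count, expires_at)

    def allow(self, namespace, scope, key, now=None):
        """Return True if this call is within the limit for the current window."""
        now = time.time() if now is None else now
        counter_key = f"{namespace}:{scope}:{key}:{self.window}"
        count, expires_at = self._counters.get(counter_key, (0, 0.0))
        if now >= expires_at:            # in Redis the key would simply have expired
            count = 0
        count += 1                       # INCR
        if count == 1:                   # first hit in this window: EXPIRE <window>
            expires_at = now + self.window
        self._counters[counter_key] = (count, expires_at)
        return count <= self.limit

# e.g. at most 5 voucher messages per customer per hour (illustrative limits)
limiter = FixedWindowLimiter(limit=5, window_seconds=3600)
allowed = limiter.allow("communications", "voucher", "9198xxxxxx01", now=0)
```

In production the dict is Redis itself, where INCR and EXPIRE are atomic, which is what makes the scheme safe under concurrent callers; the boundary-burst caveat discussed in the talk (a burst at the end of one window plus another at the start of the next) applies to this sketch as well.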