Hello everyone, my name is Aram. I go by Phoenix Wizard on Twitter and in most other places. So who am I? I belong to the Kingdom of the Himalayas, Nepal. For most of my career I have been a long-term consultant, which means I've worked with startups writing their first line of code and getting their MVP out. I've also worked with Fortune 500 and Fortune 100 companies to get their products to market, after which I tried my hand at my own startup for almost two years. Right now I am a principal engineer at Zoomcar, where I head the ZAP team. ZAP is a subscription model for cars, and I lead the team that builds the tech behind it.

What I'm planning to cover in this talk is Redis anti-patterns. What I have noticed is that whether it is a very big organization or a startup, everybody seems to make the same mistakes with Redis. In hindsight they look very obvious, but when you are fighting your battles it is not that obvious what went wrong or how you can make things better.

I will start with a primer on what Redis is. Obviously everybody knows about it, but why not mention it once? Then there is the "I have a hammer" anti-pattern: when you have a hammer, everything looks like a nail. Then the duct tape lifeboat: when you are in a lifeboat and it is sinking, you just reach for duct tape. In this anti-pattern I'll show how people put in Redis at the last moment as a substitute, and how they fail. The next one is Santa stuck in the chimney: just imagine Santa Claus stuck in a chimney. That anti-pattern is basically about choking your Redis. Then there is "make Redis a turtle".
During this anti-pattern I will discuss the blocking calls of Redis. Then the annoying neighbor, where I'm going to talk about clusters. After that, flying blind into the hurricane: you need to know what you are getting into, so how can you monitor your Redis instances? And finally kayaking in the Pacific Ocean: if you want to kayak in the Pacific, at least know what you are getting into.

So, Redis is the mythical silver bullet which everybody loves to talk about. The moment you open the Redis website, this is the first line: Redis is an open source, in-memory data structure store, used as a database, cache, and message broker. Somehow people just love the "used as a database" part. They forget that Redis is not supposed to be primarily a database; it is supposed to help you solve some specific use cases. People take "used as a database, cache, and message broker" way too literally, and that is when things go wrong.

Let me start with the "I have a hammer" anti-pattern. The moment you say Redis to anybody, they say "cache, key-value", and that is where their definition of Redis stops. But the beauty of Redis is that it does a lot of things pretty well. For example, you have binary-safe strings. This is the most basic Redis data structure out there. With binary-safe strings you can store any kind of data; I think the upper limit right now is 512 MB. So anything up to 512 MB, whether it is a JPEG file or text or whatever you have, you can just put in a string. The next one is lists. A list is basically a linked list, ordered according to the order of insertion. Then you have sets: a set is a completely unordered collection, for when you want an unordered set of data to consume. Then there are sorted sets. Sorted sets are an amazing thing.
They are just like sets, but the beauty is that each member is given a weight, which is why they can be used for leaderboards. For example, if somebody is in the 10th position and moves up to the second position, all you need to do is track the scores in a sorted set and it will keep everyone in the correct order. And of course there are hashes. Hashes are one of the most common key-value structures. I think in the Node world, the Ruby world, and the Python world most developers are very comfortable using hashes, and I have seen people overusing and misusing hashes way more than anything else.

But Redis, in the versions after Redis 3.0, has introduced some newer kinds of data structures. One is bit arrays, in which you can treat a string like an array of bits and flip individual bits on and off. HyperLogLogs are also pretty interesting. As you know, if you have to find the count of something, you normally need memory to go through each of the elements and count them. But if you use HyperLogLogs, you get the count probabilistically, with an error margin of under 2%. The other one is streams. Redis Streams are known to be very effective. For example, at Zoomcar we have a huge amount of IoT data coming from the cars about car health and so on. Redis Streams are a very nice place to keep all of that data: even if our consumers are not there, Redis keeps whatever the producer sends and ensures that the data is safe until a consumer consumes it.

Now the duct tape lifeboat anti-pattern. This is something I have seen happen many times. Imagine that you have this amazing product which is completely ready, and as an engineer you are so enthusiastic: yeah, we are going live.
Say you are working for a client, or you are part of a product company, and you have sent out a mail blast to all your customers: hey, here is this amazing cataloging service that we have ready. Then all of a sudden there is a press release in all the newspapers, and you get 100x traffic. When you built the system, you thought: Postgres or MySQL indexes are going to solve all of my problems. Mongo is the boss. Nobody is going to touch us, and we can easily scale. But then you realize that your server has choked and completely crashed. What happens then? You go online and search: my server has crashed, what can I do next? And the first thing any article will tell you is: why don't you use Redis?

Now, you are not one of those people who actually goes and understands Redis properly. So you decide: there is so much data here that I want people to access very quickly, and there are these inputs that the business team keeps giving me. Why don't I just transform the data and put it on Redis, without persistence, without backup? Because we all love living on the edge, right? We want the maximum performance. So you put it on Redis, and the system is fine and everything seems to be working beautifully.

But you know what happens after two or three weeks? At 3 a.m. you are sound asleep and you get a call that nothing is working. Why? Because everything was on Redis, and Redis went down. What did you expect to happen? The data that the business gave you was never kept in your database; it was only in your head. You used that data to populate everything on Redis, and you do not have any persistent storage anywhere.
A lot of people do not realize that Redis does have persistence, but it comes with a caveat: you should know how to use it. The most basic persistence option in Redis is RDB, the Redis database file. It takes a snapshot of everything that is in your Redis and writes it to a database file, and all of it can be restored from that file. But this also comes with some gotchas. With RDB, the snapshots are taken at intervals, say five-minute slots. Imagine that at the third minute your Redis suddenly goes down and all the data in memory is gone. Now you have three minutes' worth of data which is nowhere to be found. Why? Because the last backup was taken three minutes ago. In these kinds of scenarios RDB is not the best solution. But if your data does not change that frequently, RDB is a fairly decent persistence option, so that even if your server goes down you can bring it back up.

Then there are people who really want everything that is on Redis to have a record. For them we have the append-only file, also called AOF. The last parameter that you can see here, appendfsync, decides how frequently the file is written. If you say no, you are like the first friend we spoke about, who wants everything on Redis without any persistence. But you can also set it to everysec or always. Always means that on every write, before the data is put on Redis, it is actually written to the AOF file. Everysec means that at an interval of one second Redis ensures that the data is in the AOF.
The AOF works just like a change log: every write or delete operation that has happened is recorded in your AOF file. When the server goes down and comes back, that whole file is replayed. But there are some gotchas. Obviously, if this were the best solution, I would have given it to you in the first place. The first gotcha is that the AOF file is way bigger. When you take a snapshot, it is obviously going to be a smaller file, but the moment you keep the whole change log, it is a much larger file. Another gotcha, which a lot of people do not talk about, is that there are certain data structures which do not replay very well from the AOF, so it might actually lead to a loss of data. This brings me to another point which a lot of people do not understand: many times Redis is only "good enough" with data. If a little margin of error is going to spoil things for you, Redis might really be a spoiler for you at that point.

So what is the best way? We saw that with RDB, whatever is written in the time slot between two backups is lost if the server goes down. And with AOF there is going to be some performance overhead. Well, the best thing you can do is actually use both of them. Let's say an RDB snapshot is taken every five minutes. Then for the rest of the time, whatever keys are added are recorded in the AOF. So you get the best of both worlds. If you go through the Redis documentation, they have actually said that in the future they plan to make this the default option.
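As a rough illustration of the snapshot-plus-change-log idea, here is a toy model in plain Python. It is not how Redis stores RDB or AOF files: `ToyStore` and `recover` are made-up names, and the "snapshot" and "log" are just in-memory objects. It only shows why replaying the log on top of the last snapshot brings back writes that a snapshot alone would lose.

```python
class ToyStore:
    """Toy in-memory store with an RDB-like snapshot and an AOF-like change log."""

    def __init__(self):
        self.data = {}
        self.snapshot = {}  # last point-in-time copy of the data
        self.log = []       # operations recorded since the last snapshot

    def set(self, key, value):
        # Record the operation first, then apply it to memory.
        self.log.append(("set", key, value))
        self.data[key] = value

    def delete(self, key):
        self.log.append(("del", key, None))
        self.data.pop(key, None)

    def take_snapshot(self):
        # Copy the current state and start a fresh change log.
        self.snapshot = dict(self.data)
        self.log = []


def recover(snapshot, log=()):
    """Rebuild state from a snapshot, then replay any logged operations in order."""
    data = dict(snapshot)
    for op, key, value in log:
        if op == "set":
            data[key] = value
        else:
            data.pop(key, None)
    return data


store = ToyStore()
store.set("car:1", "available")
store.take_snapshot()
store.set("car:2", "booked")  # happens after the last snapshot
store.delete("car:1")         # so does this

snapshot_only = recover(store.snapshot)                 # loses both later operations
snapshot_plus_log = recover(store.snapshot, store.log)  # matches the live data
```

In this toy, `snapshot_only` still contains `car:1` and knows nothing about `car:2`, while `snapshot_plus_log` is identical to `store.data` at the moment of the crash.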
For most people, one of these is going to solve the problem anyway. The trouble starts when you do not know which one is the best solution for you. Now, you've listened to this talk and you think: hey, now I understand persistence. You go to a production server which is using only RDB, and you say: let me just switch on AOF and see what happens. And you restart the server. Believe me, you don't want to do that. Remember what the append-only file is: I said the whole change log is replayed. When you switch on AOF and restart the server, a new AOF file is created, and when it is replayed, it is an empty file. So at the end of the day everything in your Redis goes for a toss. You should not do it this way. How should you do it? At runtime. You can set appendonly to yes on the running server with CONFIG SET; this is the command. Just use it, and your AOF file will be created automatically, all the changes will be written to it, and you can say: okay, finally I have my strategy in place.

Now let's come to Santa stuck in a chimney. We have all seen Santa stuck like this, right? Bringing your gifts. Let's imagine that you are this amazing developer. You have listened to my previous slides, you have your persistence ready, and you think: this time nothing is going to take us down. But then you realize that every five minutes, ten minutes, half an hour, without any clear pattern, the system suddenly stops responding. It's as if you do not even have Redis, just a database. When you look into it, you realize: hey, the business team told us that every 10 minutes they need a report, and that report requires millions of keys to be processed.
Now imagine there is a single Redis instance on which, every 10 minutes, there is a bombardment of queries, and this is the same Redis that is serving your customers in production. Obviously there is going to be choking, right? I want people to read this slide with Bohemian Rhapsody in their head; then you will appreciate it. What people forget is that, at the end of the day, Redis is a single-threaded process. At some point, no matter how much CPU or RAM you give it, it is going to reach a limit where you see performance issues. So always be wary of this. The moment you say that Redis is single-threaded, there will be a gentleman who comes up to me and says: no, Redis is not single-threaded. And it's true that backups, deletions of objects, and things like that happen in the background. But all the main tasks that Redis does, or most of them, are performed on a single thread. So I would strongly recommend everybody to be very cognizant of this fact and to architect for it from day zero.

Now the "make Redis a turtle" anti-pattern. This one is very interesting. All of a sudden you realize that there is this report, or whatever it is you have to create, and for that you need all the keys (KEYS is just an example I'm giving): you need to process all of them and do some computation based on that. Imagine that you've got tens of millions of keys and you run the KEYS command. Do you know what happens? Until that command has finished, literally nothing else can be processed by Redis. Why? Because in Redis there are things called blocking calls, and KEYS is a blocking call.
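To make the difference concrete, here is a toy model in plain Python of a KEYS-style call next to a cursor-based, SCAN-style call. It is only a sketch under simplifying assumptions: `toy_keys`, `toy_scan` and `scan_all` are made-up names, the cursor is a plain list offset (the real Redis cursor is not), and Python's `fnmatchcase` stands in for Redis glob patterns. The point is that the cursor version touches a small batch per call, and that the pattern is applied only after the batch has been picked, so a call can come back empty while the cursor is still non-zero.

```python
from fnmatch import fnmatchcase


def toy_keys(keyspace, pattern):
    """KEYS-like: a single call walks every key before it returns anything."""
    return [k for k in keyspace if fnmatchcase(k, pattern)]


def toy_scan(keyspace, cursor, pattern="*", count=10):
    """SCAN-like: inspect `count` keys starting at `cursor`, then filter them.

    Returns (next_cursor, matched_keys); a next_cursor of 0 means the scan is done.
    """
    batch = keyspace[cursor:cursor + count]
    matched = [k for k in batch if fnmatchcase(k, pattern)]  # filter AFTER batching
    next_cursor = cursor + count
    if next_cursor >= len(keyspace):
        next_cursor = 0
    return next_cursor, matched


def scan_all(keyspace, pattern, count):
    """Keep calling toy_scan until the cursor returns to 0."""
    cursor, found, batch_sizes = 0, [], []
    while True:
        cursor, matched = toy_scan(keyspace, cursor, pattern, count)
        found.extend(matched)
        batch_sizes.append(len(matched))
        if cursor == 0:
            break
    return found, batch_sizes


keyspace = [f"user:{i}" for i in range(8)] + ["report:1", "report:2"]
found, batch_sizes = scan_all(keyspace, "report:*", count=4)
# The first two batches contain only user:* keys, so they match nothing,
# yet the scan is not finished until the cursor comes back as 0.
```

With these ten keys and a count of 4, the first two calls return no matches and only the third returns the two `report:*` keys, so a caller who stops at the first empty reply would miss them.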
There are other ways to get around this issue. For this there is a command called SCAN, which really solves your problem. But the point is that for whatever commands you use in Redis, you should understand whether they are blocking or not. Take SCAN as the example. It is a cursor-based way of browsing the keyspace. You give it a count, let's say 10, and it gives you about 10 keys at a time along with a cursor, and with that cursor you can query again for the next keys. It has some drawbacks: it does not guarantee that the same key is not returned multiple times. But what it does do for you is ensure that the system is not blocked. Redis is supposed to handle millions of requests per second if you have a good enough cluster setup, and then because of your one command it is suddenly blocked. So it's better to figure out the non-blocking ways of doing this. And if you tell me that you don't just want SCAN but also want to give it a pattern, MATCH will help. But a lot of people don't realize that when you use SCAN and MATCH together, MATCH is only applied to the subset of keys that SCAN picked up in that call. Believe me, I've seen people literally trying to solve this problem for days before they realize it.

Now the annoying neighbor anti-pattern. Annoying neighbor is very clear, right? When do you have a neighbor? When you are living next to someone. So: Redis clusters. The moment we say Redis clusters, what a lot of folks do not realize is that Redis clusters are not as mature or as amazing as the clustering in other RDBMS or NoSQL databases. So you should be very cognizant of the limitations that Redis has here.
To give you an example, you remember we spoke about sorted sets in the data types. Not all sorted set commands work in a cluster the way they should, because imagine you've got sorted sets where half of the keys are in one partition and the rest in another. Obviously that is going to lead to some problems. At the same time, you cannot clearly say: I have 10 masters, this is the master that is going to get the maximum traffic, so I can just scale it up or down. So Redis clusters are very tricky. What makes it a little more tricky is that you cannot have master-to-master replication. You are always going to have a master-to-slave kind of infrastructure. So imagine there is master-to-slave replication, and at the same time the whole cluster is full of masters who are talking to their slaves. In this kind of scenario, can you imagine the amount of network chatter there is going to be? That is your noisy neighbor. And a lot of people do not realize that the performance of Redis depends very much on your network. If your network is that noisy, it is going to have repercussions you would not even have thought about.

While all of these anti-patterns are known, and we have all faced them, you first have to know that there is a problem you need to solve, because many times people do not even realize there is one. And if you want to solve a problem, you need some tools, or at least you need to know what you are looking for. The most basic things are: is the process running? What is the uptime? What are the metrics around it? You should also know how the system is behaving. What is the load average? How much CPU is used? How is the memory usage? The swap usage?
Swap usage is very important for Redis. Imagine you are using Redis for this amazing caching setup, and all of a sudden you realize that your cache is not as fast as it should be. When you look a little more closely, you see that your memory is full, and because of that a lot of the keys are being served from swap. So you have gone from the efficiency of a RAM lookup to a disk lookup, and obviously there are going to be repercussions. If you do not know that swap usage is that high, there is no way for you to solve this problem.

Network bandwidth: obviously the network is one of the main reasons for latency with Redis. You should ensure that there is a minimum amount of network latency between your machine and Redis, and that is going to help you a lot. I have seen cases where, because of too many bounces, the latency was so bad that people said: hey, we could have just had a very good index on our database and it would have solved our problem way better.

These are some of the other parameters Redis gives you which you should keep an eye on. Connected clients: how many clients are connected to the master. Just remember that replication between your master and slave happens asynchronously, but at the same time the slaves periodically send acknowledgments to the master. So the more slaves there are per master, the more wary you should be, and you should figure out how to make it better. Keyspace: you should know how many keys there are in your database. And you have to know how the I/O is going. This is not a very extensive list.
I will be talking about the redis-cli command in a while; from it you can get all the parameters, and based on what you actually want to know you can track all of those things. These are some very important things that you need to track. Latency, as I've been saying for a long time now. Used memory: Redis lets you set the maximum amount of memory it can use. Please use that, because if you give Redis unlimited memory, it will just eat it up. So it is really good to know: I have, let's say, a 12 GB system; how much of it am I going to let Redis have, and how can I track that? If you see again and again that your memory consumption is going up, then obviously it's time to upgrade your machine. So it's really important that you track this.

Another very important thing here is evicted keys. When you are using Redis as an LRU cache, the key eviction strategies are something that a lot of people just ignore. I would strongly recommend everybody to read about them. For example, in the latest version I think least frequently used is the default, which is a very probabilistic implementation of LRU. But there are other strategies, including one where no key is evicted and an error is returned instead. Many times you do not want a key to be removed just because the server is full. So be very wary of what kind of key eviction strategy you choose, and to make those decisions you need to track evictions. That is what is very important.

But you will say: hey, you told me all the things I need to track, but who is going to be my savior? What is the one silver bullet that is going to help us track all of them? Actually, redis-cli INFO is an amazing command. With it you get a lot of these parameters.
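As a small sketch of what tracking these numbers can look like, the Python below parses INFO-style text into sections and pulls out a few of the fields mentioned above. The field names (`connected_clients`, `used_memory`, `maxmemory`, `evicted_keys`, and the `db0` keyspace line) are real INFO fields, but the sample values are invented for illustration, and `parse_info` is a hypothetical helper rather than part of any Redis client.

```python
def parse_info(text):
    """Parse INFO-style text into {section: {field: value}} (all values as strings)."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section headers look like "# Memory".
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
        elif ":" in line and current is not None:
            field, value = line.split(":", 1)
            sections[current][field] = value
    return sections


# Made-up sample in the shape of `redis-cli INFO` output.
SAMPLE = """\
# Clients
connected_clients:42

# Memory
used_memory:1048576
maxmemory:2097152

# Stats
evicted_keys:7

# Keyspace
db0:keys=1500,expires=20,avg_ttl=0
"""

info = parse_info(SAMPLE)

# Fraction of the configured memory limit in use
# (assumes maxmemory is set; 0 would mean "no limit").
memory_ratio = int(info["memory"]["used_memory"]) / int(info["memory"]["maxmemory"])

# The keyspace line packs several counters into one comma-separated value.
db0 = dict(pair.split("=") for pair in info["keyspace"]["db0"].split(","))
```

A monitoring job could run something like this on a schedule and alert when `memory_ratio` approaches 1 or when `evicted_keys` keeps climbing between samples.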
I know there are some Node, Ruby, and Python implementations that give you nice dashboards on top of this. You can look them up on GitHub; they give you the output of redis-cli INFO in a much more visual form. If your organization is already on New Relic, you can use New Relic. Otherwise you can use Nagios or Icinga. Redis Labs also has a pretty good monitoring system, which you can check out.

Now we have seen how we can monitor everything. But many times the problem is that we do not realize some of the underlying gotchas, which might not be that obvious. For example, how fast is your fork? Every time Redis has to do any kind of background job other than its main job, it has to fork a process, and the speed at which that happens is very important. Imagine that you are on a medium or small machine on AWS: it will be way slower than on a C-class machine. So be very wary of how fast your fork is, because this can lead to a lot of performance issues. And as I said about swapping, you should be very aware of it: RAM I/O and disk I/O are very different. This is not restricted to Redis, right? For any Linux application, if a lot of your memory is in use, it is going to use swap space, and automatically there are going to be performance repercussions. So be very wary of all of these things.

And as I said about the persistence strategy: if you set fsync to always, believe me, you are not getting the performance you want from Redis. Set fsync to every second when you are using AOF, and for RDB you can keep the snapshot at five minutes. With this kind of strategy you are pretty much sorted most of the time and should not have a lot of problems.
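One way to write that strategy down is sketched below in Python. The directive names (`appendonly`, `appendfsync`, `save`) are real Redis settings; the value `300 1` for `save` (snapshot after 300 seconds if at least one key changed) is an assumption standing in for "five minutes", and `apply_persistence_settings` and `RecordingClient` are made-up names. The function only expects an object with a `config_set(name, value)` method, which is the shape redis-py's client offers for CONFIG SET, so the stand-in client lets the sketch run without a server.

```python
# Settings following the talk's recommendation; the exact `save` value is an assumption.
PERSISTENCE_SETTINGS = [
    ("appendonly", "yes"),        # turn on the append-only file
    ("appendfsync", "everysec"),  # fsync the AOF once per second, not on every write
    ("save", "300 1"),            # RDB snapshot every 300 s if at least 1 key changed
]


def apply_persistence_settings(client, settings=PERSISTENCE_SETTINGS):
    """Apply each setting at runtime and return the equivalent redis.conf-style lines."""
    for name, value in settings:
        client.config_set(name, value)
    return [f"{name} {value}" for name, value in settings]


class RecordingClient:
    """Stand-in for a real client: records calls instead of talking to a server."""

    def __init__(self):
        self.calls = []

    def config_set(self, name, value):
        self.calls.append((name, value))


client = RecordingClient()
conf_lines = apply_persistence_settings(client)
```

Applying the settings on a running server, rather than editing the config and restarting, follows the earlier advice about enabling AOF at runtime. Settings changed this way are not written back to the configuration file on their own, so they would also need to go into redis.conf to survive a later restart.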
Some other things you need to think about when you are deciding on the architecture of your system. Network latency: know whether everything is in the same place. CPUs: it has been noticed that Intel CPUs just win with Redis. It is actually written in the documentation that with AMD processors the performance goes down, so you should know this. RAM speed: for smaller objects it does not make much of a difference, but for larger objects it actually makes a lot of difference. And if you are really in the business of good performance and you really want to milk Redis, then you had better use bare metal, because on a VM it is known not to work as well.

Most of my slides have been based on my experience working with these organizations and on the amazing Redis documentation. Believe me, it is awesome. It is one of those things you really want to give a chef's kiss. You think: wow, this is something I really enjoyed reading. So I would strongly recommend you go through the Redis documentation.

Now for a quick conclusion: Redis is not a silver bullet. Redis does some things very well. Redis can do caching very well. Redis can do distributed locks very well. Even leaderboards with sorted sets it can do very well. A lot of people also use Redis for building rate limiters, and that works beautifully too. But you should know exactly what the best use case for Redis is for you. Understand all the gotchas. Do not just jump on it. And yeah, that's all from my side. I'm Arun, thank you. Questions?

Thank you. Yeah, hi. What are your thoughts on storing files in Redis? Temporary files.

How big are your temporary files?

Around 10 to 15 MB.

Sorry?

10 to 15 MB.
10 to 15 MB, it depends on how frequently you are going to use them again. Right now, the biggest question with 10 to 15 MB files is whether it is going to be many files or a few files.

So it's a use case like Gmail. A user uploads a file, and while the file is uploading he can write the email. After he's done writing the email, he just sends it. Until then, I want to store the file in a temporary location.

Yeah, for that it's pretty fine. But if you tell me that this is going to be the same file that a lot of users are going to access, then I would be a little wary, because the partitioning aspect comes into play.

It won't be accessed.

Yeah, exactly. So for that kind of scenario it's perfectly fine. As I told you, string data can go up to 512 MB, so you can put any file over there, and it works pretty well. But please ensure that whenever you use Redis for these kinds of scenarios, you use different Redis instances for different things. If you're using Redis for a queuing mechanism, it should not be the same Redis instance that is handling all of these other use cases. When you have a Redis instance dedicated to your use case, it works fantastically well. But let's say 10 or 15 people are uploading files at the same time while 100,000 people are trying to retrieve something stored in that Redis as a database cache: at that point it can become tricky. As long as you know that this Redis instance is specifically for this purpose, it works beautifully.

Okay, thanks.

Hello? Have you tried any other in-memory caches, like Aerospike, instead of Redis? I think Aerospike works somewhat faster than Redis in some cases, basically when you're dealing with larger data sets.
Yeah, so the beauty of Aerospike is that it actually works better there because it also takes care of optimizing the disk I/O. I did not mention Aerospike because most of the Node and JS community is very much aware of Node Redis and has used it a lot. Mostly because of the pricing, a lot of orgs do not use Aerospike, but it is wonderful. As you said, for larger files Aerospike works beautifully, for any data set really, because they have optimized how to minimize the lookups. It's fantastic.

You mentioned that you can put the IoT data into Redis Streams. For IoT data we could also use Kafka or something similar. So when would you go for Redis instead of Kafka?

Streams were only just released, in Redis 5.0, so they are not even that battle-tested. We used them in a POC, because right now we are also using Kafka for the IoT data, and we were very surprised by how well it worked for us. We have not gone to production with it ourselves yet. But according to everything I have read and the POC we did, it seems like a fairly stable solution which can be taken to production. Thank you.